How to explode structs array?
Posted: 2018-12-20 14:40:38
Question: I am working with JSON objects and want to convert object.hours into a relational table, using Spark SQL DataFrames/Datasets.
I tried to use "explode", but it does not really support a "structs array".
The JSON object looks like this:
{
  "business_id": "abc",
  "full_address": "random_address",
  "hours": {
    "Monday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Tuesday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Friday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Wednesday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Thursday": {
      "close": "02:00",
      "open": "11:00"
    },
    "Sunday": {
      "close": "00:00",
      "open": "11:00"
    },
    "Saturday": {
      "close": "02:00",
      "open": "11:00"
    }
  }
}
I want to turn it into a relational table like the following:
CREATE TABLE "business_hours" (
"id" integer NOT NULL PRIMARY KEY,
"business_id" integer NOT NULL FOREIGN KEY REFERENCES "businesses",
"day" integer NOT NULL,
"open_time" time,
"close_time" time
)
Answer 1: You can do this with the following trick:
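import org.apache.spark.sql.functions.{array, col, explode, lit, struct}
import org.apache.spark.sql.types.StructType
// `$"..."` needs spark.implicits._, which spark-shell imports automatically.

// Collect the day names from the fields of the `hours` struct.
val days = df.schema
  .fields
  .filter(_.name == "hours")
  .head
  .dataType
  .asInstanceOf[StructType]
  .fieldNames

// Build one struct per day, wrap them in an array, and explode it into rows.
val solution = df
  .select(
    $"business_id",
    $"full_address",
    explode(
      array(
        days.map(d => struct(
          lit(d).as("day"),
          col(s"hours.$d.open").as("open_time"),
          col(s"hours.$d.close").as("close_time")
        )): _*
      )
    )
  )
  // explode names its output column `col`; expand that struct into columns.
  .select($"business_id", $"full_address", $"col.*")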
scala> solution.show
+-----------+--------------+---------+---------+----------+
|business_id|  full_address|      day|open_time|close_time|
+-----------+--------------+---------+---------+----------+
|        abc|random_address|   Friday|    11:00|     02:00|
|        abc|random_address|   Monday|    11:00|     02:00|
|        abc|random_address| Saturday|    11:00|     02:00|
|        abc|random_address|   Sunday|    11:00|     00:00|
|        abc|random_address| Thursday|    11:00|     02:00|
|        abc|random_address|  Tuesday|    11:00|     02:00|
|        abc|random_address|Wednesday|    11:00|     02:00|
+-----------+--------------+---------+---------+----------+
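The target table in the question declares "day" as an integer, while the result above carries day names. A minimal, self-contained sketch of the same struct/array/explode technique that emits a numeric day could look like the following. The object name `BusinessHoursSketch`, the `dayIndex` map, and its Monday = 1 through Sunday = 7 numbering are assumptions made for illustration; the question does not define a numbering. The sample record is shortened to two days.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}
import org.apache.spark.sql.types.StructType

object BusinessHoursSketch {
  // Hypothetical numbering (Monday = 1 ... Sunday = 7); not given in the question.
  val dayIndex: Map[String, Int] = Seq(
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"
  ).zipWithIndex.map { case (name, i) => name -> (i + 1) }.toMap

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("business-hours-sketch")
      .getOrCreate()
    import spark.implicits._

    // One JSON record per string, shortened to two days of the sample data.
    val json = Seq(
      """{"business_id":"abc","full_address":"random_address","hours":{"Monday":{"close":"02:00","open":"11:00"},"Sunday":{"close":"00:00","open":"11:00"}}}"""
    ).toDS()
    val df = spark.read.json(json)

    // Field names of the inferred `hours` struct, i.e. the day names.
    val days = df.schema("hours").dataType.asInstanceOf[StructType].fieldNames

    val rows = df
      .select(
        $"business_id",
        explode(array(days.map { d =>
          struct(
            // dayIndex(d) throws if a field is not one of the seven day names.
            lit(dayIndex(d)).as("day"),
            col(s"hours.$d.open").as("open_time"),
            col(s"hours.$d.close").as("close_time"))
        }: _*)).as("h"))
      .select($"business_id", $"h.*")

    rows.show()
    spark.stop()
  }
}
```

The times stay as strings here; converting them to a time type, and generating the "id" key, would be separate steps that depend on the target database.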
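As a possible alternative, not part of the answer above: explode accepts array and map columns, but schema inference reads "hours" as a struct. If the JSON is read with an explicit schema that declares "hours" as a map from day name to an open/close struct, explode can be applied to the column directly and yields `key` and `value` columns. The sketch below assumes all fields are strings; the object name `HoursAsMapSketch` and the value names `hoursEntry` and `schema` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}

object HoursAsMapSketch {
  // Shape of one day's entry: {"close": "...", "open": "..."}.
  val hoursEntry: StructType = StructType(Seq(
    StructField("close", StringType),
    StructField("open", StringType)))

  // Declare `hours` as a map instead of letting Spark infer a struct.
  val schema: StructType = StructType(Seq(
    StructField("business_id", StringType),
    StructField("full_address", StringType),
    StructField("hours", MapType(StringType, hoursEntry))))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hours-as-map-sketch")
      .getOrCreate()
    import spark.implicits._

    // One JSON record per string, shortened to two days of the sample data.
    val json = Seq(
      """{"business_id":"abc","full_address":"random_address","hours":{"Monday":{"close":"02:00","open":"11:00"},"Sunday":{"close":"00:00","open":"11:00"}}}"""
    ).toDS()
    val df = spark.read.schema(schema).json(json)

    val rows = df
      // Exploding a map column produces one row per entry, in columns `key` and `value`.
      .select($"business_id", $"full_address", explode($"hours"))
      .select(
        $"business_id",
        $"full_address",
        $"key".as("day"),
        $"value.open".as("open_time"),
        $"value.close".as("close_time"))

    rows.show()
    spark.stop()
  }
}
```

The trade-off is that the schema has to be written by hand, but the code no longer needs to enumerate the day fields of the struct.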