Can we flatten a column that contains JSON as its value in a Hive table?


Posted: 2021-06-16 09:25:01

Question:

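I have a Hive column "events" that holds JSON values. How can I flatten this JSON to create a Hive table whose columns are the key fields of the JSON? Is it even possible? For example, I need the Hive table columns to be events, start_date, id and details, with the corresponding values.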

| events |
|[{"start_date":20201230,"id":"3245ret","details":"Imp"},{"start_date":20201228,"id":"3245rtr","details":"NoImp "}] |
|[{"start_date":20191230,"id":"3245ret","details":"vImp"},{"start_date":20191228,"id":"3245rwer","details":"NoImp "}]|

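Comments:

What value do you want to keep for the events column?
For the events column the value stays as it is, but the start_date, id and details columns should take the values from the JSON.
There are two JSON objects in one row, i.e. an array of two JSON objects. Am I right?
Yes, and there can be more.
Thanks a lot @leftjoin, that solved the problem.

Answer 1: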

Demo:

-- The inline subquery s stands in for the real table.
-- Strip the outer [ and ], split on the commas that sit between } and {,
-- then explode the array so that each JSON object becomes its own row.
select events,
       get_json_object(element,'$.id') as id,
       get_json_object(element,'$.start_date') as start_date,
       get_json_object(element,'$.details') as details
from
(
select '[{"start_date":20201230,"id":"3245ret","details":"Imp"},{"start_date":20201228,"id":"3245rtr","details":"NoImp"}]' as events
union all
select '[{"start_date":20191230,"id":"3245ret","details":"vImp"},{"start_date":20191228,"id":"3245rwer","details":"NoImp"}]' as events
) s lateral view outer explode (split(regexp_replace(events, '\\[|\\]',''),'(?<=\\}),(?=\\{)')) e as element

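The initial string is split on the commas between curly braces (the lookbehind for } and the lookahead for { match only a comma that separates two objects, so the braces themselves are kept), the resulting array is exploded with lateral view, and each JSON object is parsed with get_json_object.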

Result:

 events                                                                                                             	    id       start_date details
[{"start_date":20201230,"id":"3245ret","details":"Imp"},{"start_date":20201228,"id":"3245rtr","details":"NoImp"}]   3245ret  20201230  Imp
[{"start_date":20201230,"id":"3245ret","details":"Imp"},{"start_date":20201228,"id":"3245rtr","details":"NoImp"}]   3245rtr  20201228  NoImp
[{"start_date":20191230,"id":"3245ret","details":"vImp"},{"start_date":20191228,"id":"3245rwer","details":"NoImp"}] 3245ret  20191230  vImp
[{"start_date":20191230,"id":"3245ret","details":"vImp"},{"start_date":20191228,"id":"3245rwer","details":"NoImp"}] 3245rwer 20191228  NoImp
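
The question asks for a table rather than a query result, so the same approach could be wrapped in a CREATE TABLE AS SELECT. The following is only a sketch: source_table and events_flat are hypothetical names that do not appear in the original post, and it assumes events is a string column holding a JSON array of flat objects.

```sql
-- Hypothetical names: source_table is the existing table with the JSON
-- string column events; events_flat is the new flattened table.
create table events_flat as
select s.events,
       get_json_object(e.element, '$.start_date') as start_date,
       get_json_object(e.element, '$.id')         as id,
       get_json_object(e.element, '$.details')    as details
from source_table s
lateral view outer explode(
    -- remove [ and ], then split on commas located between } and {
    split(regexp_replace(s.events, '\\[|\\]', ''), '(?<=\\}),(?=\\{)')
) e as element;
```

get_json_object returns strings, so start_date would need an explicit cast if a numeric column is wanted. Note also that the regexp_replace call removes every [ and ] in the string, so this sketch assumes that none of the JSON values contain square brackets.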
