In Hive, how to combine multiple tables to produce a single row containing an array of objects?
Posted: 2019-05-11 07:54:41
Question: I have two tables, as follows:
users table
==========================
| user_id | name | age  |
|=========================
| 1       | pete | 20   |
| 2       | sam  | 21   |
| 3       | nash | 22   |
==========================
hobbies table
======================================
| user_id | hobby      | time_spent |
|=====================================
| 1       | football   | 2          |
| 1       | running    | 1          |
| 1       | basketball | 3          |
======================================
First question: I want to write a Hive query that returns rows in this format:
{ "user_id": 1, "name": "pete", "hobbies": [ {"hobby": "football", "time_spent": 2}, {"hobby": "running", "time_spent": 1}, {"hobby": "basketball", "time_spent": 3} ] }
Second question: suppose the hobbies table looks like this instead:
========================================
| user_id | hobby      | scores  |
|=======================================
| 1       | football   | 2,3,1   |
| 1       | running    | 1,1,2,5 |
| 1       | basketball | 3,6,7   |
========================================
Is it possible to get row output like the following, where scores is a list?
{ "user_id": 1, "name": "pete", "hobbies": [ {"hobby": "football", "scores": [2, 3, 1]}, {"hobby": "running", "scores": [1, 1, 2, 5]}, {"hobby": "basketball", "scores": [3, 6, 7]} ] }
Answer 1: I found an answer to the first question:
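-- Build one map per hobby row, then gather the maps into an array per user.
select u.user_id, u.name,
       collect_list(
         -- str_to_map splits pairs on "," and key/value on ":" by default,
         -- so every value (including time_spent) comes back as a string.
         str_to_map(
           concat_ws(",", array(
             concat("hobby:", h.hobby),
             concat("time_spent:", h.time_spent)
           ))
         )
       ) as hobbies
from users as u
join hobbies as h on u.user_id = h.user_id
group by u.user_id, u.name;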
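The second question is not covered by the query above. One possible approach is sketched below; it is untested and rests on a few assumptions: the hobbies table is the second version from the question (with a comma-separated string column named scores), and the Hive version in use lets collect_list aggregate complex types, which the map-based query above already relies on. named_struct and split are built-in Hive functions. split returns array<string>, so the scores come back as strings rather than integers.

```sql
-- Sketch for the second question (assumes hobbies.scores is a string like '2,3,1').
select u.user_id, u.name,
       collect_list(
         -- One struct per hobby row: {hobby, scores}
         named_struct(
           'hobby',  h.hobby,
           -- split turns '2,3,1' into array('2','3','1'); elements stay strings
           'scores', split(h.scores, ',')
         )
       ) as hobbies
from users as u
join hobbies as h on u.user_id = h.user_id
group by u.user_id, u.name;
```

The same struct shape could also be used for the first question, with named_struct('hobby', h.hobby, 'time_spent', h.time_spent); unlike str_to_map, that keeps time_spent in its original column type instead of turning it into a string.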