Transform JSON to ARRAY<MAP> in Athena/Presto

Posted: 2022-01-18 11:55:41

【Question】:

I have a table in Athena with a column containing JSON structured as follows:

    {
        "455a9410-29a8-48a3-ad22-345afa3cd295": {
            "legacy_id": 1599677886,
            "w_ids": [
                "845254682",
                "831189092"
            ]
        },
        "5e74c911-0b63-4b84-8ad4-77dd9bed7b53": {
            "legacy_id": 1599707069,
            "w_ids": [
                "1032024432"
            ]
        },
        "7b988890-20ff-4279-94df-198369a58848": {
            "legacy_id": 1601097861,
            "w_ids": [
                "1032024432"
            ]
        }
    }

I would like to transform it into an ARRAY of the following form:

[
    {"new_id"="455a9410-29a8-48a3-ad22-345afa3cd295","legacy_id"=1599677886,"w_ids"=["845254682","831189092"]},
    {"new_id"="5e74c911-0b63-4b84-8ad4-77dd9bed7b53","legacy_id"=1599707069,"w_ids"=["1032024432"]},
    {"new_id"="7b988890-20ff-4279-94df-198369a58848","legacy_id"=1601097861,"w_ids"=["1032024432"]}
]

I have been able to extract legacy_id and w_ids with the following statement, but I am struggling to add the original key as a value:

 with example_data as
 (
     select * from (
        VALUES('{"455a9410-29a8-48a3-ad22-345afa3cd295": {"legacy_id": 1599677886, "w_ids": ["845254682", "831189092"]}, "5e74c911-0b63-4b84-8ad4-77dd9bed7b53": {"legacy_id": 1599707069, "w_ids": ["1032024432"]}, "7b988890-20ff-4279-94df-198369a58848": {"legacy_id": 1601097861, "w_ids": ["1032024432"]}}')
     ) as t(col)
 )
select *
,transform(map_values(cast(json_parse(col) AS map(varchar, json))),entry -> MAP_FROM_ENTRIES(ARRAY[('legacy_id',json_extract(entry,'$.legacy_id')),('w_ids',json_extract(entry,'$.w_ids'))]))
from example_data;

【Answer 1】:

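One way to do this is to apply map_values to the result of transform_values, instead of applying transform to the result of map_values. The transform_values lambda receives both the key and the value of each map entry, so the original key is still available and can be added as new_id: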

select map_values(
        transform_values(
            cast(json_parse(col) AS map(varchar, json)),
            (key, entry)->MAP_FROM_ENTRIES(
                ARRAY [('new_id', cast(key as json)),
                ('legacy_id', json_extract(entry, '$.legacy_id')),
                ('w_ids', json_extract(entry, '$.w_ids')) ]
            )
        )
    )
from example_data;

Output:

_col0
[{new_id='455a9410-29a8-48a3-ad22-345afa3cd295', legacy_id=1599677886, w_ids=['845254682','831189092']}, {new_id='5e74c911-0b63-4b84-8ad4-77dd9bed7b53', legacy_id=1599707069, w_ids=['1032024432']}, {new_id='7b988890-20ff-4279-94df-198369a58848', legacy_id=1601097861, w_ids=['1032024432']}]
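
Every value in these maps stays of type json, because a Presto map needs a single value type for all its entries. If typed fields are preferred, one possible variant is to build a row per entry instead of a map. The following is only a sketch and not part of the original answer: the field types (bigint, array(varchar)) and the column alias items are assumptions based on the sample data above, and the result is an array of rows rather than an ARRAY<MAP>.

```sql
-- Sample data from the question, written as valid JSON.
WITH example_data AS (
    SELECT * FROM (
        VALUES ('{"455a9410-29a8-48a3-ad22-345afa3cd295": {"legacy_id": 1599677886, "w_ids": ["845254682", "831189092"]}, "5e74c911-0b63-4b84-8ad4-77dd9bed7b53": {"legacy_id": 1599707069, "w_ids": ["1032024432"]}, "7b988890-20ff-4279-94df-198369a58848": {"legacy_id": 1601097861, "w_ids": ["1032024432"]}}')
    ) AS t(col)
)
SELECT map_values(
        transform_values(
            -- Parse the text into a map from the original key to its JSON object.
            CAST(json_parse(col) AS map(varchar, json)),
            -- Build one typed row per entry; the field types are assumptions.
            (key, entry) -> CAST(
                ROW(
                    key,
                    CAST(json_extract(entry, '$.legacy_id') AS bigint),
                    CAST(json_extract(entry, '$.w_ids') AS array(varchar))
                ) AS ROW(new_id varchar, legacy_id bigint, w_ids array(varchar))
            )
        )
    ) AS items
FROM example_data;
```

With rows, each field keeps its own type, at the cost of fixing the set of fields in the query. The map version from the answer is the one that matches the ARRAY<MAP> shape asked for in the question.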

【Comments】:

Works like a charm, thank you.
