How to parse multiple nested JSON arrays with Hive


{
    "base": {
        "code": "xm",
        "name": "project"
    },
    "list": [{
        "ACode": "cp1",
        "AName": "Product1",
        "BList": [{
            "BCode": "gn1",
            "BName": "Feature1"
        }, {
            "BCode": "gn2",
            "BName": "Feature2"
        }]
    }, {
        "ACode": "cp2",
        "AName": "Product2",
        "BList": [{
            "BCode": "gn1",
            "BName": "Feature1"
        }]
    }]
}

Given JSON like this, I want to get the following result:

| code | name    | ACode | AName    | BCode | BName    |
| ---- | ------- | ----- | -------- | ----- | -------- |
| xm   | project | cp1   | Product1 | gn1   | Feature1 |
| xm   | project | cp1   | Product1 | gn2   | Feature2 |
| xm   | project | cp2   | Product2 | gn1   | Feature1 |

I tried this:

SELECT
    code
  , name
  , get_json_object(t.list, '$.[*].ACode')          AS ACode
  , get_json_object(t.list, '$.[*].AName')          AS AName
  , get_json_object(t.list, '$.[*].BList[*].BCode') AS BCode
  , get_json_object(t.list, '$.[*].BList[*].BName') AS BName
FROM
    (
        SELECT
            get_json_object(t.value, '$.base.code') AS code
          , get_json_object(t.value, '$.base.name') AS name
          , get_json_object(t.value, '$.list')      AS list
        FROM
            (
                SELECT
                    '{"base":{"code":"xm","name":"project"},"list":[{"ACode":"cp1","AName":"Product1","BList":[{"BCode":"gn1","BName":"Feature1"},{"BCode":"gn2","BName":"Feature2"}]},{"ACode":"cp2","AName":"Product2","BList":[{"BCode":"gn1","BName":"Feature1"}]}]}' as value
            )
            t
    )
    t
;

and got this:

xm  project ["cp1","cp2"]   ["Product1","Product2"] ["gn1","gn2","gn1"] ["Feature1","Feature2","Feature1"]

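However, I found that this ends up producing six rows, which looks like a Cartesian product. I also tried split(string, "\},\{"), but that splits the inner layer as well. Any help would be appreciated.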
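The two problems described above can be reproduced outside Hive. The following Python sketch is only an illustration, not Hive code. It assumes the six rows come from expanding the ACode and BCode arrays independently (one plausible reading of the question, not something the question states), and it uses Python's `re.split` as a stand-in for Hive's `split`.

```python
import itertools
import re

# Flat arrays as returned by the get_json_object calls above.
a_codes = ["cp1", "cp2"]
b_codes = ["gn1", "gn2", "gn1"]

# Expanding both arrays independently pairs every ACode with every BCode,
# because the flat arrays no longer record which BCode belongs to which ACode.
cross = list(itertools.product(a_codes, b_codes))
print(len(cross))  # 2 * 3 combinations

# The content of $.list with the outer [ ] removed.
inner = ('{"ACode":"cp1","AName":"Product1","BList":[{"BCode":"gn1","BName":"Feature1"},'
         '{"BCode":"gn2","BName":"Feature2"}]},'
         '{"ACode":"cp2","AName":"Product2","BList":[{"BCode":"gn1","BName":"Feature1"}]}')

# Splitting on every "},{" also cuts between the two BList entries of cp1,
# so the fragments are no longer complete JSON objects.
naive_parts = re.split(r"\},\{", inner)
print(len(naive_parts))  # 3 fragments instead of 2 list elements
```

Both effects have the same cause: the nesting information is lost, either when `[*]` flattens the arrays or when a delimiter that occurs at both levels is used for splitting.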

Answer

I solved it!

SELECT
    code
  , name
  , ai.ACode
  , ai.AName
  , p.BCode
  , p.BName
FROM
    (
        SELECT
            get_json_object(t.value, '$.base.code') AS code
          , get_json_object(t.value, '$.base.name') AS name
          , get_json_object(t.value, '$.list')      AS list
        FROM
            (
                SELECT
                    '{"base":{"code":"xm","name":"project"},"list":[{"ACode":"cp1","AName":"Product1","BList":[{"BCode":"gn1","BName":"Feature1"},{"BCode":"gn2","BName":"Feature2"}]},{"ACode":"cp2","AName":"Product2","BList":[{"BCode":"gn1","BName":"Feature1"}]}]}' as value
            )
            t 
    )
    t 
    -- Outer array: strip the enclosing [ ], mark each "}]},{" element boundary with "||", then split on "||".
    lateral view explode(split(regexp_replace(regexp_extract(list,'^\\[(.+)\\]$',1),'\\}\\]\\}\\,\\{', '\\}\\]\\}\\|\\|\\{'),'\\|\\|')) list as a
    lateral view json_tuple(a,'ACode','AName','BList') ai as ACode, AName, BList
    -- Inner array: same approach, but here the element boundary is "},{".
    lateral view explode(split(regexp_replace(regexp_extract(BList,'^\\[(.+)\\]$',1),'\\}\\,\\{', '\\}\\|\\|\\{'),'\\|\\|')) BList as b
    lateral view json_tuple(b,'BCode','BName') p as BCode, BName
;
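
To see why the two delimiters differ, the same two-level split can be emulated outside Hive. The Python sketch below only illustrates the string handling and is not part of the Hive solution. The helper name `split_elements` is hypothetical, `json.loads` stands in for `json_tuple`, and the input is assumed to be compact JSON with no whitespace between elements, as in the sample value.

```python
import json
import re


def split_elements(array_text, boundary, marker="||"):
    """Split a JSON array string into element strings at a literal boundary.

    `boundary` is the text that separates two elements, e.g. '}]},{'.
    Its comma is replaced by `marker`, then the text is split on `marker`.
    """
    inner = re.match(r"^\[(.+)\]$", array_text).group(1)  # drop the outer [ ]
    left, right = boundary.split(",")
    marked = inner.replace(boundary, left + marker + right)
    return marked.split(marker)


# The string that get_json_object(value, '$.list') is expected to hold.
list_text = ('[{"ACode":"cp1","AName":"Product1","BList":[{"BCode":"gn1","BName":"Feature1"},'
             '{"BCode":"gn2","BName":"Feature2"}]},'
             '{"ACode":"cp2","AName":"Product2","BList":[{"BCode":"gn1","BName":"Feature1"}]}]')

rows = []
# Outer level: elements end with "}]}" because each one closes its BList first.
for a_text in split_elements(list_text, "}]},{"):
    a = json.loads(a_text)
    b_list_text = json.dumps(a["BList"], separators=(",", ":"))
    # Inner level: BList elements are flat objects, so "},{" is a safe boundary.
    for b_text in split_elements(b_list_text, "},{"):
        b = json.loads(b_text)
        rows.append((a["ACode"], a["AName"], b["BCode"], b["BName"]))

for row in rows:
    print(row)
```

The outer boundary `}]},{` works only because every element of `list` ends with its `BList` array. The approach would presumably mis-split if a string value contained `||` or the boundary text, if the JSON had whitespace between elements, or if the objects were nested differently.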
