How to flatten nested JSON data in Azure Stream Analytics

Posted: 2020-12-24 13:37:54

Question:

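I'm having trouble writing a query that extracts tables from the arrays in a JSON file. I want to flatten the three arrays (case_Time, details and others) and put them all into a single flat SQL table.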

Sample JSON data:

{
    "case_Time": [
        {
            "v1": "1",
            "v2": "0",
            "v3": "0",
            "date": "30 January ",
            "dateymd": "2020-01-30",
            "v4": "1",
            "v5": "0",
            "v6": "0"
        },
        {
            "v1": "1",
            "v2": "0",
            "v3": "0",
            "date": "31 January ",
            "dateymd": "2020-01-31",
            "v4": "1",
            "v5": "0",
            "v6": "0"
        }
    ],
    "details": [
        {
            "d1": "281844",
            "d2": "10124024",
            "d3": "146791",
            "d4": "0",
            "d5": "0",
            "d6": "0",
            "lastupdatedtime": "24/12/2020 09:12:24",
            "d7": "2746",
            "d8": "9692643",
            "d9": "Total",
            "notes": "some text"
        },
        {
            "d1": "281944",
            "d2": "1012",
            "d3": "1791",
            "d4": "0",
            "d5": "0",
            "d6": "0",
            "lastupdatedtime": "25/12/2020 09:12:24",
            "d7": "2746",
            "d8": "96643",
            "d9": "Total",
            "notes": "some text"
        }
    ],
    "others": [
        {
            "p1": "",
            "p2": "75.64",
            "p3": "",
            "p4": "",
            "p5": "",
            "p6": "",
            "date": "13/03/2020",
            "p7": "",
            "p8": "1.20%",
            "p9": "",
            "p10": "83.33",
            "p11": "5",
            "p12": "5900",
            "p13": "78"
        },
        {
            "p1": "",
            "p2": "75.64",
            "p3": "",
            "p4": "",
            "p5": "",
            "p6": "",
            "date": "14/03/2020",
            "p7": "",
            "p8": "1.20%",
            "p9": "",
            "p10": "81.33",
            "p11": "5",
            "p12": "500",
            "p13": "78"
        }
    ]
}

I tried the following query, but it only returns data from the first array (case_Time). How can I flatten the remaining arrays?

WITH Cases AS
(
   SELECT   
   arrayElement.ArrayIndex,  
   arrayElement.ArrayValue as av  
   FROM input as event  
   CROSS APPLY GetArrayElements(event.case_Time) AS arrayElement 
)
SELECT av.v1, av.v2, av.v3,av.date,av.dateymd, av.v4,av.v5,av.v6
INTO powerbi
FROM Cases
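To see why this query only covers case_Time, it helps to model what GetArrayElements does. It returns one record per array element with ArrayIndex and ArrayValue fields, and CROSS APPLY joins each of those records back to its parent event. The Python below is a local simulation under that assumption, not Azure Stream Analytics code. The helper names get_array_elements and flatten_case_time are hypothetical, the sketch numbers elements from 0, and the event is a trimmed copy of the sample data.

```python
import json

# Trimmed copy of the sample event: only the case_Time array, two elements.
EVENT = json.loads("""
{
  "case_Time": [
    {"v1": "1", "v2": "0", "v3": "0", "date": "30 January ", "dateymd": "2020-01-30",
     "v4": "1", "v5": "0", "v6": "0"},
    {"v1": "1", "v2": "0", "v3": "0", "date": "31 January ", "dateymd": "2020-01-31",
     "v4": "1", "v5": "0", "v6": "0"}
  ]
}
""")


def get_array_elements(array):
    """Hypothetical local stand-in for ASA's GetArrayElements: one record per element."""
    return [{"ArrayIndex": i, "ArrayValue": value} for i, value in enumerate(array)]


def flatten_case_time(event):
    """Mimic the CTE plus outer SELECT: a single CROSS APPLY over case_Time only."""
    columns = ["v1", "v2", "v3", "date", "dateymd", "v4", "v5", "v6"]
    rows = []
    for element in get_array_elements(event["case_Time"]):
        av = element["ArrayValue"]
        rows.append({col: av[col] for col in columns})
    return rows


rows = flatten_case_time(EVENT)
for row in rows:
    print(row["dateymd"], row["v1"])
```

Each array you want flattened needs its own GetArrayElements call. A single CROSS APPLY never reads details or others, which is why only the case_Time columns come out.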

Any help is appreciated :)


Answer 1:

You can CROSS APPLY all of your arrays. Try something like this:

WITH Cases AS
(
    SELECT
        arrayElement.ArrayIndex AS ai,
        arrayElement.ArrayValue AS av,
        y.ArrayIndex AS yi,
        y.ArrayValue AS dt,
        z.ArrayIndex AS zi,
        z.ArrayValue AS ot
    FROM input AS event
    -- Each CROSS APPLY expands one array into one row per element
    CROSS APPLY GetArrayElements(event.case_Time) AS arrayElement
    CROSS APPLY GetArrayElements(event.details) AS y
    CROSS APPLY GetArrayElements(event.others) AS z
)
SELECT
    av.v1, av.v2, av.v3, av.date, av.dateymd, av.v4, av.v5, av.v6,
    dt.d1, dt.d2, dt.d3, dt.d4, dt.d5, dt.d6, dt.lastupdatedtime,
    dt.d7, dt.d8, dt.d9, dt.notes,
    ot.p1, ot.p2, ot.p3, ot.p4, ot.p5, ot.p6, ot.p7, ot.p8, ot.p9,
    ot.p10, ot.p11, ot.p12, ot.p13,
    ot.date AS tdate  -- renamed so it does not clash with av.date
INTO powerbi  -- INTO must come before FROM
FROM Cases

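This query produces the full cross product of the three arrays. Each array has two elements, so you get 2 × 2 × 2 = 8 rows. If you only want the 2 rows whose elements share the same index, add WHERE ai = yi AND yi = zi to the final SELECT (after FROM Cases).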
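The row counts above can be checked with a small Python model of the chained CROSS APPLYs. This is a local simulation, not Azure Stream Analytics code. It assumes each CROSS APPLY GetArrayElements behaves like an inner join that pairs every element with every row produced so far. The event is a trimmed, hypothetical copy of the sample data with one field per element, and cross_apply_all is a hypothetical helper name.

```python
from itertools import product

# Hypothetical, trimmed stand-in for one event: each array keeps its two
# elements but only one identifying field per element.
event = {
    "case_Time": [{"dateymd": "2020-01-30"}, {"dateymd": "2020-01-31"}],
    "details": [{"lastupdatedtime": "24/12/2020 09:12:24"},
                {"lastupdatedtime": "25/12/2020 09:12:24"}],
    "others": [{"date": "13/03/2020"}, {"date": "14/03/2020"}],
}


def cross_apply_all(event):
    """Model three chained CROSS APPLYs: every combination of (index, element) across arrays."""
    return [
        {"ai": ai, "av": av, "yi": yi, "dt": dt, "zi": zi, "ot": ot}
        for (ai, av), (yi, dt), (zi, ot) in product(
            enumerate(event["case_Time"]),
            enumerate(event["details"]),
            enumerate(event["others"]),
        )
    ]


cases = cross_apply_all(event)
# Equivalent of: WHERE ai = yi AND yi = zi
aligned = [row for row in cases if row["ai"] == row["yi"] == row["zi"]]

print(len(cases), len(aligned))  # 8 2
for row in aligned:
    print(row["av"]["dateymd"], row["dt"]["lastupdatedtime"], row["ot"]["date"])
```

In this model the index filter only keeps positions that exist in all three arrays. If one array were longer than the others, its extra elements would have no partner and would be dropped.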

Comments:

Thanks, I was able to unpack the arrays.
