Extracting data from very nested JSON in Azure SQL


【Title】Extracting data from very nested JSON in Azure SQL 【Posted】2020-09-05 19:27:59 【Question】:

I have a nested JSON file:

{
    "vehicleStatusResponse": {
        "vehicleStatuses": [
            {
                "vin": "ABC1234567890",
                "triggerType": {
                    "triggerType": "TIMER",
                    "context": "RFMS",
                    "driverId": {
                        "tachoDriverIdentification": {
                            "driverIdentification": "123456789",
                            "cardIssuingMemberState": "BRA",
                            "driverAuthenticationEquipment": "CARD",
                            "cardReplacementIndex": "0",
                            "cardRenewalIndex": "1"
                        }
                    }
                },
                "receivedDateTime": "2020-02-12T04:11:19.221Z",
                "hrTotalVehicleDistance": 103306960,
                "totalEngineHours": 3966.6216666666664,
                "driver1Id": {
                    "tachoDriverIdentification": {
                        "driverIdentification": "BRA1234567"
                    }
                },
                "engineTotalFuelUsed": 48477520,
                "accumulatedData": {
                    "durationWheelbaseSpeedOverZero": 8309713,
                    "distanceCruiseControlActive": 8612200,
                    "durationCruiseControlActive": 366083,
                    "fuelConsumptionDuringCruiseActive": 3064170,
                    "durationWheelbaseSpeedZero": 5425783,
                    "fuelWheelbaseSpeedZero": 3332540,
                    "fuelWheelbaseSpeedOverZero": 44709670,
                    "ptoActiveClass": [
                        {
                            "label": "wheelbased speed >0",
                            "seconds": 16610,
                            "meters": 29050,
                            "milliLitres": 26310
                        },
                        {
                            "label": "wheelbased speed =0",
                            "seconds": 457344,
                            "milliLitres": 363350
                        }
                    ]
                }
            }
        ]
    }
}

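It has already been imported from Azure Blob Storage into the SQL database, and now I need to extract the data from it into a table. I used the following T-SQL query to do this, but it returns an empty table with only the column headers: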

SELECT response.*
    FROM OPENROWSET (BULK 'response3.json', DATA_SOURCE = 'VCBI24', SINGLE_CLOB) as j
    CROSS APPLY OPENJSON(BulkColumn)
    WITH ( vehiclestatusResponse nvarchar (100), vehicleStatuses nvarchar (100), vin nvarchar (100), triggerType nvarchar (100), context nvarchar (100) and etc...) AS response 

How should I handle this?

Thank you very much for your attention!

【Comments】:

Hi @hatorihanso - any update on this? If this helped, please upvote and/or mark it as answered.

【Answer 1】:

You can pass a path to OPENJSON as its second argument, which lets you drill into nested JSON, e.g.

SELECT *
FROM OPENJSON( @json, '$.vehicleStatusResponse.vehicleStatuses' )
WITH (
    vin VARCHAR(50)  '$.vin',
    triggerType VARCHAR(50)  '$.triggerType.triggerType',
    context VARCHAR(50)  '$.triggerType.context',
    driverIdentification VARCHAR(50)     '$.triggerType.driverId.tachoDriverIdentification.driverIdentification',
    cardIssuingMemberState VARCHAR(50)   '$.triggerType.driverId.tachoDriverIdentification.cardIssuingMemberState',

    receivedDateTime DATETIME    '$.receivedDateTime',

    engineTotalFuelUsed INT  '$.engineTotalFuelUsed'

    )
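As a minimal sketch of why the path matters, the following uses a small made-up document (the variable @demo and the keys a, b, and c are placeholders, not part of the question's data). Without a path, OPENJSON matches the WITH columns against the top level of the document, so a key that only exists deeper down comes back as NULL. With a path that points at the array, each array element becomes a row.

```sql
-- Hypothetical sample document; the key names are illustrative only.
DECLARE @demo NVARCHAR(MAX) = N'{ "a": { "b": [ { "c": 1 }, { "c": 2 } ] } }';

-- No path: the WITH clause is matched against the top-level object,
-- which has no "c" key, so the column c is NULL.
SELECT *
FROM OPENJSON( @demo )
WITH ( c INT );

-- Path to the array: OPENJSON returns one row per array element.
SELECT *
FROM OPENJSON( @demo, '$.a.b' )
WITH ( c INT '$.c' );
```

Key matching in the WITH clause is case-sensitive, so column names or paths have to match the casing used in the JSON.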

A complete script example:

DECLARE @json VARCHAR(MAX) = '{
  "vehicleStatusResponse": {
    "vehicleStatuses": [
      {
        "vin": "ABC1234567890",
        "triggerType": {
          "triggerType": "TIMER",
          "context": "RFMS",
          "driverId": {
            "tachoDriverIdentification": {
              "driverIdentification": "123456789",
              "cardIssuingMemberState": "BRA",
              "driverAuthenticationEquipment": "CARD",
              "cardReplacementIndex": "0",
              "cardRenewalIndex": "1"
            }
          }
        },
        "receivedDateTime": "2020-02-12T04:11:19.221Z",
        "hrTotalVehicleDistance": 103306960,
        "totalEngineHours": 3966.6216666666664,
        "driver1Id": {
          "tachoDriverIdentification": {
            "driverIdentification": "BRA1234567"
          }
        },
        "engineTotalFuelUsed": 48477520,
        "accumulatedData": {
          "durationWheelbaseSpeedOverZero": 8309713,
          "distanceCruiseControlActive": 8612200,
          "durationCruiseControlActive": 366083,
          "fuelConsumptionDuringCruiseActive": 3064170,
          "durationWheelbaseSpeedZero": 5425783,
          "fuelWheelbaseSpeedZero": 3332540,
          "fuelWheelbaseSpeedOverZero": 44709670,
          "ptoActiveClass": [
            {
              "label": "wheelbased speed >0",
              "seconds": 16610,
              "meters": 29050,
              "milliLitres": 26310
            },
            {
              "label": "wheelbased speed =0",
              "seconds": 457344,
              "milliLitres": 363350
            }
          ]
        }
      }
    ]
  }
}'


SELECT *
FROM OPENJSON( @json, '$.vehicleStatusResponse.vehicleStatuses' )
WITH (
    vin VARCHAR(50)  '$.vin',
    triggerType VARCHAR(50)  '$.triggerType.triggerType',
    context VARCHAR(50)  '$.triggerType.context',
    driverIdentification VARCHAR(50)     '$.triggerType.driverId.tachoDriverIdentification.driverIdentification',
    cardIssuingMemberState VARCHAR(50)   '$.triggerType.driverId.tachoDriverIdentification.cardIssuingMemberState',

    receivedDateTime DATETIME    '$.receivedDateTime',

    engineTotalFuelUsed INT  '$.engineTotalFuelUsed'


    )

Read more about OPENJSON here:

https://docs.microsoft.com/en-us/sql/t-sql/functions/openjson-transact-sql?view=sql-server-ver15
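The same path-based approach should carry over to a file read straight from blob storage, without declaring the JSON in a variable. The sketch below simply combines the OPENROWSET call from the question with the path from the script above. It assumes the external data source VCBI24 and the file response3.json exist as in the question and that the file holds the document shown there. It has not been run against that storage account.

```sql
-- Assumes the external data source VCBI24 and the file response3.json
-- from the question exist and are readable from this database.
SELECT response.*
FROM OPENROWSET( BULK 'response3.json', DATA_SOURCE = 'VCBI24', SINGLE_CLOB ) AS j
CROSS APPLY OPENJSON( j.BulkColumn, '$.vehicleStatusResponse.vehicleStatuses' )
WITH (
    vin VARCHAR(50)              '$.vin',
    triggerType VARCHAR(50)      '$.triggerType.triggerType',
    context VARCHAR(50)          '$.triggerType.context',
    receivedDateTime DATETIME    '$.receivedDateTime',
    engineTotalFuelUsed INT      '$.engineTotalFuelUsed'
) AS response;
```

To persist the rows, the same SELECT could feed an INSERT INTO ... SELECT against a table of your own.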

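A second example. With OPENJSON you can supply explicit JSON path expressions for each column, which gives you more control, particularly with nested JSON. If the JSON is fairly simple and the column names match the keys, you do not have to supply the paths, e.g.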

DECLARE @json VARCHAR(MAX) = '{
  "ptoActiveClass": [
    {
      "label": "wheelbased speed >0",
      "seconds": 16610,
      "meters": 29050,
      "milliLitres": 26310
    },
    {
      "label": "wheelbased speed =0",
      "seconds": 457344,
      "milliLitres": 363350
    }
  ]
}'


SELECT *
FROM OPENJSON( @json, '$.ptoActiveClass' )
WITH (
    label VARCHAR(50),
    seconds INT,
    meters INT,
    milliLitres INT
    )

SELECT *
FROM OPENJSON( @json, '$.ptoActiveClass' )
WITH (
    label VARCHAR(50)    '$.label',
    seconds VARCHAR(50)  '$.seconds',
    meters VARCHAR(50)   '$.meters',
    milliLitres VARCHAR(50)  '$.milliLitres'
    )
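When the ptoActiveClass array has to be read from the full document together with fields of its parent vehicle status, one common pattern is to return the array as a JSON fragment with AS JSON and expand it with a second OPENJSON via CROSS APPLY. The sketch below uses a trimmed-down copy of the question's document. The variable name @doc and the aliases v and p are placeholders chosen for this illustration.

```sql
-- Trimmed-down copy of the question's document, kept small for illustration.
DECLARE @doc NVARCHAR(MAX) = N'{
  "vehicleStatusResponse": {
    "vehicleStatuses": [
      {
        "vin": "ABC1234567890",
        "accumulatedData": {
          "ptoActiveClass": [
            { "label": "wheelbased speed >0", "seconds": 16610, "meters": 29050, "milliLitres": 26310 },
            { "label": "wheelbased speed =0", "seconds": 457344, "milliLitres": 363350 }
          ]
        }
      }
    ]
  }
}';

SELECT v.vin, p.label, p.seconds, p.meters, p.milliLitres
FROM OPENJSON( @doc, '$.vehicleStatusResponse.vehicleStatuses' )
WITH (
    vin VARCHAR(50) '$.vin',
    -- AS JSON returns the nested array as a JSON fragment; it requires NVARCHAR(MAX).
    ptoActiveClass NVARCHAR(MAX) '$.accumulatedData.ptoActiveClass' AS JSON
) AS v
-- Expand the nested array: one output row per array element, repeated per vehicle.
CROSS APPLY OPENJSON( v.ptoActiveClass )
WITH (
    label VARCHAR(50)  '$.label',
    seconds INT        '$.seconds',
    meters INT         '$.meters',       -- absent in the second element, so NULL there
    milliLitres INT    '$.milliLitres'
) AS p;
```

CROSS APPLY drops vehicle statuses whose array is missing or empty. OUTER APPLY would keep them, with NULLs in the array columns.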

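【Comments】:

Hi @wBob! Thanks for your answer, but I need to extract the data from a JSON file that is already in Blob storage. There is no need to declare it separately... I'll try to adapt your script.

Yes, it's just to show the technique. I like all my scripts to run end to end, and obviously I don't have access to your blob storage! :)

Hi @wBob! Could you please explain how to extract the strings below? I couldn't find any information in the Microsoft Docs link: "ptoActiveClass": [ { "label": "wheelbased speed >0", "seconds": 16610, "meters": 29050, "milliLitres": 26310 }, { "label": "wheelbased speed =0", "seconds": 457344, "milliLitres": 363350 } ]

I'll add another example to cover this. Can I first check that you know how to upvote answers that were helpful, and that you know how to mark them as answered?

It would be great if you could explain it ;) Yes, I know how to do that.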
