Error when trying to read a JSON structure in a BigQuery table

Posted 2020-01-20 22:17:14

Question:

Below are the JSON I am trying to read and the code I use to read it.

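I only have a problem reading the "fieldorders" section when it has no values. Even when it contains no structure, I still need to show blank values for it. I can read several other objects with multiple sections without any problem; the problem is only with an object that has no values. If I can't find any value in that object, I just need to output null.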

I get the following error:

**Failed to coerce output value false to type ARRAY**

Sample JSON file I use to read the data:

{
  "projectnumber": "X.6001877",
  "operationnumber": "O.6001877.01",
  "opactivitynumber": "B.6001877.01.01",
  "jobtypes": null,
  "jobtypesinfo": [
    {
      "jobtype": "CC-SERV",
      "jobgroup": "CPS-CC",
      "staticattributes": [
        {
          "name": "OAJTOPT",
          "description": "OA Job Type OPT",
          "type": "Double",
          "value": 0.0,
          "uom": null
        }
      ]
    }
  ],
  "actualactivitystartdate": "2018-01-17T05:00:00",
  "actualactivityenddate": "2018-01-29T05:00:00",
  "serverdatetime": null,
  "ServerDateTime": "2019-01-20T16:36:48.106",
  "projectSettings": null,
  "customerContacts": null,
  "actualequipments": null,
  "welldetails": [
    {
      "Number": "1-1IH",
      "Name": "XXXX 58-4X",
      "State": "PL",
      "Country": "Col",
      "Field": "LABCD",
      "Uwi": null,
      "Environment": "Land",
      "WellId": "0065",
      "Latitude": 3.8,
      "Longitude": -72.2,
      "Type": null,
      "WaterDepth": null,
      "WellPlaceholderId": null,
      "IsNonMasteredWell": false
    }
  ],
  "lastopeventid": null,
  "personnelassignmentinfo": null,
  "status": null,
  "accountingunit": null,
  "erpsystem": "ITT",
  "CreatedDate": "2020-01-20T16:36:48.106",
  "CreatedBy": "ABCD11",
  "LastModifiedDate": "2020-01-20T16:36:48.106",
  "LastModifiedBy": "ABCD11",
  "Id": "A.6001877.01.01",
  "country": {
    "Code": "CO",
    "Name": "CoOOOOOO"
  },
  "attributes": {
    "Attributes": [
      {
        "AttributeName": "OAOPDXAS",
        "AttributeDescription": "Activity OPD",
        "DataType": "Integer",
        "UOMType": "Dimensionless",
        "BaseUnit": "",
        "IsCalculated": true,
        "Values": null
      },
      {
        "AttributeName": "OpActOPTime",
        "AttributeDescription": "OA Operating Time - OPT (HRS)",
        "DataType": "Float",
        "UOMType": "Dimensionless",
        "BaseUnit": "",
        "IsCalculated": true,
        "Values": null
      }
    ],
    "DailyAttributes": null,
    "MultiAttributes": null,
    "Id": "A.6001877.01.01"
  },
  "operationalevent": [
    {
      "operatingevent": {
        "projectnumber": "C.6001877",
        "operationnumber": "O.6001877.01",
        "operationactivitynumber": "X.6001877.01.01",
        "operationaleventdetails": {
          "status": null,
          "description": "Non-Operational Event",
          "plannedeventid": null,
          "jobgroup": null,
          "jobtype": null,
          "startdatetime": "2020-01-18T05:00:00",
          "enddatetime": "2020-01-15T05:00:00",
          "comments": "Non-Operational Event",
          "eventtype": "Project",
          "isdeleted": false,
          "category": "NonOperational",
          "islocked": false,
          "lockedon": "0001-01-01T05:00:00",
          "lockedby": null,
          "audittrailinfo": {
            "CreatedDate": "2020-01-20T15:36:17.816",
            "CreatedBy": "ABCD11",
            "LastModifiedDate": "2020-01-20T15:36:17.816",
            "LastModifiedBy": "ABCD1111",
            "Id": null
          },
          "personnel": {
            "assignment": []
          },
          "serverdatetime": "2018-01-20T16:36:56.185",
          "equipmentdata": {
            "equipmentassignments": []
          },
          "eventtypeattributes": null,
          "id": "E97A5DBC",
          "oesummary": null,
          "journal": null,
          "well": null,
          "isactive": true,
          "externaltransactionhistoryinfo": [
            {
              "status": "Pending",
              "message": null,
              "type": "MPT",
              "riteservicereporturl": null,
              "CreatedDate": "0001-01-01T00:00:00",
              "CreatedBy": null,
              "LastModifiedDate": "0001-01-01T00:00:00",
              "LastModifiedBy": null,
              "Id": null
            }
          ],
          "pnmconsumptiondata": {
            "pnmconsumptions": []
          }
        },
        "CreatedDate": "2018-01-20T16:36:56.185",
        "CreatedBy": "ABCD11",
        "LastModifiedDate": "2020-01-20T16:36:56.185",
        "LastModifiedBy": "ABCD11",
        "Id": "A.6001877.01.01_OperationalEvent_E97A5DBC"
      },
      "attributes": null
    }
  ],
  "attendance": [],
  "fieldorders": []
}
BigQuery SQL code:

    CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
    RETURNS ARRAY<STRING>
    LANGUAGE js AS """
            return jsonPath(JSON.parse(json), json_path);
    """
    OPTIONS (
        library="gs://json_temp/jsonpath-0.8.0.js"
    );

    SELECT job_id,
    oe_descr,
    attr_name,
    well_name,
    job_type
    --, field_id

    from lz.json_actuals,
    UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.operationalevent[*].operatingevent.operationaleventdetails.description')) oe_descr with offset oedescr,
    UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.attributes.Attributes[*].AttributeName')) attr_name with offset attrb,
    UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.welldetails[*].Name')) Well_name with offset wll,
    UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.jobtypesinfo[*].jobtype')) job_type with offset jt
    -- Uncommenting the field_id column above and the fieldorders UNNEST below
    -- fails with: Failed to coerce output value false to type ARRAY
    --, UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.fieldorders[*].id')) field_id WITH OFFSET fld
    ;
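The error message suggests the UDF handed BigQuery the JavaScript value `false` instead of an array. That matches the convention of Stefan Goessner's original JSONPath 0.8.0 script (assuming `gs://json_temp/jsonpath-0.8.0.js` is that script), which returns `false`, not `[]`, when a path matches nothing; `$.fieldorders[*].id` matches nothing because `fieldorders` is empty. The plain-JavaScript sketch below reproduces that behaviour. `wildcardField` is a hypothetical stand-in that only understands paths of the form `$.<array>[*].<field>`; it is not the library's API.

```javascript
// Hypothetical stand-in for jsonPath(obj, "$.<arrayKey>[*].<field>").
// Like Goessner's jsonPath 0.8.0, it returns false (not []) when nothing matches.
function wildcardField(obj, arrayKey, field) {
  const items = Array.isArray(obj[arrayKey]) ? obj[arrayKey] : [];
  const matches = items
    .filter((item) => item !== null && typeof item === "object" && field in item)
    .map((item) => item[field]);
  return matches.length ? matches : false;
}

// Trimmed-down version of the sample document from the question.
const doc = JSON.parse(`{
  "jobtypesinfo": [{ "jobtype": "CC-SERV", "jobgroup": "CPS-CC" }],
  "welldetails": [{ "Number": "1-1IH", "Name": "XXXX 58-4X" }],
  "fieldorders": []
}`);

console.log(wildcardField(doc, "jobtypesinfo", "jobtype")); // [ 'CC-SERV' ]
console.log(wildcardField(doc, "fieldorders", "id"));      // false
```

In the UDF, that `false` is returned directly to BigQuery, which declared the result as ARRAY<STRING> and cannot coerce a boolean to it.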

Comments on the question:

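Just saying that you could not adapt other solutions to your specific case doesn't help; you should provide all the relevant details!

Thanks a lot for your reply, Mikhail. I will give examples of where it works and where it doesn't.

You should update your question with all the details of what you want to ask. Instead of posting images, post text, so that we can use your data, reproduce your use case and, most importantly, help you!

@MikhailBerlyant - I tried to post my question as text, with the sample JSON and the code I run in BigQuery. It looks like a moderator is deleting them, and I'm not sure whether I violated any policy here. Could you please guide me?

Don't post answers to your own question; you should update your question instead.

Answer 1: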

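Below is BigQuery Standard SQL that should fix your empty-object problem. jsonPath() returns false when the path matches nothing (here, the empty fieldorders array), and BigQuery cannot coerce false to ARRAY<STRING>, so the UDF now returns an empty array in that case. The fieldorders UNNEST is also joined with LEFT JOIN instead of a comma (CROSS JOIN), so the row is kept with field_id = NULL instead of being dropped.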

    #standardSQL
    CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
    RETURNS ARRAY<STRING>
    LANGUAGE js AS """
      // jsonPath() returns false when the path matches nothing;
      // return an empty array instead so BigQuery can coerce it to ARRAY<STRING>.
      var result = jsonPath(JSON.parse(json), json_path);
      if (result) return result;
      else return [];
    """
    OPTIONS (
        library="gs://json_temp/jsonpath-0.8.0.js"
    );
    SELECT --job_id,
      oe_descr,
      attr_name,
      well_name,
      job_type,
      field_id
    from `lz.json_actuals`,
    UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.operationalevent[*].operatingevent.operationaleventdetails.description')) oe_descr with offset oedescr,
    UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.attributes.Attributes[*].AttributeName')) attr_name with offset attrb,
    UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.welldetails[*].Name')) Well_name with offset wll,
    UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.jobtypesinfo[*].jobtype')) job_type with offset jt
    -- LEFT JOIN keeps the row (field_id = NULL) when fieldorders is empty
    LEFT JOIN UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.fieldorders[*].id')) field_id WITH OFFSET fld

Applied to the sample data in your question, the result is:

Row oe_descr                    attr_name   well_name   job_type    field_id     
1   Non-Operational Event       OAOPDXAS    XXXX 58-4X  CC-SERV     null     
2   Non-Operational Event       OpActOPTime XXXX 58-4X  CC-SERV     null      
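The null in field_id comes from the join, not from the UDF. Returning `[]` only removes the coercion error: a comma (implicit CROSS JOIN) against `UNNEST` of an empty array produces no rows, so every row for that document would disappear, while `LEFT JOIN UNNEST` keeps each row once with the unnested column set to NULL (UNNEST of a NULL array behaves like an empty one). The plain-JavaScript sketch below models those two semantics over in-memory rows; `crossJoinUnnest` and `leftJoinUnnest` are hypothetical helpers, not BigQuery APIs, and offsets are ignored.

```javascript
// Models `FROM rows, UNNEST(arr) AS alias`: one output row per element,
// so a row whose array is empty (or null) produces no output at all.
function crossJoinUnnest(rows, getArray, alias) {
  return rows.flatMap((row) =>
    (getArray(row) || []).map((value) => ({ ...row, [alias]: value }))
  );
}

// Models `FROM rows LEFT JOIN UNNEST(arr) AS alias`: same as above, but a row
// whose array is empty (or null) is kept once with alias = null.
function leftJoinUnnest(rows, getArray, alias) {
  return rows.flatMap((row) => {
    const values = getArray(row) || [];
    return values.length
      ? values.map((value) => ({ ...row, [alias]: value }))
      : [{ ...row, [alias]: null }];
  });
}

// The two rows above, already unnested on attr_name; fieldorders is empty
// for this document, so the fixed UDF returns [] for it.
const rows = [
  { attr_name: "OAOPDXAS", fieldorders: [] },
  { attr_name: "OpActOPTime", fieldorders: [] },
];

console.log(crossJoinUnnest(rows, (r) => r.fieldorders, "field_id").length); // 0
console.log(leftJoinUnnest(rows, (r) => r.fieldorders, "field_id"));
```

Applying the left-join form to every array, instead of only fieldorders, keeps a document's row even when several of its arrays are empty.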

Comments on the answer:

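Thanks a lot for your help with this; it solved my problem. I used LEFT JOIN for all the fields, because any of the columns may come in as an empty array. I tested it and it works fine.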
