Error when trying to read a JSON structure in a BigQuery table
Posted: 2020-01-20 22:17:14
Question:
Below are the JSON I am trying to read and the code I use to read it:
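The only problem I have is reading the "fieldorders" section when it has no values. Even if it contains no structure, I still need to show a blank value for it. I can read several other objects that have multiple sections without any problem. The problem appears only when an object has no values: if I cannot find any value in that object, I just need to output null.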
I get the following error:
**Failed to coerce output value false to type ARRAY**
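The error text says the UDF produced the boolean `false` where an ARRAY was expected. That is consistent with the `jsonPath()` function from jsonpath-0.8.0.js returning `false`, rather than an empty array, when a path such as `$.fieldorders[*].id` matches nothing. The sketch below shows one way to normalize that return value inside the JavaScript UDF body. `toStringArray` is a hypothetical helper name, and dropping JSON nulls is an assumption based on BigQuery arrays not allowing NULL elements.

```javascript
// Hypothetical helper for a BigQuery JavaScript UDF declared RETURNS ARRAY<STRING>.
// Assumption: jsonPath() from jsonpath-0.8.0.js returns an array of matches,
// or the boolean false when the path matches nothing.
function toStringArray(result) {
  // false (no match) or any other non-array value becomes an empty array.
  if (!Array.isArray(result)) return [];
  return result
    // Assumption: drop JSON nulls, because a BigQuery ARRAY cannot hold NULL elements.
    .filter((v) => v !== null && v !== undefined)
    // Keep strings as-is; serialize numbers, booleans and objects to JSON text.
    .map((v) => (typeof v === "string" ? v : JSON.stringify(v)));
}

console.log(toStringArray(false));                     // []
console.log(toStringArray(["CC-SERV"]));               // [ 'CC-SERV' ]
console.log(toStringArray([0, true, null, { a: 1 }])); // [ '0', 'true', '{"a":1}' ]
```

Inside the UDF, the helper would be defined in the same `AS """ ... """` body and called as `return toStringArray(jsonPath(JSON.parse(json), json_path));`.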
Sample JSON file I use to read the data:
{
  "projectnumber": "X.6001877",
  "operationnumber": "O.6001877.01",
  "opactivitynumber": "B.6001877.01.01",
  "jobtypes": null,
  "jobtypesinfo": [
    {
      "jobtype": "CC-SERV",
      "jobgroup": "CPS-CC",
      "staticattributes": [
        {
          "name": "OAJTOPT",
          "description": "OA Job Type OPT",
          "type": "Double",
          "value": 0.0,
          "uom": null
        }
      ]
    }
  ],
  "actualactivitystartdate": "2018-01-17T05:00:00",
  "actualactivityenddate": "2018-01-29T05:00:00",
  "serverdatetime": null,
  "ServerDateTime": "2019-01-20T16:36:48.106",
  "projectSettings": null,
  "customerContacts": null,
  "actualequipments": null,
  "welldetails": [
    {
      "Number": "1-1IH",
      "Name": "XXXX 58-4X",
      "State": "PL",
      "Country": "Col",
      "Field": "LABCD",
      "Uwi": null,
      "Environment": "Land",
      "WellId": "0065",
      "Latitude": 3.8,
      "Longitude": -72.2,
      "Type": null,
      "WaterDepth": null,
      "WellPlaceholderId": null,
      "IsNonMasteredWell": false
    }
  ],
  "lastopeventid": null,
  "personnelassignmentinfo": null,
  "status": null,
  "accountingunit": null,
  "erpsystem": "ITT",
  "CreatedDate": "2020-01-20T16:36:48.106",
  "CreatedBy": "ABCD11",
  "LastModifiedDate": "2020-01-20T16:36:48.106",
  "LastModifiedBy": "ABCD11",
  "Id": "A.6001877.01.01",
  "country": {
    "Code": "CO",
    "Name": "CoOOOOOO"
  },
  "attributes": {
    "Attributes": [
      {
        "AttributeName": "OAOPDXAS",
        "AttributeDescription": "Activity OPD",
        "DataType": "Integer",
        "UOMType": "Dimensionless",
        "BaseUnit": "",
        "IsCalculated": true,
        "Values": null
      },
      {
        "AttributeName": "OpActOPTime",
        "AttributeDescription": "OA Operating Time - OPT (HRS)",
        "DataType": "Float",
        "UOMType": "Dimensionless",
        "BaseUnit": "",
        "IsCalculated": true,
        "Values": null
      }
    ],
    "DailyAttributes": null,
    "MultiAttributes": null,
    "Id": "A.6001877.01.01"
  },
  "operationalevent": [
    {
      "operatingevent": {
        "projectnumber": "C.6001877",
        "operationnumber": "O.6001877.01",
        "operationactivitynumber": "X.6001877.01.01",
        "operationaleventdetails": {
          "status": null,
          "description": "Non-Operational Event",
          "plannedeventid": null,
          "jobgroup": null,
          "jobtype": null,
          "startdatetime": "2020-01-18T05:00:00",
          "enddatetime": "2020-01-15T05:00:00",
          "comments": "Non-Operational Event",
          "eventtype": "Project",
          "isdeleted": false,
          "category": "NonOperational",
          "islocked": false,
          "lockedon": "0001-01-01T05:00:00",
          "lockedby": null,
          "audittrailinfo": {
            "CreatedDate": "2020-01-20T15:36:17.816",
            "CreatedBy": "ABCD11",
            "LastModifiedDate": "2020-01-20T15:36:17.816",
            "LastModifiedBy": "ABCD1111",
            "Id": null
          },
          "personnel": {
            "assignment": []
          },
          "serverdatetime": "2018-01-20T16:36:56.185",
          "equipmentdata": {
            "equipmentassignments": []
          },
          "eventtypeattributes": null,
          "id": "E97A5DBC",
          "oesummary": null,
          "journal": null,
          "well": null,
          "isactive": true,
          "externaltransactionhistoryinfo": [
            {
              "status": "Pending",
              "message": null,
              "type": "MPT",
              "riteservicereporturl": null,
              "CreatedDate": "0001-01-01T00:00:00",
              "CreatedBy": null,
              "LastModifiedDate": "0001-01-01T00:00:00",
              "LastModifiedBy": null,
              "Id": null
            }
          ],
          "pnmconsumptiondata": {
            "pnmconsumptions": []
          },
          "CreatedDate": "2018-01-20T16:36:56.185",
          "CreatedBy": "ABCD11",
          "LastModifiedDate": "2020-01-20T16:36:56.185",
          "LastModifiedBy": "ABCD11",
          "Id": "A.6001877.01.01_OperationalEvent_E97A5DBC"
        },
        "attributes": null
      }
    }
  ],
  "attendance": [],
  "fieldorders": []
}
BigQuery SQL code:
CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return jsonPath(JSON.parse(json), json_path);
"""
OPTIONS (
  library="gs://json_temp/jsonpath-0.8.0.js"
);
SELECT job_id,
  oe_descr,
  attr_name,
  well_name,
  job_type
  --, field_id
FROM lz.json_actuals,
  UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.operationalevent[*].operatingevent.operationaleventdetails.description')) oe_descr with offset oedescr,
  UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.attributes.Attributes[*].AttributeName')) attr_name with offset attrb,
  UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.welldetails[*].Name')) Well_name with offset wll,
  UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.jobtypesinfo[*].jobtype')) job_type with offset jt
  --, UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.fieldorders[*].id')) field_id WITH OFFSET fld
;
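Comments:
Just saying that you could not adapt other solutions to your specific case does not help. You should provide all the relevant details!
Thank you very much for your reply, Mikhail. I will give examples of where it works and where it does not.
You should update your question with all the details of what you want to ask. Instead of posting images, post text, so that we can use your data to reproduce your use case and, most importantly, be able to help you!
@MikhailBerlyant: I tried posting my question as text, with the sample JSON and the code I run in BigQuery. It looks like the moderators are deleting them, and I am not sure whether I violated any policy here. Please guide me.
Do not post answers to your own question. You should update your question instead.
Answer 1:
Below is BigQuery Standard SQL that should fix your empty-object issue: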
#standardSQL
CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  var result = jsonPath(JSON.parse(json), json_path);
  // jsonPath() returns false when the path matches nothing;
  // return an empty array instead so the value fits ARRAY<STRING>.
  if (result) return result;
  else return [];
"""
OPTIONS (
  library="gs://json_temp/jsonpath-0.8.0.js"
);
SELECT --job_id,
  oe_descr,
  attr_name,
  well_name,
  job_type,
  field_id
FROM `lz.json_actuals`,
  UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.operationalevent[*].operatingevent.operationaleventdetails.description')) oe_descr with offset oedescr,
  UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.attributes.Attributes[*].AttributeName')) attr_name with offset attrb,
  UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.welldetails[*].Name')) Well_name with offset wll,
  UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.jobtypesinfo[*].jobtype')) job_type with offset jt
-- LEFT JOIN keeps the row (with field_id = NULL) when fieldorders is an empty array
LEFT JOIN UNNEST(CUSTOM_JSON_EXTRACT(conv_column, '$.fieldorders[*].id')) field_id WITH OFFSET fld
Applied to the sample data in your question, the result is:
Row  oe_descr               attr_name    well_name   job_type  field_id
1    Non-Operational Event  OAOPDXAS     XXXX 58-4X  CC-SERV   null
2    Non-Operational Event  OpActOPTime  XXXX 58-4X  CC-SERV   null
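Why the fixed query returns two rows with field_id = null instead of no rows at all: a comma before UNNEST is a correlated CROSS JOIN, so an empty array contributes zero rows and removes the whole source row, while LEFT JOIN UNNEST keeps the row and fills the column with NULL. The JavaScript sketch below simulates that expansion for the single sample row, using the array values from the JSON in the question. `crossJoinUnnest` and `leftJoinUnnest` are illustrative names, not BigQuery functions, and the simulation ignores the WITH OFFSET columns.

```javascript
// Simulates how BigQuery expands ONE source row while UNNEST-ing arrays.
// Illustrative only: these are not BigQuery functions, and WITH OFFSET is ignored.

// `FROM t, UNNEST(arr) x` (comma = CROSS JOIN): an empty array yields no rows.
function crossJoinUnnest(rows, column, values) {
  const out = [];
  for (const row of rows) {
    for (const value of values) {
      out.push({ ...row, [column]: value });
    }
  }
  return out;
}

// `LEFT JOIN UNNEST(arr) x`: an empty array keeps each row, with x = NULL.
function leftJoinUnnest(rows, column, values) {
  if (values.length > 0) return crossJoinUnnest(rows, column, values);
  return rows.map((row) => ({ ...row, [column]: null }));
}

// Array values extracted from the sample JSON in the question.
let rows = [{}];
rows = crossJoinUnnest(rows, "oe_descr", ["Non-Operational Event"]);
rows = crossJoinUnnest(rows, "attr_name", ["OAOPDXAS", "OpActOPTime"]);
rows = crossJoinUnnest(rows, "well_name", ["XXXX 58-4X"]);
rows = crossJoinUnnest(rows, "job_type", ["CC-SERV"]);

// fieldorders is [], so '$.fieldorders[*].id' yields an empty array (once the UDF returns []).
const withComma = crossJoinUnnest(rows, "field_id", []);
const withLeftJoin = leftJoinUnnest(rows, "field_id", []);

console.log(withComma.length);    // 0
console.log(withLeftJoin.length); // 2
```

With the comma join, 1 × 2 × 1 × 1 × 0 = 0 rows survive; with LEFT JOIN, the two attribute rows survive with field_id set to null, matching the result table above.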
Comments:
Thank you very much for your help with this; it solved my problem. I used LEFT JOIN for all the fields, since any of the columns can come with an empty array, tested it, and it works fine.