Loading JSON into BigQuery: Field is sometimes an array and sometimes a string


Posted: 2020-12-30 16:05:49

I am trying to load JSON data into BigQuery. An excerpt of the data that is causing problems looks like this:

 [{"Value":"123","Code":"A"},{"Value":"000","Code":"B"}]
 {"Value":"456","Code":"A"}
 [{"Value":"123","Code":"A"},{"Value":"789","Code":"C"},{"Value":"000","Code":"B"}]
 {"Value":"Z","Code":"A"}

I have defined the schema for this field as:

  
    "fields": [
      
        "mode": "NULLABLE",
        "name": "Code",
        "type": "STRING"
      ,
      
        "mode": "NULLABLE",
        "name": "Value",
        "type": "STRING"
      
    ],
    "mode": "REPEATED",
    "name": "Properties",
    "type": "RECORD"
  

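However, I have not been able to extract both the string values and the array values into a single repeated field. This SQL successfully extracts the string values: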

JSON_EXTRACT_SCALAR(json_string,'$.Properties.Code') as Code,
JSON_EXTRACT_SCALAR(json_string,'$.Properties.Value') as Value

And this SQL successfully extracts the array values:

  ARRAY(
    SELECT
      STRUCT(
        JSON_EXTRACT_SCALAR(Properties_Array,'$.Code') AS Code,
        JSON_EXTRACT_SCALAR(Properties_Array,'$.Value') AS Value
      )
    FROM UNNEST(JSON_EXTRACT_ARRAY(json_string,'$.Properties')) Properties_Array)
  AS Properties

I am trying to find a way to have BigQuery read this string as a one-element array, rather than preprocessing the data. Is this possible in #StandardSQL?


Answer 1:

The example below is for BigQuery Standard SQL:

#standardSQL
-- Sample rows: Properties is an array in some rows and a single object in others.
WITH `project.dataset.table` AS (
  SELECT '{"Properties":[{"Value":"123","Code":"A"},{"Value":"000","Code":"B"}]}' json_string UNION ALL
  SELECT '{"Properties":{"Value":"456","Code":"A"}}' UNION ALL
  SELECT '{"Properties":[{"Value":"123","Code":"A"},{"Value":"789","Code":"C"},{"Value":"000","Code":"B"}]}' UNION ALL
  SELECT '{"Properties":{"Value":"Z","Code":"A"}}'
)
SELECT json_string,
  ARRAY(
    SELECT STRUCT(
        JSON_EXTRACT_SCALAR(Properties,'$.Code') AS Code,
        JSON_EXTRACT_SCALAR(Properties,'$.Value') AS Value
      )
    -- JSON_EXTRACT_ARRAY returns NULL when Properties is not an array;
    -- in that case, wrap the single object in a one-element array.
    FROM UNNEST(IFNULL(
      JSON_EXTRACT_ARRAY(json_string,'$.Properties'),
      [JSON_EXTRACT(json_string,'$.Properties')])) Properties
  ) AS Properties
FROM `project.dataset.table`

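The result is one repeated Properties field per row: rows where Properties is already an array keep all of their elements, and rows where it is a single object come back as a one-element array.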
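To check the fallback logic without a BigQuery project, the same idea can be mirrored locally. The following is a minimal Python sketch and is not part of the original answer; `extract_properties` and `rows` are hypothetical names chosen for illustration, and the sketch only imitates the `IFNULL(JSON_EXTRACT_ARRAY(...), [JSON_EXTRACT(...)])` pattern rather than reproducing BigQuery's behavior exactly.

```python
import json


def extract_properties(json_string):
    """Return Properties as a list of {"Code", "Value"} dicts.

    Hypothetical helper: a list is kept as-is, and a single object is
    wrapped in a one-element list, similar in spirit to the IFNULL
    fallback in the SQL query.
    """
    properties = json.loads(json_string).get("Properties")
    if not isinstance(properties, list):
        # Not an array: treat the single value as a one-element array.
        properties = [properties]
    # Assumption of this sketch: entries that are not JSON objects are skipped.
    return [
        {"Code": item.get("Code"), "Value": item.get("Value")}
        for item in properties
        if isinstance(item, dict)
    ]


rows = [
    '{"Properties":[{"Value":"123","Code":"A"},{"Value":"000","Code":"B"}]}',
    '{"Properties":{"Value":"456","Code":"A"}}',
]

for row in rows:
    print(extract_properties(row))
```

Both kinds of row go through the same code path after the wrapping step, which is the reason the SQL version can feed either shape into a single `UNNEST`.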

Comments:

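Thanks a lot @mikhail-berlyant! Works like a charm. I had tried several combinations of UDFs, CASE statements, and CTEs, and had a hunch that the solution would involve checking for an empty array.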
