BigQuery MERGE 查询不适用于空的 REPEATED 字段

Posted

技术标签:

【中文标题】BigQuery MERGE 查询不适用于空的 REPEATED 字段【英文标题】:BigQuery MERGE query not working with empty REPEATED field 【发布时间】:2021-02-04 06:53:20 【问题描述】:

我正在尝试对具有以下架构的表运行 MERGE 查询:

[
   
      "mode":"REQUIRED",
      "name":"container_ref",
      "type":"STRING"
   ,
   
      "mode":"REQUIRED",
      "name":"receive_date",
      "type":"DATE"
   ,
   
      "mode":"REQUIRED",
      "name":"vendor_id",
      "type":"INTEGER"
   ,
   
      "fields":[
         
            "mode":"NULLABLE",
            "name":"code",
            "type":"STRING"
         ,
         
            "mode":"NULLABLE",
            "name":"quantity",
            "type":"INTEGER"
         
      ],
      "mode":"REPEATED",
      "name":"products",
      "type":"RECORD"
   ,
   
      "mode":"REQUIRED",
      "name":"max_fill_ratio",
      "type":"FLOAT"
   
]

我正在尝试运行以下查询:

MERGE `project.dataset.table` T
USING (
  WITH container_data AS (
    SELECT
        'yongkang_210222' AS container_ref
      , DATE('2021-02-22') AS receive_date
      , 51695 AS vendor_id
      , 0.5 AS max_fill_ratio
      , [] AS products
  )
  SELECT * FROM container_data
) S
ON T.container_ref = S.container_ref
WHEN MATCHED THEN
  UPDATE SET T.products = S.products, T.max_fill_ratio = S.max_fill_ratio
WHEN NOT MATCHED THEN
  INSERT (container_ref, receive_date, vendor_id, products, max_fill_ratio)
  VALUES(container_ref, receive_date, vendor_id, products, max_fill_ratio)
WHEN NOT MATCHED BY SOURCE AND receive_date = '2021-02-22' THEN
  DELETE

如您所见,在源数据中,products 是一个空数组。我从查询中收到以下错误:

ARRAY 类型的值不能分配给 T.products,它有 type ARRAY>

如果在 products 未设置为空数组的情况下运行相同的查询,则它可以正常工作。例如:

MERGE `project.dataset.table` T
USING (
  WITH container_data AS (
    SELECT
        'yongkang_210222' AS container_ref
      , DATE('2021-02-22') AS receive_date
      , 51695 AS vendor_id
      , 0.5 AS max_fill_ratio
      , [STRUCT('01757' AS code, 10 AS quantity),STRUCT('03831' AS code, 20 AS quantity)] AS products
  )
  SELECT * FROM container_data
) S
ON T.container_ref = S.container_ref
WHEN MATCHED THEN
  UPDATE SET T.products = S.products, T.max_fill_ratio = S.max_fill_ratio
WHEN NOT MATCHED THEN
  INSERT (container_ref, receive_date, vendor_id, products, max_fill_ratio)
  VALUES(container_ref, receive_date, vendor_id, products, max_fill_ratio)
WHEN NOT MATCHED BY SOURCE AND receive_date = '2021-02-22' THEN
  DELETE

我无法弄清楚为什么MERGE 查询中的空数组会导致此错误。我已经尝试使用产品的空数组插入和更新表格,它工作正常。例如,这个查询可以正常工作:

INSERT INTO `project.dataset.table`
(container_ref, receive_date, vendor_id, max_fill_ratio, products)
VALUES (
  'yongkang_210222'
  , DATE('2021-02-22')
  , 51695
  , 0.5
  , []
)

这开始感觉像是一个 BigQuery 错误。有人知道为什么会这样吗?

【问题讨论】:

【参考方案1】:

我不知道为什么,但是在数组之前添加ARRAY<STRUCT<code STRING, quantity INT64>> 已经解决了这个问题:

MERGE `project.dataset.table` T
USING (
  WITH container_data AS (
    SELECT
        'yongkang_210222' AS container_ref
      , DATE('2021-02-22') AS receive_date
      , 51695 AS vendor_id
      , 0.5 AS max_fill_ratio
      , ARRAY<STRUCT<code STRING, quantity INT64>> [] AS products
  )
  SELECT * FROM container_data
) S
ON T.container_ref = S.container_ref
WHEN MATCHED THEN
  UPDATE SET T.products = S.products, T.max_fill_ratio = S.max_fill_ratio
WHEN NOT MATCHED THEN
  INSERT (container_ref, receive_date, vendor_id, products, max_fill_ratio)
  VALUES(container_ref, receive_date, vendor_id, products, max_fill_ratio)
WHEN NOT MATCHED BY SOURCE AND receive_date = '2021-02-22' THEN
  DELETE

现在它对于空数组和非空数组都可以正常工作。如果有人可以提供解释,我很想了解为什么这是必要的。

【讨论】:

以上是关于BigQuery MERGE 查询不适用于空的 REPEATED 字段的主要内容,如果未能解决你的问题,请参考以下文章

Android 数据绑定不适用于 <merge> 属性

BigQuery - 时间戳偏移不适用于 insertAll / Streaming?

BigQuery 自动检测不适用于不一致的 json?

JSON_EXTRACT 不适用于 BigQuery 中的布尔值

Bigquery 查询返回“pagetoken missing” - 查询适用于其他表

BigQuery 中的 MERGE 语法是扫描整个表吗?