Flatten multiple same-sized array columns in BigQuery table


【Title】Flatten multiple same-sized array columns in BigQuery table 【Posted】2020-03-01 17:33:20 【Question】:

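I have a table with several columns, some of which are arrays of the same length. I want to unnest them so that the values from the arrays end up on separate rows.

So, given a table like the one defined by the data CTE below (an id column plus two array columns, array_1 and array_2), I would like to get one row per array position, holding the id and the matching elements of both arrays.

This is how it works for one of the array columns: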

WITH data AS
(
  SELECT 1001 as id, ['a', 'b', 'c'] as array_1, [1, 2, 3] as array_2
  UNION ALL
  SELECT 1002 as id, ['d', 'e', 'f', 'g'] as array_1, [4, 5, 6, 7] as array_2
  UNION ALL
  SELECT 1003 as id, ['h', 'i'] as array_1, [8, 9] as array_2
)
SELECT id, a1
FROM data,
UNNEST(array_1) as a1

Is there an elegant way to unnest both arrays at once? I would like to avoid unnesting each column separately and then joining everything together.

【Question comments】:

【Answer 1】:

You can use WITH OFFSET and a JOIN on the offsets:

WITH data AS
(
  SELECT 1001 as id, ['a', 'b', 'c'] as array_1, [1, 2, 3] as array_2
  UNION ALL
  SELECT 1002 as id, ['d', 'e', 'f', 'g'] as array_1, [4, 5, 6, 7] as array_2
  UNION ALL
  SELECT 1003 as id, ['h', 'i'] as array_1, [8, 9] as array_2
)
SELECT id, a1, a2
FROM data cross join
     UNNEST(array_1) as a1 with offset n1 JOIN
     UNNEST(array_2) as a2 with offset n2 
     on n1 = n2
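
Since the question mentions several array columns, here is a minimal sketch of how the same pattern might extend to a third one. The array_3 column and the single sample row are hypothetical and only there for illustration; every extra array gets its own UNNEST ... WITH OFFSET, joined back to the first offset.

```sql
-- Sketch only: array_3 is a made-up column, not part of the original question.
WITH data AS
(
  SELECT 1001 AS id,
         ['a', 'b'] AS array_1,
         [1, 2] AS array_2,
         [TRUE, FALSE] AS array_3
)
SELECT id, a1, a2, a3
FROM data
CROSS JOIN UNNEST(array_1) AS a1 WITH OFFSET n1
JOIN UNNEST(array_2) AS a2 WITH OFFSET n2 ON n1 = n2  -- pair by position
JOIN UNNEST(array_3) AS a3 WITH OFFSET n3 ON n1 = n3  -- same for the third array
ORDER BY id, n1
```

Because these are inner joins, a position that is missing from any one of the arrays is dropped from the result, so this relies on the arrays really having the same length.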

【Comments】:

【Answer 2】:

The following is for BigQuery Standard SQL:

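#standardSQL
-- Assumes the data CTE from the question. Without an alias, WITH OFFSET
-- names its column offset, so USING(OFFSET) pairs elements by position.
SELECT id, a1, a2
FROM data, UNNEST(array_1) AS a1 WITH OFFSET
JOIN UNNEST(array_2) AS a2 WITH OFFSET
USING(OFFSET)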

【Comments】:

【Answer 3】:

So I did some research of my own on unnesting in SQL and came up with this solution:

WITH data AS
(
  SELECT 1001 as id, ['a', 'b', 'c'] as array_1, [1, 2, 3] as array_2
  UNION ALL
  SELECT 1002 as id, ['d', 'e', 'f', 'g'] as array_1, [4, 5, 6, 7] as array_2
  UNION ALL
  SELECT 1003 as id, ['h', 'i'] as array_1, [8, 9] as array_2
)
SELECT id, a1, array_2[OFFSET(off)] AS a2
FROM data
CROSS JOIN UNNEST(array_1) AS a1 WITH OFFSET off

The advantage is that only one array needs to be unnested, not all of them.
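
One caveat with indexing by position: array_2[OFFSET(off)] raises an error if array_2 turns out to be shorter than array_1. A hedged variant, assuming you would rather get NULL than a failed query, swaps in SAFE_OFFSET. The sample row below is deliberately mismatched and is not from the question.

```sql
-- Sketch only: array_2 is intentionally one element shorter than array_1.
WITH data AS
(
  SELECT 1001 AS id, ['a', 'b', 'c'] AS array_1, [1, 2] AS array_2
)
SELECT id, a1, array_2[SAFE_OFFSET(off)] AS a2  -- NULL when off is out of range
FROM data
CROSS JOIN UNNEST(array_1) AS a1 WITH OFFSET off
ORDER BY id, off
```

With this row the last result line has a1 = 'c' and a2 = NULL, whereas the OFFSET version would fail with an out-of-bounds error.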

【Comments】:

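Thanks, this solved a similar problem I had. Could you point me to documentation where I can read more about the OFFSET keyword? I don't quite understand how it works in this context.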
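
For anyone with the same question, a minimal sketch of what WITH OFFSET does on its own may help. UNNEST turns an array into rows, and WITH OFFSET adds a second column holding each element's zero-based position in the array. The aliases letter and pos are arbitrary names chosen for this illustration.

```sql
-- Sketch only: a literal array, with each element's position exposed as pos.
SELECT letter, pos
FROM UNNEST(['a', 'b', 'c']) AS letter WITH OFFSET AS pos
ORDER BY pos
-- Rows: ('a', 0), ('b', 1), ('c', 2)
```

In Answer 3 that position is then fed back into array_2[OFFSET(off)], which reads the element of array_2 at the same zero-based index. The relevant sections of the BigQuery documentation are the ones on the UNNEST operator and on array subscript operators (OFFSET, ORDINAL and their SAFE_ variants).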
