BigQuery: join two tables on an (Array CONTAINED IN Array) condition

I'm trying to achieve the following. Suppose I have two tables. Table 1:

WITH table_1 as (
SELECT
* FROM UNNEST([
  STRUCT([1] as A, [2,3,4] as B),
  STRUCT([2],[6,7])
  ])
)

Table 2:

WITH table_2 as (
SELECT
* FROM UNNEST([
  STRUCT([1,2] as C, [77] as D),
  STRUCT([3,4],[88]),
  STRUCT([4],[99])
  ])
)

I want to join table_1 and table_2 on the following condition: every value in C must also appear in B:

SELECT A, C, D FROM table_1 LEFT JOIN table_2 ON C CONTAINED IN B

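This would produce the following table:

(The expected-result image was not preserved. Applying the rule to the sample data: A = [1] pairs with C = [3,4], D = [88] and with C = [4], D = [99]; A = [2] has no qualifying row in table_2, so the LEFT JOIN keeps it with no C/D values.)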

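My question is whether, and how, I can get this result. I can't write a CONTAINED IN condition between two arrays as the condition of a LEFT JOIN. An additional requirement: table 1 has 100 million rows and table 2 has 25,000 rows, so the solution has to be efficient. I know that makes the problem harder... :P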

Thanks a lot for your help!
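BigQuery Standard SQL has no CONTAINED IN operator, but the containment test itself can be phrased as "no element of C is missing from B" with NOT EXISTS over the unnested arrays. A minimal sketch, reusing the sample data and applying the test as a WHERE filter on a cross join rather than as a LEFT JOIN condition. Because of that, table_1 rows without any match (A = [2] here) are dropped, which is not exactly what the question asks for:

```sql
WITH table_1 AS (
  SELECT * FROM UNNEST([
    STRUCT([1] AS A, [2, 3, 4] AS B),
    STRUCT([2], [6, 7])
  ])
),
table_2 AS (
  SELECT * FROM UNNEST([
    STRUCT([1, 2] AS C, [77] AS D),
    STRUCT([3, 4], [88]),
    STRUCT([4], [99])
  ])
)
SELECT t1.A, t2.C, t2.D
FROM table_1 AS t1
CROSS JOIN table_2 AS t2
-- "C CONTAINED IN B": there is no element of C that is absent from B.
-- An empty C counts as contained, since NOT EXISTS over zero rows is true.
WHERE NOT EXISTS (
  SELECT 1
  FROM UNNEST(t2.C) AS c
  WHERE c NOT IN UNNEST(t1.B)
)
```

For the sample data this should keep A = [1] paired with C = [3,4] and with C = [4]. It still evaluates every (table_1, table_2) row pair, so on its own it does not address the size concern.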

Answer
WITH table_1 as (
SELECT
* FROM UNNEST([
  STRUCT([1] as A, [2,3,4] as B),
  STRUCT([2],[6,7])
  ])
),
table_2 as (
SELECT
* FROM UNNEST([
  STRUCT([1,2] as C, [77] as D),
  STRUCT([3,4],[88]),
  STRUCT([4],[99])
  ])
)

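-- Pair every table_1 row with every table_2 row (the comma is a cross join); for each pair,
-- x = ARRAY_LENGTH(C) minus the number of element matches between C and B.
-- x = 0 means every element of C was found in B (assuming neither array contains duplicates).
SELECT table_1.A, table_2.C, table_2.D
FROM table_1, table_2, UNNEST([
                      (SELECT ARRAY_LENGTH(table_2.C) - COUNT(1)
                      FROM UNNEST(table_2.C) AS col_c
                      JOIN UNNEST(table_1.B) AS col_b
                      ON col_c = col_b)]) AS x
WHERE x = 0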
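Two properties of the query above are worth noting. The comma join compares every row pair (up to 100M × 25K = 2.5 trillion pairs at the stated sizes), and it behaves like an inner join: a table_1 row with no contained C, such as A = [2], is dropped instead of being kept as a LEFT JOIN would. The sketch below is an alternative that is not part of the original answer and has not been benchmarked. It turns containment into an equi-join on array elements, counts the distinct matched elements for each (table_1 row, table_2 row) pair, keeps pairs whose count equals the number of distinct elements in C, and then LEFT JOINs the result back to table_1. It assumes A uniquely identifies a table_1 row, because TO_JSON_STRING(A) is used as that row's key. The CTE names t1_keyed, t2_keyed, b_elems, c_elems and matches are hypothetical:

```sql
WITH table_1 AS (
  SELECT * FROM UNNEST([
    STRUCT([1] AS A, [2, 3, 4] AS B),
    STRUCT([2], [6, 7])
  ])
),
table_2 AS (
  SELECT * FROM UNNEST([
    STRUCT([1, 2] AS C, [77] AS D),
    STRUCT([3, 4], [88]),
    STRUCT([4], [99])
  ])
),
-- Deterministic row keys. Assumption: A is unique per table_1 row.
t1_keyed AS (
  SELECT TO_JSON_STRING(A) AS a_key, A, B FROM table_1
),
-- The whole table_2 row serialized to JSON serves as its key.
t2_keyed AS (
  SELECT TO_JSON_STRING(t2) AS c_key, C, D FROM table_2 AS t2
),
-- One row per (table_1 row, distinct element of B).
b_elems AS (
  SELECT DISTINCT a_key, b
  FROM t1_keyed, UNNEST(B) AS b
),
-- One row per element of C, tagged with the number of distinct elements in that C.
c_elems AS (
  SELECT c_key,
         (SELECT COUNT(DISTINCT x) FROM UNNEST(C) AS x) AS n_c,
         c
  FROM t2_keyed, UNNEST(C) AS c
),
-- Pairs where every distinct element of C was found in B.
-- Note: a table_2 row with an empty C produces no elements, so it never matches here.
matches AS (
  SELECT b_elems.a_key, c_elems.c_key
  FROM b_elems
  JOIN c_elems ON b_elems.b = c_elems.c
  GROUP BY b_elems.a_key, c_elems.c_key, c_elems.n_c
  HAVING COUNT(DISTINCT c_elems.c) = c_elems.n_c
)
-- LEFT JOINs keep table_1 rows that have no match.
SELECT t1.A, t2.C, t2.D
FROM t1_keyed AS t1
LEFT JOIN matches AS m ON t1.a_key = m.a_key
LEFT JOIN t2_keyed AS t2 ON m.c_key = t2.c_key
```

For the sample data this should return A = [1] with C = [3,4] and with C = [4], plus A = [2] with no C/D values. Whether it is actually faster depends on how many elements B and C hold and how skewed the element values are, because a very common element joins against many rows.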