How to create a table join on elements in an array in Google BigQuery
Posted: 2020-08-22 13:51:20
Question: I have some data, contact_ID values, that sit inside an array of structs in a table called deals, as in the example below.
WITH deals AS (
  SELECT "012345" AS deal_ID,
    [STRUCT(["abc"] AS company_ID, [123,678,810] AS contact_ID)]
      AS associations)
SELECT
  deal_ID,
  ARRAY(
    SELECT AS STRUCT
      ( SELECT STRING_AGG(CAST(id AS STRING), ', ')
        FROM t.contact_ID id
      ) AS contact_ID
    FROM d.associations t
  ) AS contacts
FROM deals d
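The query above takes the contact_ID values from the associations array and joins them into a comma-separated string.
Row  deal_ID  contacts.contact_ID
1    012345   123, 678, 810
My problem now is that I need to replace the contact_IDs with the first and last names from another table called contacts, shown below, where contact_id is INT64 and the name fields are strings.
contact_id  first_name  last_name
123         Jane        Doe
678         John        Smith
810         Alice       Acre
I tried using a subquery like this: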
WITH deals AS (
  SELECT "012345" AS deal_ID,
    [STRUCT(["abc"] AS company_ID, [123,678,810] AS contact_ID)]
      AS associations)
SELECT
  deal_ID,
  ARRAY(
    SELECT AS STRUCT
      company_ID,
      ( SELECT STRING_AGG(
          (SELECT CONCAT(c.first_name, " ", c.last_name)
           FROM contacts c
           WHERE c.contact_id = id), ', ')
        FROM t.contact_ID id
      ) AS contact_name
    FROM d.associations t
  ) AS contacts
FROM deals d
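But this produces the error "Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN." Since the values I need to join on are inside the deals.associations array, I don't see how to set up a join between deals.associations.contact_ID and contacts.contact_id. Thanks in advance for any guidance.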
Answer 1: The following is for BigQuery Standard SQL.
#standardSQL
SELECT deal_ID,
  ARRAY_AGG(STRUCT(company_ID, contact_name)) AS contacts
FROM (
  SELECT
    deal_ID,
    ANY_VALUE(company_ID) AS company_ID,
    STRING_AGG(FORMAT('%s %s', IFNULL(first_name, ''), IFNULL(last_name, '')), ', ') AS contact_name
  FROM deals d,
    d.associations AS contact,
    contact.contact_ID AS contact_ID
  LEFT JOIN contacts c
  USING(contact_ID)
  GROUP BY deal_ID, FORMAT('%t', company_ID)
)
GROUP BY deal_ID
If applied to the sample data from your question, the output is:
Row  deal_ID  contacts.company_ID  contacts.contact_name
1    012345   abc                  Jane Doe, John Smith, Alice Acre
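To try this approach without any existing tables, the query can be wrapped with inline sample data. The following is a sketch under stated assumptions rather than the answer's exact text: the contacts CTE is built from the three sample rows in the question, the aliases assoc and cid are illustrative, and the join uses ON with explicit UNNEST instead of USING so that every column reference is unambiguous.

```sql
-- Inline sample data; the contacts rows are copied from the question's example table.
WITH deals AS (
  SELECT
    "012345" AS deal_ID,
    [STRUCT(["abc"] AS company_ID, [123, 678, 810] AS contact_ID)] AS associations
),
contacts AS (
  SELECT 123 AS contact_id, "Jane" AS first_name, "Doe" AS last_name UNION ALL
  SELECT 678, "John", "Smith" UNION ALL
  SELECT 810, "Alice", "Acre"
)
SELECT
  deal_ID,
  ARRAY_AGG(STRUCT(company_ID, contact_name)) AS contacts
FROM (
  SELECT
    d.deal_ID,
    ANY_VALUE(assoc.company_ID) AS company_ID,
    STRING_AGG(
      FORMAT('%s %s', IFNULL(c.first_name, ''), IFNULL(c.last_name, '')), ', '
    ) AS contact_name
  FROM deals AS d,
    UNNEST(d.associations) AS assoc,     -- one row per association struct
    UNNEST(assoc.contact_ID) AS cid      -- one row per contact ID (alias is illustrative)
  LEFT JOIN contacts AS c
    ON c.contact_id = cid                -- an ordinary join, so no correlated subquery on contacts
  -- Arrays cannot be grouped directly, so the array is formatted as a string key.
  GROUP BY d.deal_ID, FORMAT('%t', assoc.company_ID)
)
GROUP BY deal_ID
```

Flattening the nested arrays into rows first is what lets contacts be joined normally; the two GROUP BY levels then rebuild the per-association string and the per-deal array.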
Note: the following
FROM deals d,
  d.associations AS contact,
  contact.contact_ID AS contact_ID
is a shortcut for
FROM deals,
  UNNEST(associations) AS contact,
  UNNEST(contact_ID) AS contact_ID
This is just my preference: wherever possible, I avoid writing an explicit UNNEST() in the query text.
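One thing neither form of the FROM clause controls is the order of the aggregated values: STRING_AGG without an ORDER BY does not promise to follow the array order. A minimal sketch, assuming the original array order matters, is to capture each element's position with WITH OFFSET and sort on it. The aliases assoc, cid, and pos are illustrative, and the example aggregates the raw IDs so that it runs without a contacts table.

```sql
WITH deals AS (
  SELECT
    "012345" AS deal_ID,
    [STRUCT(["abc"] AS company_ID, [123, 678, 810] AS contact_ID)] AS associations
)
SELECT
  d.deal_ID,
  -- pos is the zero-based index of each ID within its contact_ID array
  STRING_AGG(CAST(cid AS STRING), ', ' ORDER BY pos) AS contact_IDs
FROM deals AS d,
  UNNEST(d.associations) AS assoc,
  UNNEST(assoc.contact_ID) AS cid WITH OFFSET AS pos
GROUP BY d.deal_ID
```

The same ORDER BY pos can be added to the STRING_AGG over names in the joined query if the names should appear in array order.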