BigQuery LEFT JOIN a table and filter its array elements based on conditions
Posted 2019-12-06 13:33:22

Question:

I want to join a table to another table that contains arrays, and in the joined result I want only the array elements that pass a condition, in this case a date condition.
The code snippet below illustrates my problem. I want the output to contain only the ids and record_dates whose record_date is earlier than '2019-10-15'.
```sql
WITH platform AS (
  SELECT 'u1' AS id, 'm1' AS platform_id, '2019-10-12' AS record_date
  UNION ALL
  SELECT 'u2' AS id, 'm1' AS platform_id, '2019-10-13' AS record_date
  UNION ALL
  SELECT 'u21' AS id, 'm1' AS platform_id, '2019-10-16' AS record_date
),
platform_agg AS (
  SELECT platform_id
    , ARRAY_AGG(id) AS ids
    , ARRAY_AGG(record_date) AS record_dates
  FROM platform
  GROUP BY platform_id
),
orders AS (
  SELECT 'u2' AS id, 'c1' AS order_id, '2019-10-15' AS order_date
),
orders_plus_platform AS (
  SELECT order_id
    , orders.id
    , orders.order_date
    , platform.platform_id
    , CASE WHEN platform.platform_id IS NOT NULL THEN platform_agg.ids ELSE [orders.id] END AS ids
    , CASE WHEN platform.platform_id IS NOT NULL THEN platform_agg.record_dates ELSE NULL END AS record_dates
  FROM orders
  LEFT JOIN platform
    ON orders.id = platform.id AND platform.record_date <= orders.order_date
  LEFT JOIN platform_agg
    ON platform.platform_id = platform_agg.platform_id
)
SELECT * FROM orders_plus_platform
```
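The current query returns the complete aggregated arrays for platform m1, i.e. ids u1, u2 and u21 together with their record dates. In the desired output, however, the u21 element should be filtered out, because its record_date ('2019-10-16') is after '2019-10-15'.

Thanks,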
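Comments:

Ideally you should narrow the question down and present it concisely and readably, so that we can help you effectively. The query you posted contains a lot of noise that is most likely unrelated to your problem but hides it from us. Please consider revisiting/improving your question.

Thanks for the feedback. You're right, especially that the query should be more concise. I made some edits; I hope the problem is easier to understand now. Thanks!

There are still many ways to reverse-engineer what you expect. Instead, could you show the expected result explicitly, so we are not guessing?

Answer 1:

The following solution worked for me. Basically, instead of joining the pre-aggregated version of platform, you join the platform table a second time to get all ids associated with the platform. That way the date filter is easy to apply, directly in the second join's ON condition. The CTE below replaces orders_plus_platform in the question's query.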
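```sql
-- Replaces the orders_plus_platform CTE in the question's query
-- (the platform and orders CTEs are defined there).
orders_plus_platform AS (
  SELECT order_id
    , orders.id
    , orders.order_date
    , platform.platform_id
    , ARRAY_AGG(CASE WHEN platform.platform_id IS NOT NULL THEN platform2.id ELSE orders.id END) AS ids
    -- IGNORE NULLS: BigQuery raises an error if an output array contains a NULL
    -- element, so orders without a platform get an empty array instead.
    , ARRAY_AGG(CASE WHEN platform.platform_id IS NOT NULL THEN platform2.record_date ELSE NULL END IGNORE NULLS) AS record_dates
  FROM orders
  LEFT JOIN platform
    ON orders.id = platform.id AND platform.record_date <= orders.order_date
  -- Second join: all rows of the same platform, filtered by date right here.
  LEFT JOIN platform platform2
    ON platform.platform_id = platform2.platform_id AND platform2.record_date <= orders.order_date
  GROUP BY
    order_id
    , orders.id
    , orders.order_date
    , platform.platform_id
)
```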
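If you would rather keep the pre-aggregated table instead of joining platform twice, one option is to filter each order's array with an ARRAY subquery over UNNEST. The following is a minimal sketch under two assumptions: that platform_agg may be changed to aggregate STRUCT(id, record_date) pairs, and the column name entries, which is hypothetical. Keeping each id and its date in one STRUCT means the two values cannot drift apart.

```sql
-- Sketch (assumption): platform_agg aggregates STRUCT(id, record_date) pairs into
-- a hypothetical column `entries`; each order then filters that array by date.
WITH platform AS (
  SELECT 'u1' AS id, 'm1' AS platform_id, '2019-10-12' AS record_date
  UNION ALL SELECT 'u2', 'm1', '2019-10-13'
  UNION ALL SELECT 'u21', 'm1', '2019-10-16'
),
platform_agg AS (
  SELECT platform_id
    , ARRAY_AGG(STRUCT(id, record_date) ORDER BY record_date) AS entries
  FROM platform
  GROUP BY platform_id
),
orders AS (
  SELECT 'u2' AS id, 'c1' AS order_id, '2019-10-15' AS order_date
)
SELECT orders.order_id
  , orders.id
  , orders.order_date
  , platform.platform_id
  -- Keep only the entries dated on or before this order's date.
  , CASE WHEN platform.platform_id IS NOT NULL
         THEN ARRAY(SELECT e.id FROM UNNEST(platform_agg.entries) AS e
                    WHERE e.record_date <= orders.order_date
                    ORDER BY e.record_date)
         ELSE [orders.id] END AS ids
  -- No ELSE branch: orders without a platform get NULL, as in the question.
  , CASE WHEN platform.platform_id IS NOT NULL
         THEN ARRAY(SELECT e.record_date FROM UNNEST(platform_agg.entries) AS e
                    WHERE e.record_date <= orders.order_date
                    ORDER BY e.record_date)
         END AS record_dates
FROM orders
LEFT JOIN platform
  ON orders.id = platform.id AND platform.record_date <= orders.order_date
LEFT JOIN platform_agg
  ON platform.platform_id = platform_agg.platform_id
```

With the sample data this should return a single row for order c1 with ids [u1, u2] and record_dates [2019-10-12, 2019-10-13]. Aggregating a STRUCT also sidesteps relying on two separate ARRAY_AGG calls returning their elements in the same order, which BigQuery does not guarantee without an ORDER BY.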
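Answer 2:

You can use a subquery in the WHERE clause. The subquery can operate on the unnested array and return a boolean, for example by counting the dates that satisfy the condition. (This answer uses the names from an earlier revision of the question: c, record_ids and record_ids_agg appear to correspond to orders, platform and platform_agg.)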
```sql
SELECT c_id
  , c.id
  , c.c_date
  , cxd.record_id
  , CASE WHEN cxd.record_id IS NOT NULL THEN rd_agg.ids ELSE [c.id] END AS ids
  , CASE WHEN cxd.record_id IS NOT NULL THEN rd_agg.record_dates ELSE NULL END AS record_dates
FROM c
LEFT JOIN record_ids cxd
  ON c.id = cxd.id AND cxd.record_date <= c.c_date
LEFT JOIN record_ids_agg rd_agg
  ON cxd.record_id = rd_agg.record_id
-- Keep the row only if at least one of its dates is before '2019-10-15'.
-- Uses rd_agg.record_dates: the SELECT alias record_dates is not visible in WHERE.
WHERE (SELECT COUNT(1) > 0 FROM UNNEST(rd_agg.record_dates) AS r WHERE r < '2019-10-15')
```
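Note that a WHERE filter of this kind keeps or drops whole rows: a row survives if at least one of its dates is before '2019-10-15', and its arrays are returned unchanged, so with the question's data u21 would still be in the result. If the goal is to drop individual elements, a minimal sketch is to filter the two parallel arrays by position using UNNEST ... WITH OFFSET. It runs on hard-coded sample data using the question's platform_agg column names, and it assumes ids and record_dates have the same length and are aligned element by element.

```sql
-- Sketch: filter two parallel arrays element-wise, matching elements by position.
-- Assumes ids[i] and record_dates[i] describe the same platform member.
WITH platform_agg AS (
  SELECT 'm1' AS platform_id
    , ['u1', 'u2', 'u21'] AS ids
    , ['2019-10-12', '2019-10-13', '2019-10-16'] AS record_dates
)
SELECT platform_id
  -- Keep ids[i] only when the date at the same offset passes the filter.
  , ARRAY(SELECT id FROM UNNEST(ids) AS id WITH OFFSET AS i
          WHERE record_dates[OFFSET(i)] < '2019-10-15'
          ORDER BY i) AS ids
  -- Apply the same condition to the dates themselves, preserving order.
  , ARRAY(SELECT d FROM UNNEST(record_dates) AS d WITH OFFSET AS i
          WHERE d < '2019-10-15'
          ORDER BY i) AS record_dates
FROM platform_agg
```

This should return ids [u1, u2] and record_dates [2019-10-12, 2019-10-13]. In the question's query, the same ARRAY subqueries could replace platform_agg.ids and platform_agg.record_dates inside the THEN branches of the CASE expressions, so orders without a platform still fall through to the ELSE branch.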