BigQuery LEFT JOIN is doubling-up values

Posted 2018-07-10 19:29:14

I'm trying to combine two datasets, one of sales targets and one of actual sales, by day and market (US/UK).

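To do this, I use a third table that builds a master list of the dates to report on with GENERATE_DATE_ARRAY, so that days with no target set and no sales reported don't leave gaps.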

I found that my sales were being counted twice, so I've reduced my data and query to a reproducible case:

#standardSQL

WITH dates AS (
  SELECT day FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-07-05', '2018-07-09', INTERVAL 1 DAY)) AS day
),
targets AS (
  SELECT DATE '2018-07-06' AS day, 'UK' AS Market, NUMERIC '2.4' AS quantity
  UNION ALL SELECT '2018-07-06', "US", 8.4
  UNION ALL SELECT '2018-07-06', "US", 1.2
  UNION ALL SELECT '2018-07-08', "UK", 3.0
  UNION ALL SELECT '2018-07-08', "US", 10.9
),
sales AS (
  SELECT DATE '2018-07-08' AS day, 'UK' AS Market, 4 AS quantity
  UNION ALL SELECT '2018-07-06', 'US', 15
)

SELECT 
  dates.day AS day,
  targets.market AS market,
  SUM(targets.quantity) AS targetQuantity,
  SUM(sales.quantity) AS quantity
FROM dates
LEFT JOIN targets
  ON dates.day = CAST(targets.day AS DATE)
LEFT JOIN sales
  ON dates.day = CAST(sales.day AS DATE) AND targets.market = sales.market
GROUP BY day, market
ORDER BY day, market

In the result, row 3 (July 6, US) reports a sales quantity of 30, even though the data contains only 15.

This happens when the targets data has two rows for the same date and market, but I can't work out how to code around it.
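Running the same joins without the GROUP BY makes the duplication visible. The sketch below reuses the question's query shape on a trimmed copy of the sample data (only the 2018-07-06 rows are kept) and filters to that day. Each of the two US target rows finds the same single sales row, so the join emits that sales row twice and SUM adds 15 + 15.

```sql
#standardSQL
-- Diagnostic sketch: the question's joins without GROUP BY,
-- on a trimmed copy of the sample data (2018-07-06 rows only).
WITH dates AS (
  SELECT day FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-07-05', DATE '2018-07-09', INTERVAL 1 DAY)) AS day
),
targets AS (
  SELECT DATE '2018-07-06' AS day, 'UK' AS market, NUMERIC '2.4' AS quantity
  UNION ALL SELECT DATE '2018-07-06', 'US', NUMERIC '8.4'
  UNION ALL SELECT DATE '2018-07-06', 'US', NUMERIC '1.2'
),
sales AS (
  SELECT DATE '2018-07-06' AS day, 'US' AS market, 15 AS quantity
)
SELECT
  dates.day,
  targets.market,
  targets.quantity AS targetQuantity,
  sales.quantity AS salesQuantity
FROM dates
LEFT JOIN targets
  ON dates.day = targets.day
-- Joining sales on (day, market) matches the one US sales row
-- once for EACH US target row, duplicating it.
LEFT JOIN sales
  ON dates.day = sales.day AND targets.market = sales.market
WHERE dates.day = DATE '2018-07-06'
ORDER BY targets.market, targets.quantity
-- Expected rows:
--   2018-07-06 | UK | 2.4 | NULL
--   2018-07-06 | US | 1.2 | 15
--   2018-07-06 | US | 8.4 | 15
```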

Thanks for your help!


Answer 1:

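The query below should work. The idea is to pre-aggregate the sales and targets tables before joining, so that each (day, market) pair matches at most one row on each side and nothing gets counted twice.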

#standardSQL
WITH dates AS (
  SELECT day FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-07-05', '2018-07-09', INTERVAL 1 DAY)) AS day
), targets AS (
  SELECT DATE '2018-07-06' AS day, 'UK' AS Market, NUMERIC '2.4' AS quantity
  UNION ALL SELECT '2018-07-06', "US", 8.4
  UNION ALL SELECT '2018-07-06', "US", 1.2
  UNION ALL SELECT '2018-07-08', "UK", 3.0
  UNION ALL SELECT '2018-07-08', "US", 10.9
), sales AS (
  SELECT DATE '2018-07-08' AS day, 'UK' AS Market, 4 AS quantity
  UNION ALL SELECT '2018-07-06', 'US', 15
)
SELECT 
  dates.day AS day,
  t.market AS market,
  targetQuantity,
  quantity
FROM dates 
LEFT JOIN (SELECT day, market, SUM(quantity) AS targetQuantity FROM targets GROUP BY day, market) t
  ON dates.day = CAST(t.day AS DATE)
LEFT JOIN (SELECT day, market, SUM(quantity) AS quantity FROM sales GROUP BY day, market) s
  ON dates.day = CAST(s.day AS DATE) AND t.market = s.market
ORDER BY day, market
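One limitation of the query above as written: because the sales join requires t.market = s.market, a sales row for a day and market that has no target at all has no t row to attach to and is silently dropped, and market is NULL on days without targets. A possible variant, sketched below under the assumption that the list of markets can be derived from the two tables themselves, builds a day x market grid first and attaches the pre-aggregated targets and sales to it independently. The CTE names markets, t and s are illustrative, not from the original answer.

```sql
#standardSQL
-- Sketch: day x market grid, then independent LEFT JOINs to
-- pre-aggregated targets (t) and sales (s).
-- Assumption: every market of interest appears in targets or sales.
WITH dates AS (
  SELECT day FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-07-05', DATE '2018-07-09', INTERVAL 1 DAY)) AS day
), targets AS (
  SELECT DATE '2018-07-06' AS day, 'UK' AS market, NUMERIC '2.4' AS quantity
  UNION ALL SELECT DATE '2018-07-06', 'US', NUMERIC '8.4'
  UNION ALL SELECT DATE '2018-07-06', 'US', NUMERIC '1.2'
  UNION ALL SELECT DATE '2018-07-08', 'UK', NUMERIC '3.0'
  UNION ALL SELECT DATE '2018-07-08', 'US', NUMERIC '10.9'
), sales AS (
  SELECT DATE '2018-07-08' AS day, 'UK' AS market, 4 AS quantity
  UNION ALL SELECT DATE '2018-07-06', 'US', 15
), markets AS (
  -- Distinct market list taken from both sides
  SELECT market FROM targets
  UNION DISTINCT
  SELECT market FROM sales
), t AS (
  -- One row per (day, market): target rows summed before joining
  SELECT day, market, SUM(quantity) AS targetQuantity FROM targets GROUP BY day, market
), s AS (
  -- One row per (day, market): sales rows summed before joining
  SELECT day, market, SUM(quantity) AS quantity FROM sales GROUP BY day, market
)
SELECT
  d.day,
  m.market,
  t.targetQuantity,
  s.quantity
FROM dates AS d
CROSS JOIN markets AS m
-- Both joins key on the grid, not on each other, so a sales row
-- no longer depends on a matching target row existing.
LEFT JOIN t ON t.day = d.day AND t.market = m.market
LEFT JOIN s ON s.day = d.day AND s.market = m.market
ORDER BY d.day, m.market
```

With this layout each day appears once per market, so a day with no data shows up as one row per market with NULL quantities instead of a single row with a NULL market. The pre-aggregation from the answer is still what prevents the double counting.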

