Duplicate groups of records to fill multiple date gaps in Google BigQuery
【Title】: Duplicate groups of records to fill multiple date gaps in Google BigQuery
【Posted】: 2017-02-07 18:57:56
【Question】:
I found a similar question (Duplicating records to fill gap between dates in Google BigQuery), but my scenario is different and the answers there do not apply.
My data is structured like this (essentially a price-change history for multiple products and partners):
+------------+---------+---------+-------+
| date | product | partner | value |
+------------+---------+---------+-------+
| 2017-01-01 | a | x | 10 |
| 2017-01-01 | b | x | 15 |
| 2017-01-01 | a | y | 11 |
| 2017-01-01 | b | y | 16 |
| 2017-01-05 | b | x | 13 |
| 2017-01-07 | a | y | 15 |
| 2017-01-07 | a | x | 15 |
+------------+---------+---------+-------+
What I need is a query (written specifically in BigQuery standard SQL) that, given a date range (in this case 2017-01-01 to 2017-01-10), outputs the following result:
+--------------+---------+---------+-------+
| date | product | partner | value |
+--------------+---------+---------+-------+
| 2017-01-01 | a | x | 10 |
| 2017-01-02 | a | x | 10 |
| 2017-01-03 | a | x | 10 |
| 2017-01-04 | a | x | 10 |
| 2017-01-05 | a | x | 10 |
| 2017-01-06 | a | x | 10 |
| 2017-01-07 | a | x | 15 |
| 2017-01-08 | a | x | 15 |
| 2017-01-09 | a | x | 15 |
| 2017-01-10 | a | x | 15 |
| 2017-01-01 | a | y | 11 |
| 2017-01-02 | a | y | 11 |
| 2017-01-03 | a | y | 11 |
| 2017-01-04 | a | y | 11 |
| 2017-01-05 | a | y | 11 |
| 2017-01-06 | a | y | 11 |
| 2017-01-07 | a | y | 15 |
| 2017-01-08 | a | y | 15 |
| 2017-01-09 | a | y | 15 |
| 2017-01-10 | a | y | 15 |
| 2017-01-01 | b | x | 15 |
| 2017-01-02 | b | x | 15 |
| 2017-01-03 | b | x | 15 |
| 2017-01-04 | b | x | 15 |
| 2017-01-05 | b | x | 13 |
| 2017-01-06 | b | x | 13 |
| 2017-01-07 | b | x | 13 |
| 2017-01-08 | b | x | 13 |
| 2017-01-09 | b | x | 13 |
| 2017-01-10 | b | x | 13 |
| 2017-01-01 | b | y | 16 |
| 2017-01-02 | b | y | 16 |
| 2017-01-03 | b | y | 16 |
| 2017-01-04 | b | y | 16 |
| 2017-01-05 | b | y | 16 |
| 2017-01-06 | b | y | 16 |
| 2017-01-07 | b | y | 16 |
| 2017-01-08 | b | y | 16 |
| 2017-01-09 | b | y | 16 |
| 2017-01-10 | b | y | 16 |
+--------------+---------+---------+-------+
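In other words, a price history for every combination of product and partner, with every date gap filled in.
I'm having a hard time figuring out how to do this, especially how to generate multiple rows for a date on which no price change occurred. Any ideas?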
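【Comments】:
What have you tried so far? From your question, I don't understand how you would end up with multiple rows for each date. What determines that?
@ElliottBrossard The premise is that, for every day, I need to repeat the latest value of each product/partner combination, including days on which the value did not change. So with two products and two partners, a day without any value change should still have 4 rows carrying the latest values.
【Answer 1】:
Try the following: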
#standardSQL
WITH history AS (
  -- Sample data from the question; d is a 'YYYY-MM-DD' string here.
  SELECT '2017-01-01' AS d, 'a' AS product, 'x' AS partner, 10 AS value UNION ALL
  SELECT '2017-01-01' AS d, 'b' AS product, 'x' AS partner, 15 AS value UNION ALL
  SELECT '2017-01-01' AS d, 'a' AS product, 'y' AS partner, 11 AS value UNION ALL
  SELECT '2017-01-01' AS d, 'b' AS product, 'y' AS partner, 16 AS value UNION ALL
  SELECT '2017-01-05' AS d, 'b' AS product, 'x' AS partner, 13 AS value UNION ALL
  SELECT '2017-01-07' AS d, 'a' AS product, 'y' AS partner, 15 AS value UNION ALL
  SELECT '2017-01-07' AS d, 'a' AS product, 'x' AS partner, 15 AS value
),
daterange AS (
  -- One row per calendar day in the requested range (both ends inclusive).
  SELECT date_in_range
  FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-01-10')) AS date_in_range
),
temp AS (
  -- For each product/partner pair, attach the date of the next price change.
  SELECT d, product, partner, value,
    LEAD(d) OVER(PARTITION BY product, partner ORDER BY d) AS next_d
  FROM history
)
-- Each history row covers every day from d up to (not including) next_d;
-- a NULL next_d means the value holds through the end of the range.
SELECT date_in_range, product, partner, value
FROM daterange
JOIN temp
  ON daterange.date_in_range >= PARSE_DATE('%Y-%m-%d', temp.d)
  AND (daterange.date_in_range < PARSE_DATE('%Y-%m-%d', temp.next_d) OR temp.next_d IS NULL)
ORDER BY product, partner, date_in_range
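The query above hinges on the next_d column computed with LEAD(). As a minimal sketch (an illustration, not part of the original answer), the following isolates that step for the (b, x) pair from the sample data. It assumes DATE literals instead of the string dates used above, purely to keep the example short.

```sql
#standardSQL
-- Illustration only: inspect the intermediate LEAD() result for a single
-- product/partner pair (b, x) taken from the question's sample data.
WITH history AS (
  SELECT DATE '2017-01-01' AS d, 'b' AS product, 'x' AS partner, 15 AS value UNION ALL
  SELECT DATE '2017-01-05', 'b', 'x', 13
)
SELECT
  d,
  product,
  partner,
  value,
  -- Date of the next price change for the same pair; NULL for the latest row.
  LEAD(d) OVER (PARTITION BY product, partner ORDER BY d) AS next_d
FROM history
ORDER BY d
```

Each row can then be read as valid from d up to, but not including, next_d, with a NULL next_d meaning "until the end of the requested range". For this pair, the value 15 applies from 2017-01-01 through 2017-01-04 and the value 13 from 2017-01-05 onward, which is the condition the join against the generated date range expresses.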