Aggregate by date and group and fill in missing dates in BigQuery
【Posted】: 2020-01-07 22:27:07
【Question】: I have a table with an ID, a start date, and an end date.
ID start_date end_date
1 01/01/2014 06/01/2014
2 10/01/2005 12/01/2015
3 08/01/2009 10/01/2012
...
I have another table with one row per event, keyed by ID and date (truncated to the month).
ID month_year amount
1 02/01/2014 100
1 03/01/2007 25
2 10/01/2010 50
...
I would like to get monthly totals per ID, with the months that have no events added back in as zeros.
ID month_year amount
1 02/01/2007 100
1 03/01/2007 0
2 04/01/2007 0
...
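Apologies if this is a basic question. It is very close to two existing questions, but because of the size of the dataset it is important to fill in zeros only for the months between start_date and end_date of each ID. It also resembles a third question, but I have not been able to translate that one to BigQuery. My code is currently off, but it looks something like this: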
WITH data AS(
SELECT * FROM `Table 11` AS t0
), all_months AS (
SELECT month
FROM UNNEST(GENERATE_DATE_ARRAY(
(SELECT MIN(start_date) FROM data)
, (SELECT MAX(end_date) FROM data)
, INTERVAL 1 MONTH)
) AS month
)
SELECT DISTINCT ID month_year,
SUM(Amount) OVER (PARTITION BY ID, month_year) AS sum_amount,
FROM data AS t0
LEFT JOIN all_months AS t1
ON t0.month_year=t1.month
【Answer 1】: The example below is for BigQuery Standard SQL.
#standardSQL
-- Inline sample data standing in for the two real tables.
WITH `project.dataset.data` AS (
  SELECT 1 id, '01/01/2014' start_date, '06/01/2014' end_date UNION ALL
  SELECT 2, '10/01/2005', '12/01/2015' UNION ALL
  SELECT 3, '08/01/2009', '10/01/2012'
), `project.dataset.amounts` AS (
  SELECT 1 id, '02/01/2014' month_year, 100 amount UNION ALL
  SELECT 1, '03/01/2007', 25 UNION ALL
  SELECT 2, '10/01/2010', 50
), all_months AS (
  -- One row per id and month, limited to that id's start_date..end_date range.
  SELECT id, FORMAT_DATE('%m/%d/%Y', month_year) month_year
  FROM `project.dataset.data`,
  UNNEST(GENERATE_DATE_ARRAY(PARSE_DATE('%m/%d/%Y', start_date), PARSE_DATE('%m/%d/%Y', end_date), INTERVAL 1 MONTH)) month_year
)
-- Months without a matching event come through the LEFT JOIN as NULL and are summed as 0.
SELECT id, month_year, SUM(IFNULL(amount, 0)) amount
FROM all_months m
LEFT JOIN `project.dataset.amounts` a
USING (id, month_year)
GROUP BY id, month_year
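The query above builds the month calendar per ID rather than once for the whole table, which is what keeps the zero-filled rows limited to each ID's own range. One consequence is that an event outside its ID's range, such as the sample row for ID 1 dated 03/01/2007, has no calendar row to join to and does not appear in the result. If the columns are stored as DATE rather than as strings, the parse and format steps can probably be dropped. The following is a minimal sketch under that assumption; the CTE names `ids` and `events` and their sample rows are hypothetical stand-ins, not the asker's actual tables.

```sql
#standardSQL
-- Hypothetical inline data; assumes the real columns are DATE-typed.
WITH ids AS (
  SELECT 1 AS id, DATE '2014-01-01' AS start_date, DATE '2014-06-01' AS end_date UNION ALL
  SELECT 2, DATE '2005-10-01', DATE '2015-12-01'
), events AS (
  SELECT 1 AS id, DATE '2014-02-01' AS month_year, 100 AS amount UNION ALL
  SELECT 1, DATE '2014-02-01', 40 UNION ALL
  SELECT 2, DATE '2010-10-01', 50
), all_months AS (
  -- Per-id calendar: months are generated only inside each id's own range.
  SELECT id, month_year
  FROM ids,
  UNNEST(GENERATE_DATE_ARRAY(
    DATE_TRUNC(start_date, MONTH),  -- align to the first day of the month
    end_date,
    INTERVAL 1 MONTH)) AS month_year
)
SELECT m.id, m.month_year, IFNULL(SUM(e.amount), 0) AS amount
FROM all_months AS m
LEFT JOIN events AS e
  ON e.id = m.id AND e.month_year = m.month_year
GROUP BY m.id, m.month_year
ORDER BY m.id, m.month_year
```

Joining on DATE values also lets ORDER BY sort chronologically, which the string form MM/DD/YYYY would not. The DATE_TRUNC call is only a safeguard for start dates that do not fall on the first of a month.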