Aggregate by date and group and fill in missing dates in BigQuery

【Title】: Aggregate by date and group and fill in missing dates in BigQuery 【Posted】: 2020-01-07 22:27:07 【Question】:

I have a table with an ID, a start date, and an end date.

ID  start_date  end_date
1   01/01/2014  06/01/2014
2   10/01/2005  12/01/2015
3   08/01/2009  10/01/2012
...

I have another table with one row per event, keyed by ID and date (truncated to the month).

ID  month_year   amount
1   02/01/2014   100
1   03/01/2007   25
2   10/01/2010   50
...

I would like to get monthly totals per ID, with the months that have no events added back as zeros.

ID  month_year   amount
1   02/01/2007   100
1   03/01/2007   0
2   04/01/2007   0
...

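Apologies if this is a basic question. It is very close to this question and this question, but because of the size of the dataset it is important to fill in zeros only for the months between each ID's start_date and end_date. It is also similar to a third linked post, but I could not translate that approach to BigQuery. My current code is well off the mark, but it looks something like this: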

WITH data AS(
            SELECT * FROM `Table 11` AS t0
), all_months AS (
   SELECT month
   FROM UNNEST(GENERATE_DATE_ARRAY(
     (SELECT MIN(start_date) FROM data)
     , (SELECT MAX(end_date) FROM data)
     , INTERVAL 1 MONTH)
   ) AS month
)

SELECT DISTINCT ID month_year, 
    SUM(Amount) OVER (PARTITION BY ID, month_year) AS sum_amount,   
FROM data AS t0
LEFT JOIN all_months AS t1
ON t0.month_year=t1.month


【Answer 1】:

The example below is for BigQuery Standard SQL:

#standardSQL
-- Sample data standing in for the real tables; dates are MM/DD/YYYY strings
WITH `project.dataset.data` AS (
  SELECT 1 id, '01/01/2014' start_date, '06/01/2014' end_date UNION ALL
  SELECT 2, '10/01/2005', '12/01/2015' UNION ALL
  SELECT 3, '08/01/2009', '10/01/2012'
), `project.dataset.amounts` AS (
  SELECT 1 id, '02/01/2014' month_year, 100 amount UNION ALL
  SELECT 1, '03/01/2007', 25 UNION ALL
  SELECT 2, '10/01/2010', 50
), all_months AS (
  -- One row per id for every month between that id's start_date and end_date
  SELECT id, FORMAT_DATE('%m/%d/%Y', month_year) month_year
  FROM `project.dataset.data`,
  UNNEST(GENERATE_DATE_ARRAY(PARSE_DATE('%m/%d/%Y', start_date), PARSE_DATE('%m/%d/%Y', end_date), INTERVAL 1 MONTH)) month_year
)
-- Months with no matching event get a NULL amount from the LEFT JOIN, which becomes 0
SELECT id, month_year, SUM(IFNULL(amount, 0)) amount
FROM all_months m
LEFT JOIN `project.dataset.amounts` a
USING (id, month_year)
GROUP BY id, month_year
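
The query above parses and re-formats the dates because the sample data stores them as strings. If the real tables already hold DATE columns, the same idea can be written without the string round trip. The following is only a sketch under that assumption: the table names are the placeholders from the example above, and it assumes `amounts.month_year` is a DATE already truncated to the first day of the month.

```sql
#standardSQL
-- Assumption: start_date, end_date and month_year are DATE columns,
-- and month_year is already truncated to the first day of the month.
WITH all_months AS (
  -- One row per id for each month in that id's own date range
  SELECT d.id, month_year
  FROM `project.dataset.data` AS d,
  UNNEST(GENERATE_DATE_ARRAY(
    DATE_TRUNC(d.start_date, MONTH),  -- align the series to month starts
    d.end_date,
    INTERVAL 1 MONTH)) AS month_year
)
SELECT m.id, m.month_year, IFNULL(SUM(a.amount), 0) AS amount
FROM all_months AS m
LEFT JOIN `project.dataset.amounts` AS a
  ON a.id = m.id AND a.month_year = m.month_year
GROUP BY m.id, m.month_year
ORDER BY m.id, m.month_year
```

Because the month series is generated per id rather than from the global MIN(start_date) and MAX(end_date), zeros are only produced inside each id's own range. Note that in both versions an event dated outside its id's range (such as the 03/01/2007 row for id 1 in the sample data) has no matching month and is left out of the result.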
