MySQL group by date and count including missing dates

Asked: 2014-11-06 09:51:56

Question:

Previously I was running the following to get daily counts from the reports table.

SELECT COUNT(*) AS count_all, tracked_on
 FROM `reports`
 WHERE (domain_id = 939 AND tracked_on >= '2014-01-01' AND tracked_on <= '2014-12-31')
 GROUP BY tracked_on
 ORDER BY tracked_on ASC;

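Obviously, this does not give me a count of 0 for the missing dates.

Then I finally found an optimum solution for generating a series of dates within a given date range. The next challenge I face is joining it with my reports table and grouping the counts by date.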

select count(*), all_dates.Date as the_date, domain_id
from (
    select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY as Date
    from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
    cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
    cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
) all_dates
inner JOIN reports r
    on all_dates.Date >= '2014-01-01'
  and all_dates.Date <= '2014-12-31'
where all_dates.Date between '2014-01-01' and '2014-12-31' AND domain_id = 939 GROUP BY the_date order by the_date ASC ;

The result I get is:

count(*)    the_date    domain_id
46  2014-01-01  939
46  2014-01-02  939
46  2014-01-03  939
46  2014-01-04  939
46  2014-01-05  939
46  2014-01-06  939
46  2014-01-07  939
46  2014-01-08  939
46  2014-01-09  939
46  2014-01-10  939
46  2014-01-11  939
46  2014-01-12  939
46  2014-01-13  939
46  2014-01-14  939
...


What I want instead is for the missing dates to be filled in with 0, something like:

count(*)    the_date    domain_id
12  2014-01-01  939
23  2014-01-02  939
46  2014-01-03  939
0   2014-01-04  939
0   2014-01-05  939
99  2014-01-06  939
1   2014-01-07  939
5   2014-01-08  939
...


Another attempt I made was:
select count(*), all_dates.Date as the_date, domain_id
from (
    select curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY as Date
    from (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
    cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
    cross join (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as c
) all_dates
inner JOIN reports r
    on all_dates.Date = r.tracked_on
where all_dates.Date between '2014-01-01' and '2014-12-31' AND domain_id = 939 GROUP BY the_date order by the_date ASC ;

Result:

count(*)    the_date    domain_id
38        2014-09-03     939
8         2014-09-04     939

Minimal data for the queries above: http://sqlfiddle.com/#!2/dee3e/6

Comments:

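If you like, consider following this simple two-step course of action: 1. If you have not already done so, provide proper DDLs (and/or an sqlfiddle) so that we can more easily replicate the problem. 2. If you have not already done so, provide a desired result set that corresponds with the information provided in step 1.

Sure, sqlfiddle.com/#!2/dee3e/6 is the minimal table with rows.

It is just a suggestion; you do not have to follow it.

Answer 1: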

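You need an OUTER JOIN to get a row for every day between the start and the end, because an INNER JOIN restricts the output to the dates that actually join (i.e. only the dates present in the reports table).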

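Also, when you use an OUTER JOIN you must take care that conditions in the WHERE clause do not cause an implicit inner join. For example, AND domain_id = 1 in the WHERE clause would suppress every row that does not meet that condition, including the dates with no reports, whereas as a join condition it only restricts the rows taken from the reports table.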
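The effect of moving the filter between the join condition and the WHERE clause can be checked on a tiny data set. The following is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere). The small all_dates table and the three sample report rows are made up for illustration; the LEFT JOIN behaviour shown is standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_dates (d TEXT);
    CREATE TABLE reports (domain_id INTEGER, tracked_on TEXT);
    -- Illustrative data only: three days, reports for two domains
    INSERT INTO all_dates VALUES ('2014-09-01'), ('2014-09-02'), ('2014-09-03');
    INSERT INTO reports VALUES (1, '2014-09-01'), (1, '2014-09-01'), (2, '2014-09-02');
""")

# Filter in the ON clause: every date survives; unmatched dates count as 0
# because COUNT(r.domain_id) ignores the NULLs produced by the outer join.
on_filter = conn.execute("""
    SELECT a.d, COUNT(r.domain_id)
    FROM all_dates a
    LEFT JOIN reports r ON a.d = r.tracked_on AND r.domain_id = 1
    GROUP BY a.d
    ORDER BY a.d
""").fetchall()

# Same filter in WHERE: rows whose r.domain_id is NULL or different are
# discarded after the join, so the dates without reports disappear again.
where_filter = conn.execute("""
    SELECT a.d, COUNT(r.domain_id)
    FROM all_dates a
    LEFT JOIN reports r ON a.d = r.tracked_on
    WHERE r.domain_id = 1
    GROUP BY a.d
    ORDER BY a.d
""").fetchall()

print(on_filter)
print(where_filter)
```

With this sample data the first query returns one row per date and the second returns only the date that has reports for domain 1.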

SELECT
      COUNT(r.domain_id)              -- counts only matched report rows, so dates without reports give 0
    , all_dates.Date AS the_date
    , r.domain_id                     -- NULL for dates without reports
FROM (
        -- 100 consecutive days, ending 2 months after today
        SELECT DATE_ADD(curdate(), INTERVAL 2 MONTH) - INTERVAL (a.a + (10 * b.a) ) DAY as Date
        FROM (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
        CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
      ) all_dates
      LEFT OUTER JOIN reports r
                  ON all_dates.Date = r.tracked_on
                        AND r.domain_id = 1   -- filter in the join condition, not in WHERE
WHERE all_dates.Date BETWEEN '2014-09-01' AND '2014-09-30'
GROUP BY
      the_date
ORDER BY
      the_date ASC;

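I have also changed the all_dates derived table: it uses DATE_ADD() to push the starting point into the future, and I have reduced its size. Both of these are options and can be adjusted as needed.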
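How large the all_dates derived table has to be follows from simple arithmetic: k cross-joined digit tables yield offsets of 0 to 10^k - 1 days back from the starting point. A rough sketch of that sizing in Python, where digit_tables_needed is a hypothetical helper name used only for illustration:

```python
from datetime import date

def digit_tables_needed(anchor, start):
    """Smallest k such that offsets 0 .. 10**k - 1 days back from anchor reach start."""
    span = (anchor - start).days  # largest offset the series must contain
    if span < 0:
        raise ValueError("anchor must not be earlier than start")
    k = 1
    while 10**k - 1 < span:
        k += 1
    return k

# One month, anchored at its last day: 29 days back are needed.
one_month = digit_tables_needed(date(2014, 9, 30), date(2014, 9, 1))
# A full year, anchored at its last day: 364 days back are needed.
one_year = digit_tables_needed(date(2014, 12, 31), date(2014, 1, 1))

print(one_month, one_year)
```

The anchor also has to be on or after the last date of the reporting range, which is why the query above pushes it forward with DATE_ADD().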

Demo at SQLfiddle


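To get the domain_id on every row (as shown in your question) you need something like the following. Note that you could use the MySQL-specific IFNULL(), but I have used the more generic SQL COALESCE(). The use of an @parameter as shown here is MySQL-specific anyway.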

SET @domain := 1;

SELECT
      COUNT(r.domain_id)
    , all_dates.Date AS the_date
    , coalesce(domain_id,@domain) AS domain_id
FROM (
        SELECT DATE_ADD(curdate(), INTERVAL 2 month) - INTERVAL (a.a + (10 * b.a) ) DAY as Date
        FROM (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as a
        CROSS JOIN (select 0 as a union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) as b
      ) all_dates
      LEFT JOIN reports r
                  ON all_dates.Date = r.tracked_on
                        AND domain_id = @domain
WHERE all_dates.Date BETWEEN '2014-09-01' AND '2014-09-30'
GROUP BY
      the_date
ORDER BY
      the_date ASC;

See this at SQLfiddle

Comments:

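Awesome! It just worked :) Is it also possible to get the days grouped by week and by month? Similar to how we have all the "days", could I have all the weeks in a year?

Glad that was the answer you needed. Please take a second to click the tick mark, which indicates that the answer has been accepted. Larger units of time such as weeks, months, and years can be used, but we usually work with date ranges (from/to date pairs).

Answer 2: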

The all_dates subquery only looks back from the current day (curdate()). If you want to include future dates, change the first line of the subquery to:

select '2015-01-01' - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY as Date
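To check that a fixed anchor such as '2015-01-01' covers the whole of 2014, the offset arithmetic of the three cross-joined digit tables can be mirrored outside the database. A sketch in Python, where date_series is a hypothetical helper that reproduces a.a + (10 * b.a) + (100 * c.a):

```python
from datetime import date, timedelta
from itertools import product

def date_series(anchor, digit_tables=3):
    """Dates produced by cross joining `digit_tables` tables of 0..9, counted back from anchor."""
    dates = []
    for digits in product(range(10), repeat=digit_tables):
        # Same arithmetic as a.a + (10 * b.a) + (100 * c.a) in the SQL
        offset = sum(d * 10**i for i, d in enumerate(digits))
        dates.append(anchor - timedelta(days=offset))
    return sorted(dates)

series = date_series(date(2015, 1, 1))

# Every day of 2014 (365 days) should appear in the generated series.
year_2014 = {date(2014, 1, 1) + timedelta(days=n) for n in range(365)}
covers_2014 = year_2014 <= set(series)

print(series[0], series[-1], len(series), covers_2014)
```

Three digit tables give 1000 distinct dates ending at the anchor, so any anchor on or after the last day of the reporting range and within 999 days of its first day is sufficient.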
