Count and group data by Year/month

【Title】: Count and group data by Year/month 【Posted】: 2020-08-13 17:56:02 【Question】:

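I have a SQL Server table that holds task (ticket) data going back 12 months.

Here is an example:

I'm trying to write a query that shows, for each month, the number of tickets that were open as of that month, broken down by rating. An example of the desired output:

I wrote the following SQL statement to count by day: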

SELECT 
    created, 
    COUNT(CASE WHEN rating = 'high' THEN 1 ELSE NULL END) AS high,
    COUNT(CASE WHEN rating = 'med' THEN 1 ELSE NULL END) AS med,
    COUNT(CASE WHEN rating = 'low' THEN 1 ELSE NULL END) AS low 
FROM 
    taskDB 
GROUP BY 
    created 
ORDER BY 
    created ASC

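I'm not sure how to group by month and get the correct count as of that month. Is there a better way? My end goal is to display this data as a timeline chart, where the y-axis is the ticket count and the x-axis is the date (year/month). There will be one line per rating.

Update, 14 August 2020

I tried several of the answers here, and they seem to count only the tickets opened in each month, rather than the tickets for that month plus all previous months. I've created a SQL script with some test data so everyone can see what I'm working with: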

GO
CREATE TABLE [dbo].[taskDB](
    [ticket] [varchar](50) NULL,
    [created] [date] NULL,
    [closed] [date] NULL,
    [rating] [varchar](50) NULL
) ON [PRIMARY]
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023345', CAST(N'2019-09-01' AS Date), CAST(N'2020-01-17' AS Date), N'Low')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023346', CAST(N'2019-08-01' AS Date), CAST(N'2019-08-03' AS Date), N'Critical')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023347', CAST(N'2019-09-01' AS Date), CAST(N'2019-09-20' AS Date), N'Critical')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023348', CAST(N'2019-08-01' AS Date), CAST(N'2020-08-06' AS Date), N'Critical')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023349', CAST(N'2020-08-01' AS Date), CAST(N'2020-08-05' AS Date), N'Medium')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023350', CAST(N'2019-08-01' AS Date), CAST(N'2019-08-05' AS Date), N'Medium')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023351', CAST(N'2019-12-22' AS Date), CAST(N'' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023352', CAST(N'2019-11-07' AS Date), CAST(N'2020-08-05' AS Date), N'Medium')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023353', CAST(N'2020-08-02' AS Date), CAST(N'' AS Date), N'Low')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023354', CAST(N'2019-08-02' AS Date), CAST(N'2019-08-05' AS Date), N'Medium')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023355', CAST(N'2019-10-02' AS Date), CAST(N'' AS Date), N'Low')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023356', CAST(N'2019-08-02' AS Date), CAST(N'2019-08-05' AS Date), N'Critical')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023357', CAST(N'2019-08-06' AS Date), CAST(N'2020-07-05' AS Date), N'Critical')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023358', CAST(N'2019-10-04' AS Date), CAST(N'' AS Date), N'Low')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023359', CAST(N'2019-12-02' AS Date), CAST(N'2020-02-25' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023360', CAST(N'2019-08-05' AS Date), CAST(N'2019-08-05' AS Date), N'Medium')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023361', CAST(N'2020-08-02' AS Date), CAST(N'' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023362', CAST(N'2019-09-02' AS Date), CAST(N'2019-10-06' AS Date), N'Critical')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023363', CAST(N'2019-10-03' AS Date), CAST(N'2019-11-08' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023365', CAST(N'2019-10-03' AS Date), CAST(N'2019-12-08' AS Date), N'N/A')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023364', CAST(N'2019-11-03' AS Date), CAST(N'2019-11-05' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023366', CAST(N'2020-06-03' AS Date), CAST(N'' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023368', CAST(N'2019-08-03' AS Date), CAST(N'2019-08-05' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023367', CAST(N'2019-11-03' AS Date), CAST(N'' AS Date), N'N/A')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023371', CAST(N'2019-08-03' AS Date), CAST(N'2019-08-05' AS Date), N'N/A')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023370', CAST(N'2019-08-03' AS Date), CAST(N'2019-08-05' AS Date), N'Critical')
GO

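I tried the following from @GMB. It comes close, but it doesn't seem to give me correct results: there are negative numbers, and blank closed fields come back as 1900-01-01.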

select 
    year(x.dt) yyyy,
    month(x.dt) mm,
    sum(sum(case when x.rating = 'low'    then cnt else 0 end)) 
        over(order by year(x.dt), month(x.dt)) low,
    sum(sum(case when x.rating = 'medium' then cnt else 0 end)) 
        over(order by year(x.dt), month(x.dt)) medium,
    sum(sum(case when x.rating = 'high'   then cnt else 0 end)) 
        over(order by year(x.dt), month(x.dt)) high
from [TestDB].[dbo].[taskDB] t
cross apply (values
    (rating, created, 1),
    (rating, closed, -1)
) as x(rating, dt, cnt)
where x.dt is not null
group by year(x.dt), month(x.dt)
order by year(x.dt), month(x.dt)
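
A note on the 1900-01-01 values: in SQL Server, CAST(N'' AS Date) evaluates to 1900-01-01 rather than NULL, so the still-open tickets in the test script carry that date and the "x.dt is not null" filter does not exclude them. One possible way around this, sketched here under the assumption that 1900-01-01 is never a genuine close date in this table, is to map the placeholder back to NULL before the rows are unpivoted. Only the Low column is shown; the other ratings would follow the same pattern.

```sql
-- Sketch only: assumes 1900-01-01 never occurs as a real close date.
select
    year(x.dt)  as yyyy,
    month(x.dt) as mm,
    sum(sum(case when x.rating = 'Low' then x.cnt else 0 end))
        over (order by year(x.dt), month(x.dt)) as low
from dbo.taskDB as t
cross apply (values
    (t.rating, t.created, 1),
    -- nullif turns the placeholder into NULL, so the where clause drops it
    (t.rating, nullif(t.closed, '1900-01-01'), -1)
) as x(rating, dt, cnt)
where x.dt is not null
group by year(x.dt), month(x.dt)
order by year(x.dt), month(x.dt);
```

With the placeholder removed, a ticket that is still open contributes only its +1 row, so it stays in the running total from its creation month onward.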

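The result of this query:

Update, 14 August 2020

@iceblade's revised query seems to be the most correct at this point. The only thing it doesn't account for is a ticket that is opened and closed within the same month, which I think should be counted. Here is the query: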

declare @FromDate datetime, 
        @ToDate datetime;

SET @FromDate = (Select min(created) From [dbo].[taskDB]);
SET @ToDate = (Select max(created) From [dbo].[taskDB]);

declare @openTicketsByMonth table (firstDayNextMonth datetime, year int, month int, Low int, Medium int, High int, Critical int, NA int)

Insert into @openTicketsByMonth(firstDayNextMonth, year, month)

Select top  (datediff(month, @FromDate, @ToDate) + 1) 
              dateadd(month, number + 1, @FromDate),
              year(dateadd(month, number, @FromDate)),
              month(dateadd(month, number, @FromDate))
              from [master].dbo.spt_values 
              where [type] = N'P' order by number;

update R
Set R.Low = (Select count(1) from [dbo].[taskDB] where rating = 'Low' and created < R.firstDayNextMonth and (closed >= R.firstDayNextMonth or closed = '' or closed is null)),
    R.Medium = (Select count(1) from [dbo].[taskDB] where rating = 'Medium' and created < R.firstDayNextMonth and (closed >= R.firstDayNextMonth or closed = '' or closed is null)),
    R.High = (Select count(1) from [dbo].[taskDB] where rating = 'High' and created < R.firstDayNextMonth and (closed >= R.firstDayNextMonth or closed = '' or closed is null)),
    R.Critical = (Select count(1) from [dbo].[taskDB] where rating = 'Critical' and created < R.firstDayNextMonth and (closed >= R.firstDayNextMonth or closed = '' or closed is null)),
    R.NA = (Select count(1) from [dbo].[taskDB] where rating = 'N/A' and created < R.firstDayNextMonth and (closed >= R.firstDayNextMonth or closed = '' or closed is null))
From @openTicketsByMonth R

select  year,
        month,
        Low,
        Medium,
        High,
        Critical,
        NA 
from @openTicketsByMonth

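And the query output based on the data above:

If you look at 2019/8, there are 2 Critical tickets that were opened that month and remained open after it, but there are also 3 Critical tickets that were opened and closed within the same month. I think those should be counted as well.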
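
The two readings of "open in a month" differ only in which month boundary the close date is compared against. A minimal sketch for a single month and a single rating, assuming the test rows above are loaded and that still-open tickets carry either NULL or the 1900-01-01 placeholder (the variable and column names here are illustrative, not taken from any answer):

```sql
declare @monthBegin date = '2019-08-01';
declare @nextMonth  date = dateadd(month, 1, @monthBegin);

select
    -- still open when the month ends: closed on or after the first day of next month
    sum(case when created < @nextMonth
              and (closed >= @nextMonth or closed is null or closed = '1900-01-01')
             then 1 else 0 end) as open_at_month_end,
    -- open at any point in the month: closed on or after the first day of this month
    sum(case when created < @nextMonth
              and (closed >= @monthBegin or closed is null or closed = '1900-01-01')
             then 1 else 0 end) as open_during_month
from dbo.taskDB
where rating = 'Critical';
```

With the test data, August 2019 should come out as 2 for open_at_month_end and 5 for open_during_month, which matches the 2 + 3 Critical tickets described above.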

Update, 17 August 2020

The query posted by @iceblade has been edited and confirmed to produce the correct results. The answer has been marked accordingly.

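【Comments】:

Please elaborate on "I'm trying to write a query that shows each month and the number of tickets open as of that month, by rating."

So let's take last month as an example: if I ran a query to show all open tickets, I would want to show all tickets open last month by rating, and so on, going back month by month.

【Answer 1】:

One option uses a lateral join and conditional aggregation: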

select 
    year(x.dt) yyyy,
    month(x.dt) mm,
    sum(sum(case when x.rating = 'low'    then cnt else 0 end)) 
        over(order by year(x.dt), month(x.dt)) low,
    sum(sum(case when x.rating = 'medium' then cnt else 0 end)) 
        over(order by year(x.dt), month(x.dt)) medium,
    sum(sum(case when x.rating = 'high'   then cnt else 0 end)) 
        over(order by year(x.dt), month(x.dt)) high
from taskDB t
cross apply (values
    (rating, created, 1),
    (rating, closed, -1)
) as x(rating, dt, cnt)
where x.dt is not null
group by year(x.dt), month(x.dt)
order by year(x.dt), month(x.dt)

You can filter on a given period as needed by turning this into a subquery and using a where clause in the outer query.
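
A minimal sketch of that filtering, showing only the Low column for brevity; the derived-table alias s and the year 2020 are illustrative assumptions. The filter belongs in the outer query because the running totals have to be computed over the full history first. A date filter inside the inner query would drop the earlier months and the totals would start from zero.

```sql
select s.yyyy, s.mm, s.low
from (
    select
        year(x.dt)  as yyyy,
        month(x.dt) as mm,
        sum(sum(case when x.rating = 'Low' then x.cnt else 0 end))
            over (order by year(x.dt), month(x.dt)) as low
    from taskDB as t
    cross apply (values
        (t.rating, t.created, 1),
        (t.rating, t.closed, -1)
    ) as x(rating, dt, cnt)
    where x.dt is not null
    group by year(x.dt), month(x.dt)
) as s
-- the period filter is applied after the running totals exist
where s.yyyy = 2020
order by s.yyyy, s.mm;
```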

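【Comments】:

I tried this but I have two problems. When the "closed" column is blank, i.e. the ticket hasn't been closed yet, SQL sets its value to '1900-01-01', so I think that skews the data. Also, it doesn't seem to count all tickets up to the given month that haven't been closed yet.

@mister.cake: I changed the query so that it handles empty closed dates. This should give you the result you want.

With "where x.dt is not null" added, it seems it just doesn't count those dates when the closed field is '1900-01-01'. Basically I think (rating, closed, -1) should only apply when the date in "closed" is not '1900-01-01'.

@mister.cake: In the data you showed, closed looks like it has null values, but I can't see your real data. You can change the where clause to whatever suits you best.

I've added more information to the post above, along with a sample script. You can see the output I'm getting.

【Answer 2】:

You need a calendar table/view containing the date ranges for the past twelve months, e.g.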

"year/month"  month_begin    month_end
     2019/08   2019-08-01   2019-08-31

Then a join that checks for overlapping date ranges:

-- based on @iceblade's answer
declare @FromDate date, 
        @ToDate date;

SET @FromDate = (Select min(created) From [dbo].[taskDB]);
SET @ToDate = (Select max(created) From [dbo].[taskDB]);

declare @calendar table (Month_begin date, month_end date, year int, month int)

Insert into @calendar(Month_begin, month_end, year, month)

Select top  (datediff(month, @FromDate, @ToDate) + 1) 
              dateadd(month, number, @FromDate),
              dateadd(d,-1,dateadd(month, number + 1, @FromDate)),
              year(dateadd(month, number, @FromDate)),
              month(dateadd(month, number, @FromDate))
              from [master].dbo.spt_values 
              where [type] = N'P' order by number;
              
              
select c.year,c.month,
   COUNT(CASE WHEN rating = 'Low' THEN 1 ELSE NULL END) as low,
   COUNT(CASE WHEN rating = 'Medium' THEN 1 ELSE NULL END) as med,
   COUNT(CASE WHEN rating = 'High' THEN 1 ELSE NULL END) as high,
   COUNT(CASE WHEN rating = 'Critical' THEN 1 ELSE NULL END) as critical,
   COUNT(CASE WHEN rating = 'N/A' THEN 1 ELSE NULL END) as na
FROM taskDB as t join @calendar as c 
  -- overlapping periods
  on t.created <= c.month_end
 and (t.closed >= c.month_begin  or t.closed is null)
GROUP BY c.year,c.month
ORDER BY c.year,c.month
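
The calendar above is built from master.dbo.spt_values, an undocumented system table, and its month boundaries are only correct when min(created) falls on the first day of a month, as it does in the test data. As a hedged alternative, a recursive CTE can generate the same list of months. This sketch assumes SQL Server 2012 or later for DATEFROMPARTS and EOMONTH; the CTE name months is illustrative.

```sql
declare @FromDate date = (select min(created) from dbo.taskDB);
declare @ToDate   date = (select max(created) from dbo.taskDB);

with months as (
    -- anchor: first day of the month containing the earliest ticket
    select datefromparts(year(@FromDate), month(@FromDate), 1) as month_begin
    union all
    -- step forward one month at a time until the latest ticket is covered
    select dateadd(month, 1, month_begin)
    from months
    where dateadd(month, 1, month_begin) <= @ToDate
)
select
    month_begin,
    eomonth(month_begin) as month_end,
    year(month_begin)    as [year],
    month(month_begin)   as [month]
from months
-- lifts the default limit of 100 recursion levels for long date ranges
option (maxrecursion 0);
```

The result has the same four columns as @calendar, so it could be inserted into that table variable in place of the spt_values query.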

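Adding a variant based on GMB's answer that doesn't join to a calendar and may be more efficient on your real data. It simply shifts the closed date into the following month: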

select 
    year(x.dt) yyyy,
    month(x.dt) mm,
    sum(sum(case when x.rating = 'Low'    then cnt else 0 end)) 
        over(order by year(x.dt), month(x.dt)
             rows unbounded preceding) low,
    sum(sum(case when x.rating = 'Medium' then cnt else 0 end)) 
        over(order by year(x.dt), month(x.dt)
             rows unbounded preceding) medium,
    sum(sum(case when x.rating = 'High'   then cnt else 0 end)) 
        over(order by year(x.dt), month(x.dt)
             rows unbounded preceding) high,
    sum(sum(case when x.rating = 'Critical'   then cnt else 0 end)) 
        over(order by year(x.dt), month(x.dt)
             rows unbounded preceding) critical,
    sum(sum(case when x.rating = 'N/A'   then cnt else 0 end)) 
        over(order by year(x.dt), month(x.dt)
             rows unbounded preceding) na
from taskDB t
cross apply (values
    (rating, created, 1),
    -- closed in next month
    (rating, dateadd(m,1,closed), -1)
   ) as x(rating, dt, cnt)
where dt <= getdate() -- no rows past today
group by year(x.dt), month(x.dt)
order by year(x.dt), month(x.dt)

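There is one subtle difference: #2 will skip any month that has no tickets, but I doubt such months exist.

This seems to be what you want, see fiddle

【Comments】:

I hadn't included the open rows (= NULL date), so the result was dropping them

【Answer 3】:

Based on your new input, I created a table variable containing every year/month between the first and the last ticket; I used this post: better way to generate months/year table. Then I update each category, counting the tickets whose closed date is on or after the first day of each month. This should give you the result you want.

Update, 17 August 2020: revised the query to include tickets that were closed before the end of the month.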

declare @FromDate datetime, 
        @ToDate datetime;

SET @FromDate = (Select min(created) From [dbo].[taskDB]);
SET @ToDate = (Select max(created) From [dbo].[taskDB]);

declare @openTicketsByMonth table (firstDayOfMonth datetime, firstDayNextMonth datetime, year int, month int, Low int, Medium int, High int, Critical int, NA int)

Insert into @openTicketsByMonth(firstDayOfMonth, firstDayNextMonth, year, month)

Select top  (datediff(month, @FromDate, @ToDate) + 1) 
              dateadd(month, number, @FromDate),
              dateadd(month, number + 1, @FromDate),
              year(dateadd(month, number, @FromDate)),
              month(dateadd(month, number, @FromDate))
              from [master].dbo.spt_values 
              where [type] = N'P' order by number;

update R
Set R.Low = (Select count(1) from [dbo].[taskDB] where rating = 'Low' and created < R.firstDayNextMonth and (closed >= R.firstDayOfMonth or closed = '' or closed is null)),
    R.Medium = (Select count(1) from [dbo].[taskDB] where rating = 'Medium' and created < R.firstDayNextMonth and (closed >= R.firstDayOfMonth or closed = '' or closed is null)),
    R.High = (Select count(1) from [dbo].[taskDB] where rating = 'High' and created < R.firstDayNextMonth and (closed >= R.firstDayOfMonth or closed = '' or closed is null)),
    R.Critical = (Select count(1) from [dbo].[taskDB] where rating = 'Critical' and created < R.firstDayNextMonth and (closed >= R.firstDayOfMonth or closed = '' or closed is null)),
    R.NA = (Select count(1) from [dbo].[taskDB] where rating = 'N/A' and created < R.firstDayNextMonth and (closed >= R.firstDayOfMonth or closed = '' or closed is null))
From @openTicketsByMonth R

select  year,
        month,
        Low,
        Medium,
        High,
        Critical,
        NA 
from @openTicketsByMonth

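【Comments】:

This is the closest one yet! But according to the ticket data, the only tickets open in August 2020 are 3 High and 3 Low. When I run it, it shows 2 Low, 1 Medium, 2 High and 1 Critical as open. Looking at your counting logic, it seems right, but something is throwing it off.

I think the problem was the start date used in the comparison. I updated the query to add 1 month, so it now compares tickets opened before the first day of the next month that had not been closed by that date. It now shows me, for August 2020: 3 High, 3 Low, 1 N/A.

Great, yes, that works, but I don't believe this query accounts for one other issue: if a ticket is opened and closed within the same month, I feel it should be counted too.

It depends on what you want to show. Right now the query shows the tickets that were still open at the end of the month. I can update the query so that it shows the tickets that were open at any point during the month, but that will increase the counts. For example, August 2020 would be: Low: 3, Medium: 2, High: 3, Critical: 1, NA: 1.

Just updated the query to show tickets closed before the end of the month.

【Answer 4】:

Group by year and month rather than by the full date.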

   select convert(varchar(max), year(created)) + '/' + right('0' + convert(varchar(max), month(created)),2) as Created
       , COUNT(CASE WHEN rating = 'high' THEN 1 ELSE NULL END) as high,
    COUNT(CASE WHEN rating = 'med' THEN 1 ELSE NULL END) as med,
    COUNT(CASE WHEN rating = 'low' THEN 1 ELSE NULL END) as low FROM taskDB 
    GROUP BY convert(varchar(max), year(created)) + '/' + right('0' + convert(varchar(max), month(created)),2)
ORDER BY convert(varchar(max), year(created)) + '/' + right('0' + convert(varchar(max), month(created)),2) ASC

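【Comments】:

This query doesn't seem to count all the tickets that were open as of each month, or to account for whether a ticket was closed before that month/year.

【Answer 5】:

Use a WITH clause to convert the date to the desired format, then group by that formatted date.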

With TempTaskDB As (    
SELECT convert(varchar(20), datepart(year, created)) + '/' + convert(varchar(20), datepart(month, created)) as CreatedDate,rating
from taskDB)
Select CreatedDate,
COUNT(CASE WHEN rating = 'high' THEN 1 ELSE NULL END) AS high,
    COUNT(CASE WHEN rating = 'med' THEN 1 ELSE NULL END) AS med,
    COUNT(CASE WHEN rating = 'low' THEN 1 ELSE NULL END) AS low
from TempTaskDB
group by CreatedDate
Order by CreatedDate Asc
