Count and group data by Year/month
【Posted】: 2020-08-13 17:56:02
【Question】: I have a SQL Server table that contains task data going back 12 months.
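Here is an example:
I am trying to write a query that shows, for each month, the number of tickets by rating that were open up to and including that month. Example of the desired output:
I created the following SQL statement to count by day: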
SELECT
created,
COUNT(CASE WHEN rating = 'high' THEN 1 ELSE NULL END) AS high,
COUNT(CASE WHEN rating = 'med' THEN 1 ELSE NULL END) AS med,
COUNT(CASE WHEN rating = 'low' THEN 1 ELSE NULL END) AS low
FROM
taskDB
GROUP BY
created
ORDER BY
created ASC
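I am not sure how to group by month and get the correct count up to that month. Is there a better way to do this? My end goal is to display this data as a timeline chart where the y-axis is the number of tickets and the x-axis is the date (year/month). There will be one line per rating.
Update 14 August 2020
I tried several of the answers, and they only seem to count the tickets opened in each month, not the tickets from each month plus all previous months. I created a SQL script with some test data so everyone can see what I am working with: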
GO
CREATE TABLE [dbo].[taskDB](
[ticket] [varchar](50) NULL,
[created] [date] NULL,
[closed] [date] NULL,
[rating] [varchar](50) NULL
) ON [PRIMARY]
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023345', CAST(N'2019-09-01' AS Date), CAST(N'2020-01-17' AS Date), N'Low')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023346', CAST(N'2019-08-01' AS Date), CAST(N'2019-08-03' AS Date), N'Critical')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023347', CAST(N'2019-09-01' AS Date), CAST(N'2019-09-20' AS Date), N'Critical')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023348', CAST(N'2019-08-01' AS Date), CAST(N'2020-08-06' AS Date), N'Critical')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023349', CAST(N'2020-08-01' AS Date), CAST(N'2020-08-05' AS Date), N'Medium')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023350', CAST(N'2019-08-01' AS Date), CAST(N'2019-08-05' AS Date), N'Medium')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023351', CAST(N'2019-12-22' AS Date), CAST(N'' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023352', CAST(N'2019-11-07' AS Date), CAST(N'2020-08-05' AS Date), N'Medium')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023353', CAST(N'2020-08-02' AS Date), CAST(N'' AS Date), N'Low')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023354', CAST(N'2019-08-02' AS Date), CAST(N'2019-08-05' AS Date), N'Medium')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023355', CAST(N'2019-10-02' AS Date), CAST(N'' AS Date), N'Low')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023356', CAST(N'2019-08-02' AS Date), CAST(N'2019-08-05' AS Date), N'Critical')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023357', CAST(N'2019-08-06' AS Date), CAST(N'2020-07-05' AS Date), N'Critical')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023358', CAST(N'2019-10-04' AS Date), CAST(N'' AS Date), N'Low')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023359', CAST(N'2019-12-02' AS Date), CAST(N'2020-02-25' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023360', CAST(N'2019-08-05' AS Date), CAST(N'2019-08-05' AS Date), N'Medium')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023361', CAST(N'2020-08-02' AS Date), CAST(N'' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023362', CAST(N'2019-09-02' AS Date), CAST(N'2019-10-06' AS Date), N'Critical')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023363', CAST(N'2019-10-03' AS Date), CAST(N'2019-11-08' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023365', CAST(N'2019-10-03' AS Date), CAST(N'2019-12-08' AS Date), N'N/A')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023364', CAST(N'2019-11-03' AS Date), CAST(N'2019-11-05' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023366', CAST(N'2020-06-03' AS Date), CAST(N'' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023368', CAST(N'2019-08-03' AS Date), CAST(N'2019-08-05' AS Date), N'High')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023367', CAST(N'2019-11-03' AS Date), CAST(N'' AS Date), N'N/A')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023371', CAST(N'2019-08-03' AS Date), CAST(N'2019-08-05' AS Date), N'N/A')
GO
INSERT [dbo].[taskDB] ([ticket], [created], [closed], [rating]) VALUES (N'023370', CAST(N'2019-08-03' AS Date), CAST(N'2019-08-05' AS Date), N'Critical')
GO
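I tried the following from @GMB. It is close, but it does not seem to give me correct results: there are negative numbers, and blank closed fields come back as 1900-01-01.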
select
year(x.dt) yyyy,
month(x.dt) mm,
sum(sum(case when x.rating = 'low' then cnt else 0 end))
over(order by year(x.dt), month(x.dt)) low,
sum(sum(case when x.rating = 'medium' then cnt else 0 end))
over(order by year(x.dt), month(x.dt)) medium,
sum(sum(case when x.rating = 'high' then cnt else 0 end))
over(order by year(x.dt), month(x.dt)) high
from [TestDB].[dbo].[taskDB] t
cross apply (values
(rating, created, 1),
(rating, closed, -1)
) as x(rating, dt, cnt)
where x.dt is not null
group by year(x.dt), month(x.dt)
order by year(x.dt), month(x.dt)
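Result of this query:
Update 14 August 2020
The modified query from @iceblade seems the most correct at this point. The only thing it does not account for is a ticket that was opened and closed within the same month, which I think should be counted. Here is the query: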
declare @FromDate datetime,
@ToDate datetime;
SET @FromDate = (Select min(created) From [dbo].[taskDB]);
SET @ToDate = (Select max(created) From [dbo].[taskDB]);
declare @openTicketsByMonth table (firstDayNextMonth datetime, year int, month int, Low int, Medium int, High int, Critical int, NA int)
Insert into @openTicketsByMonth(firstDayNextMonth, year, month)
Select top (datediff(month, @FromDate, @ToDate) + 1)
dateadd(month, number + 1, @FromDate),
year(dateadd(month, number, @FromDate)),
month(dateadd(month, number, @FromDate))
from [master].dbo.spt_values
where [type] = N'P' order by number;
update R
Set R.Low = (Select count(1) from [dbo].[taskDB] where rating = 'Low' and created < R.firstDayNextMonth and (closed >= R.firstDayNextMonth or closed = '' or closed is null)),
R.Medium = (Select count(1) from [dbo].[taskDB] where rating = 'Medium' and created < R.firstDayNextMonth and (closed >= R.firstDayNextMonth or closed = '' or closed is null)),
R.High = (Select count(1) from [dbo].[taskDB] where rating = 'High' and created < R.firstDayNextMonth and (closed >= R.firstDayNextMonth or closed = '' or closed is null)),
R.Critical = (Select count(1) from [dbo].[taskDB] where rating = 'Critical' and created < R.firstDayNextMonth and (closed >= R.firstDayNextMonth or closed = '' or closed is null)),
R.NA = (Select count(1) from [dbo].[taskDB] where rating = 'N/A' and created < R.firstDayNextMonth and (closed >= R.firstDayNextMonth or closed = '' or closed is null))
From @openTicketsByMonth R
select year,
month,
Low,
Medium,
High,
Critical,
NA
from @openTicketsByMonth
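And the query output based on the data above:
If you look at 2019/8, there are 2 Critical tickets that were opened that month and remained open after it, but there are also 3 Critical tickets that were opened and closed within that same month. I think these should all be counted.
Update 17 August 2020
The query posted by @iceblade has been edited and confirmed to produce the correct results. The answer has been marked accordingly.
【Comments】:
Please elaborate on "I am trying to write a query that shows each month and the number of tickets open up to that month by rating."
So take last month as an example: if I run a query to show all open tickets, I want to see all tickets open as of last month by rating, and so on, going back month by month.
【Answer 1】: One option uses a lateral join and conditional aggregation: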
select
year(x.dt) yyyy,
month(x.dt) mm,
sum(sum(case when x.rating = 'low' then cnt else 0 end))
over(order by year(x.dt), month(x.dt)) low,
sum(sum(case when x.rating = 'medium' then cnt else 0 end))
over(order by year(x.dt), month(x.dt)) medium,
sum(sum(case when x.rating = 'high' then cnt else 0 end))
over(order by year(x.dt), month(x.dt)) high
from taskDB t
cross apply (values
(rating, created, 1),
(rating, closed, -1)
) as x(rating, dt, cnt)
where x.dt is not null
group by year(x.dt), month(x.dt)
order by year(x.dt), month(x.dt)
You can filter on a given time period as needed by turning this into a subquery and using a where clause in the outer query.
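The idea behind the query above can be sketched outside SQL: each ticket contributes +1 in the month it was created and -1 in the month it was closed, and a running sum over the months gives the count per month. The Python sketch below is only an illustration under that reading; the function name `running_open_counts` and the sample tickets are hypothetical, and ratings are left out to keep it short.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate

def running_open_counts(tickets):
    """tickets: iterable of (created, closed_or_None) dates.
    Returns {(year, month): running total of +1/-1 events}."""
    delta = defaultdict(int)
    for created, closed in tickets:
        delta[(created.year, created.month)] += 1      # ticket opened
        if closed is not None:                          # NULL closed dates add no event
            delta[(closed.year, closed.month)] -= 1     # ticket closed
    months = sorted(delta)                              # only months that have events
    totals = accumulate(delta[m] for m in months)       # running sum, like SUM() OVER (ORDER BY ...)
    return dict(zip(months, totals))

sample = [
    (date(2019, 8, 1), date(2019, 8, 3)),   # opened and closed in the same month
    (date(2019, 8, 6), date(2019, 10, 5)),  # open across two month ends
    (date(2019, 9, 2), None),               # still open
]
counts = running_open_counts(sample)
print(counts)
```

With this scheme a ticket stops being counted in the month it closes, so a ticket opened and closed within the same month never shows up, and months without any event produce no row.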
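【Comments】:
I tried this but have two problems. When the "closed" column is empty, i.e. the ticket has not been closed yet, SQL sets its value to "1900-01-01", so I think that is skewing the data. Also, it does not seem to count all the tickets up to the month where they have not yet been closed.
@mister.cake: I changed the query so that it handles empty closed dates. This should give you the result you want.
With the addition of "where x.dt is not null", it seems to simply not count those dates when the closed field is "1900-01-01". Basically I think (rating, closed, -1) should only happen when the date in "closed" is not "1900-01-01".
@mister.cake: In the data you showed, it looks like closed has null values, but I cannot see your real data. You can change the where clause to whatever suits you best.
I have added more information to the post above, along with a sample script. You can see the output I am getting.
【Answer 2】:
You need a calendar table/view containing the date ranges for the past twelve months, e.g.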
"year/month" month_begin month_end
2019/08 2019-08-01 2019-08-31
Then a join that checks for overlapping date ranges:
-- based on @iceblade's answer
declare @FromDate date,
@ToDate date;
SET @FromDate = (Select min(created) From [dbo].[taskDB]);
SET @ToDate = (Select max(created) From [dbo].[taskDB]);
declare @calendar table (Month_begin date, month_end date, year int, month int)
Insert into @calendar(Month_begin, month_end, year, month)
Select top (datediff(month, @FromDate, @ToDate) + 1)
dateadd(month, number, @FromDate),
dateadd(d,-1,dateadd(month, number + 1, @FromDate)),
year(dateadd(month, number, @FromDate)),
month(dateadd(month, number, @FromDate))
from [master].dbo.spt_values
where [type] = N'P' order by number;
select c.year,c.month,
COUNT(CASE WHEN rating = 'Low' THEN 1 ELSE NULL END) as low,
COUNT(CASE WHEN rating = 'Medium' THEN 1 ELSE NULL END) as med,
COUNT(CASE WHEN rating = 'High' THEN 1 ELSE NULL END) as high,
COUNT(CASE WHEN rating = 'Critical' THEN 1 ELSE NULL END) as critical,
COUNT(CASE WHEN rating = 'N/A' THEN 1 ELSE NULL END) as na
FROM taskDB as t join @calendar as c
-- overlapping periods
on t.created <= c.month_end
and (t.closed >= c.month_begin or t.closed is null)
GROUP BY c.year,c.month
ORDER BY c.year,c.month
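The join condition above is the usual interval-overlap test: a ticket belongs to a month if it was created on or before the month's last day and was either never closed or closed on or after the month's first day. A minimal Python sketch of that predicate, assuming unclosed tickets are represented as None (the helper name `overlaps_month` and the sample ticket are hypothetical):

```python
import calendar
from datetime import date

def overlaps_month(created, closed, year, month):
    """True if the ticket was open at any point during the given month."""
    month_begin = date(year, month, 1)
    month_end = date(year, month, calendar.monthrange(year, month)[1])
    # Same shape as: t.created <= c.month_end
    #            and (t.closed >= c.month_begin or t.closed is null)
    return created <= month_end and (closed is None or closed >= month_begin)

# A ticket created on 2019-08-06 and closed on 2019-10-05.
ticket = (date(2019, 8, 6), date(2019, 10, 5))
months_open = [m for m in range(8, 13) if overlaps_month(*ticket, 2019, m)]
print(months_open)
```

Unlike a running sum of +1/-1 events, this counts the ticket in its closing month as well, which is the behaviour the question asks for.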
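Adding a variation based on GMB's query that does not join to a calendar and may be more efficient on your actual data. It simply shifts the closed date to the following month: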
select
year(x.dt) yyyy,
month(x.dt) mm,
sum(sum(case when x.rating = 'Low' then cnt else 0 end))
over(order by year(x.dt), month(x.dt)
rows unbounded preceding) low,
sum(sum(case when x.rating = 'Medium' then cnt else 0 end))
over(order by year(x.dt), month(x.dt)
rows unbounded preceding) medium,
sum(sum(case when x.rating = 'High' then cnt else 0 end))
over(order by year(x.dt), month(x.dt)
rows unbounded preceding) high,
sum(sum(case when x.rating = 'Critical' then cnt else 0 end))
over(order by year(x.dt), month(x.dt)
rows unbounded preceding) critical,
sum(sum(case when x.rating = 'N/A' then cnt else 0 end))
over(order by year(x.dt), month(x.dt)
rows unbounded preceding) na
from taskDB t
cross apply (values
(rating, created, 1),
-- closed in next month
(rating, dateadd(m,1,closed), -1)
) as x(rating, dt, cnt)
where dt <= getdate() -- no rows past today
group by year(x.dt), month(x.dt)
order by year(x.dt), month(x.dt)
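There is one slight difference: #2 skips months that have no tickets, but I doubt such months exist.
This seems to be what you want, see fiddle
【Comments】:
I did not include the open rows (= NULL dates), so they are dropped from the result.
【Answer 3】: Based on your new input, I created a table variable containing every year/month between the first and the last ticket, using this post: better way to generate months/year table. Then I update each category, counting the tickets whose closed date is > the first day of each month. This should give you the result you want.
Update 17 August 2020 - modified the query to include tickets that were closed before the end of the month.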
declare @FromDate datetime,
@ToDate datetime;
SET @FromDate = (Select min(created) From [dbo].[taskDB]);
SET @ToDate = (Select max(created) From [dbo].[taskDB]);
declare @openTicketsByMonth table (firstDayOfMonth datetime, firstDayNextMonth datetime, year int, month int, Low int, Medium int, High int, Critical int, NA int)
Insert into @openTicketsByMonth(firstDayOfMonth, firstDayNextMonth, year, month)
Select top (datediff(month, @FromDate, @ToDate) + 1)
dateadd(month, number, @FromDate),
dateadd(month, number + 1, @FromDate),
year(dateadd(month, number, @FromDate)),
month(dateadd(month, number, @FromDate))
from [master].dbo.spt_values
where [type] = N'P' order by number;
update R
Set R.Low = (Select count(1) from [dbo].[taskDB] where rating = 'Low' and created < R.firstDayNextMonth and (closed >= R.firstDayOfMonth or closed = '' or closed is null)),
R.Medium = (Select count(1) from [dbo].[taskDB] where rating = 'Medium' and created < R.firstDayNextMonth and (closed >= R.firstDayOfMonth or closed = '' or closed is null)),
R.High = (Select count(1) from [dbo].[taskDB] where rating = 'High' and created < R.firstDayNextMonth and (closed >= R.firstDayOfMonth or closed = '' or closed is null)),
R.Critical = (Select count(1) from [dbo].[taskDB] where rating = 'Critical' and created < R.firstDayNextMonth and (closed >= R.firstDayOfMonth or closed = '' or closed is null)),
R.NA = (Select count(1) from [dbo].[taskDB] where rating = 'N/A' and created < R.firstDayNextMonth and (closed >= R.firstDayOfMonth or closed = '' or closed is null))
From @openTicketsByMonth R
select year,
month,
Low,
Medium,
High,
Critical,
NA
from @openTicketsByMonth
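The difference between the earlier version of this query (comparing closed against firstDayNextMonth) and the revised one (comparing closed against firstDayOfMonth) can be sketched as two predicates. This is an illustration only: the function names are hypothetical, unclosed tickets are modelled as None, and the first day of the month is passed in explicitly, whereas the SQL derives its month boundaries from min(created), which is 2019-08-01 in the sample data.

```python
from datetime import date

def first_day_next_month(first_day):
    """First day of the month after first_day (handles December rollover)."""
    return date(first_day.year + first_day.month // 12, first_day.month % 12 + 1, 1)

def open_at_month_end(created, closed, first_day):
    """Earlier variant: ticket is still open when the month ends."""
    nxt = first_day_next_month(first_day)
    return created < nxt and (closed is None or closed >= nxt)

def open_during_month(created, closed, first_day):
    """Revised variant: ticket was open at some point in the month."""
    nxt = first_day_next_month(first_day)
    return created < nxt and (closed is None or closed >= first_day)

aug_2019 = date(2019, 8, 1)
same_month_ticket = (date(2019, 8, 1), date(2019, 8, 3))

at_end = open_at_month_end(*same_month_ticket, aug_2019)
during = open_during_month(*same_month_ticket, aug_2019)
print(at_end, during)
```

Only the second predicate counts a ticket that was opened and closed within the same month.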
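【Comments】:
This is the closest one! But according to the ticket data, the only tickets open in August 2020 are 3 High and 3 Low. When I run it, it shows 2 Low, 1 Medium, 2 High and 1 Critical as open. Looking at your counting logic it seems correct, but something is throwing it off.
I think the problem was the start date of the comparison. I updated the query to add 1 month, so it now compares tickets that were opened before the first day of the next month and not yet closed by that date. It now shows me for August 2020: 3 High, 3 Low, 1 N/A.
Nice, yes, that works, but I do not believe this query accounts for another issue. If a ticket was opened and closed within the same month, I feel it should be counted too.
It depends on what you want to show. Right now the query shows the tickets that are still open at the end of the month. I can update the query so that it shows the tickets that were open at any point during the month, but that will increase the numbers. For example, August 2020 would be: Low: 3, Medium: 2, High: 3, Critical: 1, NA: 1.
Just updated the query to show tickets that were closed before the end of the month.
【Answer 4】: Group by year and month rather than by the full date.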
select convert(varchar(max), year(created)) + '/' + right('0' + convert(varchar(max), month(created)),2) as Created
, COUNT(CASE WHEN rating = 'high' THEN 1 ELSE NULL END) as high,
COUNT(CASE WHEN rating = 'med' THEN 1 ELSE NULL END) as med,
COUNT(CASE WHEN rating = 'low' THEN 1 ELSE NULL END) as low FROM taskDB
GROUP BY convert(varchar(max), year(created)) + '/' + right('0' + convert(varchar(max), month(created)),2)
ORDER BY convert(varchar(max), year(created)) + '/' + right('0' + convert(varchar(max), month(created)),2) ASC
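【Comments】:
This query does not seem to count all the tickets that have been open since each month, nor does it account for whether a ticket was closed before that month/year.
【Answer 5】: Use a WITH clause to convert the date to the desired format, then group by that formatted date.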
With TempTaskDB As (
SELECT convert(varchar(20), datepart(year, created)) + '/' + convert(varchar(20), datepart(month, created)) as CreatedDate,rating
from taskDB)
Select CreatedDate,
COUNT(CASE WHEN rating = 'high' THEN 1 ELSE NULL END) AS high,
COUNT(CASE WHEN rating = 'med' THEN 1 ELSE NULL END) AS med,
COUNT(CASE WHEN rating = 'low' THEN 1 ELSE NULL END) AS low
from TempTaskDB
group by CreatedDate
Order by CreatedDate Asc
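One detail worth checking in the query above: the year/month key is built without zero-padding the month, and ordering by it is a string comparison. The short Python sketch below uses made-up keys to show how unpadded year/month strings sort, compared with padded ones such as those produced by right('0' + ..., 2).

```python
# Hypothetical year/month keys, listed in chronological order.
unpadded = ["2019/8", "2019/9", "2019/10", "2019/11"]
padded = ["2019/08", "2019/09", "2019/10", "2019/11"]

# String sorting compares character by character, so "1" sorts before "8".
unpadded_sorted = sorted(unpadded)
padded_sorted = sorted(padded)

print(unpadded_sorted)  # October and November come before August
print(padded_sorted)    # chronological order is preserved
```

If the chart needs the months in chronological order, padding the month or sorting on the numeric year and month avoids this.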
【Comments】: