查找最大同时出现次数,如果满足条件则增加变量
Posted
技术标签:
【中文标题】查找最大同时出现次数,如果满足条件则增加变量【英文标题】:Find max simultanius occurences, increasse variable if criteria is meet 【发布时间】:2021-06-19 00:46:47 【问题描述】:我有一项任务,旨在将进入给定端口(中继)集的几个连接汇总到时隙中并计算结果。问题是我想实现两种类型的计数:
一个计数应计算给定插槽中与Trunk
的连接总数(每个插槽 30 分钟)
其次,我想找出同时发生的最大连接数:例如总共有 10 个连接,但其中只有 3 个同时连接。表中的 cmets 表示“计数组”
表:RAW_DATA
GatewayName StartDateTime DisconnectDateTime ConDur Trunk
GW1 2021-02-24 20:01:00.0 2021-02-24 20:05:30.0 270000 T1 --1, nextRow.Start is before discon
GW1 2021-02-24 20:04:50.0 2021-02-24 20:08:24.0 214000 T1 --2
GW1 2021-02-24 20:05:20.6 2021-02-24 20:07:50.1 149500 T1 --3
GW1 2021-02-24 20:15:50.0 2021-02-24 20:17:00.0 70000 T1 --0
GW1 2021-02-24 20:20:50.0 2021-02-24 20:21:00.0 10000 T1 --1
GW1 2021-02-24 20:20:59.0 2021-02-24 20:24:00.0 181000 T1 --2
GW1 2021-02-24 20:23:59.0 2021-02-24 20:28:30.0 271000 T1 --3
GW1 2021-02-24 20:26:00.0 2021-02-24 20:29:30.0 210000 T1 --4
GW1 2021-02-24 20:27:00.0 2021-02-24 20:29:31.0 151000 T1 --5
GW3 2021-02-24 22:46:54.2 2021-02-24 22:48:25.2 91000 T1 --0
GW2 2021-02-24 20:41:49.0 2021-02-24 20:43:24.0 95000 T2 --0
GW99 2021-02-24 22:47:25.1 2021-02-24 22:47:54.4 29300 T2 --0
目前的结果
我正在运行一个存储过程,该过程创建一个计数表,我用它来生成我的时间段。
另外,我创建了一个在运行时间段排序之前运行的临时表,该表的目的是查看下一行 StartDateTime
以查看它是否在当前行 DisconnectDateTime
之前。该表是作为测试表构建的,以查看是否可以在遇到零 (0) 之前以某种方式对事件进行计数和分组,以便稍后执行 max
以一次获得最大数量的事件。不幸的是,我无法弄清楚这个机制。
GatewayName StartDateTime DisconnectDateTime ConDur Trunk nrDDT sim
GW1 2021-02-24 20:01:00.0 2021-02-24 20:05:30.0 270000 T1 2021-02-24 20:04:50.0 1 -- count row below
GW1 2021-02-24 20:04:50.0 2021-02-24 20:08:24.0 214000 T1 2021-02-24 20:05:20.6 1 -- counted
GW1 2021-02-24 20:05:20.6 2021-02-24 20:07:50.1 149500 T1 2021-02-24 20:15:50.0 0 -- counted
GW1 2021-02-24 20:15:50.0 2021-02-24 20:17:00.0 70000 T1 2021-02-24 20:20:50.0 0 -- jump to else
GW1 2021-02-24 20:20:50.0 2021-02-24 20:21:00.0 10000 T1 2021-02-24 20:20:59.0 1
GW1 2021-02-24 20:20:59.0 2021-02-24 20:24:00.0 181000 T1 2021-02-24 20:23:59.0 1
GW1 2021-02-24 20:23:59.0 2021-02-24 20:28:30.0 271000 T1 2021-02-24 20:26:00.0 1
GW1 2021-02-24 20:26:00.0 2021-02-24 20:29:30.0 210000 T1 2021-02-24 20:27:00.0 1
GW1 2021-02-24 20:27:00.0 2021-02-24 20:29:31.0 151000 T1 2021-02-24 22:46:54.2 0
GW3 2021-02-24 22:46:54.2 2021-02-24 22:48:25.2 91000 T1 NULL 0
GW2 2021-02-24 20:41:49.0 2021-02-24 20:43:24.0 95000 T2 2021-02-24 22:47:25.1 0
GW99 2021-02-24 22:47:25.1 2021-02-24 22:47:54.4 29300 T2 NULL 0
问题是,如果满足我的case
(sim 列),我想增加计数,我尝试同时使用全局变量和局部变量,但是它正在为每一行重置,我无法强制如果输入了我的else
子句,则返回0
。
CREATE OR ALTER PROCEDURE GenerateTrunkSum
@date datetime2(7),
@period int
AS
BEGIN
DECLARE @raw_data table
(
GatewayName varchar(23),
StartDateTime datetime2(7),
DisconnectDateTime datetime2(7),
ConnectionDuration int ,
Trunk varchar(10)
);
-- Createing test data
INSERT INTO @raw_data values('GW1', '2021-02-24 20:01:00.0', '2021-02-24 20:05:30.0', DATEDIFF(millisecond, '2021-02-24 20:01:00.0', '2021-02-24 20:05:30.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:05:20.6', '2021-02-24 20:07:50.1', DATEDIFF(millisecond, '2021-02-24 20:05:20.6', '2021-02-24 20:07:50.1'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:04:50.0', '2021-02-24 20:08:24.0', DATEDIFF(millisecond, '2021-02-24 20:04:50.0', '2021-02-24 20:08:24.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:15:50.0', '2021-02-24 20:17:00.0', DATEDIFF(millisecond, '2021-02-24 20:15:50.0', '2021-02-24 20:17:00.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:20:50.0', '2021-02-24 20:21:00.0', DATEDIFF(millisecond, '2021-02-24 20:20:50.0', '2021-02-24 20:21:00.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:20:59.0', '2021-02-24 20:24:00.0', DATEDIFF(millisecond, '2021-02-24 20:20:59.0', '2021-02-24 20:24:00.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:25:00.0', '2021-02-24 20:28:30.0', DATEDIFF(millisecond, '2021-02-24 20:25:00.0', '2021-02-24 20:28:30.0'), 'T1')
INSERT INTO @raw_data values('GW2', '2021-02-24 20:41:49.0 ', '2021-02-24 20:43:24.0', DATEDIFF(millisecond, '2021-02-24 20:41:49.0 ', '2021-02-24 20:43:24.0'), 'T2')
INSERT INTO @raw_data values('GW3', '2021-02-24 22:46:54.2', '2021-02-24 22:48:25.2', DATEDIFF(millisecond, '2021-02-24 22:46:54.2', '2021-02-24 22:48:25.2'), 'T1')
INSERT INTO @raw_data values('GW99', '2021-02-24 22:47:25.1', '2021-02-24 22:47:54.4', DATEDIFF(millisecond, '2021-02-24 22:47:25.1', '2021-02-24 22:47:54.4'), 'T2')
-- Variable should be increased if not null
declare @localvar int
set @localvar = 0
-- Store value in max if 0 is meet and current @localvar is greater then @max
declare @max int
set @max = 0
SELECT GatewayName, StartDateTime, DisconnectDateTime, ConnectionDuration, Trunk, LEAD(StartDateTime, 1, NULL) OVER ( PARTITION BY Trunk ORDER BY StartDateTime ) as nrDDT,
CASE
WHEN DATEDIFF(MILLISECOND, LEAD(StartDateTime, 1, NULL) OVER ( PARTITION BY Trunk ORDER BY StartDateTime ), DisconnectDateTime) >= 0 THEN @localvar + 1 -- Add if Match 1 = 1 M3 = 3 etc
ELSE -- possible update @max and Reset @localvar = 0
END AS sim
INTO #Temp
FROM @raw_data;
select * from #Temp;
-- Creat timeslotable
with numbers(val) as
(select 1 union all select val + 1 from numbers where val < 48)
select @date, nbr.val,
dateadd(minute, (nbr.val - 1) * 30, @date) as period_start,
dateadd(minute, (nbr.val ) * 30, @date) as period_end
from numbers as nbr
order by nbr.val;
--Enummerate
with numbers(val) as
(select 1 union all select val + 1 from numbers where val < 48),
periods as (
select @date as [date], nbr.val,
dateadd(minute, (nbr.val - 1) * 30, @date) as period_start,
dateadd(minute, (nbr.val ) * 30, @date) as period_end
from numbers as nbr)
select pers.period_start, @period as Period, src.trunk, count(src.GatewayName) as 'all',
-- Case Added in update 2
CASE
WHEN MAX(src.sim) < 1 THEN 1 -- if max is 0 set 1, defaults to at least one active
ELSE MAX(src.sim)
END AS simultaneous
--from periods as pers left join HDO.CDR_RAW as src
from periods as pers inner join #Temp as src
on src.StartDateTime >= pers.period_start and src.StartDateTime < pers.period_end
group by src.trunk, pers.period_start
order by src.trunk
END
GO
EXECUTE GenerateTrunkSum @date = '20210224', @period = 1800;
所以我的问题是:有人知道如何让这个计数机制起作用吗?我想要这样的原因是能够在我的最后一个选择语句中执行MAX
(请参阅CASE
)
--- Current output
period_start period trunk all simultaneous
2021-02-24 20:00:00.0 1800 T1 9 1
2021-02-24 22:30:00.0 1800 T1 1 1
2021-02-24 20:30:00.0 1800 T2 1 1
2021-02-24 22:30:00.0 1800 T2 1 1
--- What it should be based on input
period_start period trunk all simultaneous
2021-02-24 20:00:00.0 1800 T1 9 5 -- See RAW_table for clarification
2021-02-24 22:30:00.0 1800 T1 1 1
2021-02-24 20:30:00.0 1800 T2 1 1
2021-02-24 22:30:00.0 1800 T2 1 1
示例输出
Trunk Start Period All sim
T1 2021:02:24 22:30:0.0 1800 5 2
更新 1
看着 Sørens 的回答,我尝试在 30 分钟内使用 inner join
它。
因此
设置无计数
使用 [dbo_CDR]
去吧
CREATE OR ALTER PROCEDURE [dbo].[GenerateTrunkSumv1]
@date datetime2(1),
@ST datetime2(1),
@DT datetime2(1),
@tn varchar(23),
@period int
AS
BEGIN
WITH TrunkGroup
AS (
SELECT
IngressTrunkGroup as Trunk
,StartDateTime
,DisconnectDateTime
FROM
[dbo].[CDR_RAW]
UNION ALL
SELECT
EgressTrunkGroup
,StartDateTime
,DisconnectDateTime
FROM
[dbo].[CDR_RAW]
),
Times AS
(SELECT
rd.StartDateTime tm
,rd.Trunk
FROM TrunkGroup rd
UNION
SELECT
rd.DisconnectDateTime
,rd.Trunk
FROM TrunkGroup rd),
intervals
AS
(SELECT
tm tm1
,LEAD(tm, 1) OVER (PARTITION BY Trunk ORDER BY tm) tm2
,Trunk
FROM Times)
SELECT
i.Trunk
,i.tm1
,i.tm2
,COUNT(*) simultaneous
INTO #TEMP
FROM intervals i
INNER JOIN TrunkGroup rd
ON rd.DisconnectDateTime >= i.tm1
AND rd.StartDateTime < i.tm2
AND i.Trunk = rd.Trunk
AND i.tm2 IS NOT null
GROUP BY i.Trunk
,i.tm1
,i.tm2
ORDER BY i.Trunk,i.tm1
;
WITH Numbers(val) AS
(
SELECT
1
UNION ALL
SELECT
val + 1
FROM
numbers
WHERE
val < 48
)
SELECT
PeriodSummary.period_start
,PeriodSummary.period_end
,PeriodSummary.Period
,PeriodSummary.Trunk
,PeriodSummary.[all]
,PeriodSummary.simultaneous
FROM
(
SELECT
pers.period_start
,pers.period_end
,@period as [Period]
,src.Trunk
,src.simultaneous
,COUNT(*) as [all]
FROM
(
SELECT
dateadd(minute, (val - 1) * 30, '20210224') as period_start
,dateadd(minute, (val ) * 30, '20210224') as period_end
FROM
numbers
) pers
INNER JOIN
#TEMP as src
ON src.tm1 >= pers.period_start
AND src.tm1 < pers.period_end
GROUP BY
src.Trunk
,pers.period_start
,pers.period_end
,src.simultaneous
) PeriodSummary
END
GO
EXECUTE [dbo].[GenerateTrunkSumv1] @date = '20210224', @period = 1800, @ST = '2021-02-24 20:00:00.0', @DT = '2021-02-24 22:30:00.0000000', @tn = 'test';
SELECT @@ROWCOUNT as 'Inserted'
GO
此解决方案的问题是,如果时间段使用其中一个值(开始/断开时间)超过 30 分钟标记,我将收到重复的行
电流输出
period_start period_end Period Trunk all simultaneous
2021-02-24 20:00:00.0 2021-02-24 20:30:00.0 1800 I1 1 1
2021-02-24 20:00:00.0 2021-02-24 20:30:00.0 1800 I1 1 2
2021-02-24 20:30:00.0 2021-02-24 21:00:00.0 1800 I1 6 1
2021-02-24 20:30:00.0 2021-02-24 21:00:00.0 1800 I1 22 2
2021-02-24 20:30:00.0 2021-02-24 21:00:00.0 1800 I1 16 3
2021-02-24 20:30:00.0 2021-02-24 21:00:00.0 1800 I2 1 1
2021-02-24 20:30:00.0 2021-02-24 21:00:00.0 1800 I2 2 2
正如您在上面看到的,第一行出现了两次,因为它有一个通过标记的条目。
--Surrounding rows causing this issue
I1 2021-02-24 20:23:43.1 2021-02-24 20:24:34.6 1
I1 2021-02-24 20:24:34.6 2021-02-24 20:31:09.5 2
I1 2021-02-24 20:31:09.5 2021-02-24 20:32:32.9 3
I1 2021-02-24 20:32:32.9 2021-02-24 20:32:42.3 3
I1 2021-02-24 20:32:42.3 2021-02-24 20:32:51.4 3
I1 2021-02-24 20:32:51.4 2021-02-24 20:33:05.1 3
有没有人知道一种解决方案,该解决方案可以从必须每 30 分钟(00:00 / 00:30)开始的锁定时段转变为从第一个时间段开始创建 30 分钟。
如果可以更改第二行以反映异常值的开始/断开时间(可选)。
【问题讨论】:
我认为您的逻辑仍然存在故障:您评估每个连接,如果它在前一个连接的时间内。现在想象以下设置:第一个连接从 07:00 开始到 07:15 结束,第二个连接从 07:05 开始到 07:07 结束(在第一个连接内),第三个连接开始在 07:10 到 07:13 结束 - ehich 在第一个连接内,但不在前一个连接内...在这种情况下,您不会将其识别为同时连接,对吗? 这是一个非常好的观点,我已经监督过了。我将看看下面发布的解决方案,女巫不只关注一行。在验证/测试时,我也会考虑这个评论。谢谢! 【参考方案1】:如果我要找到同时连接,我不会只看下一个条目。
假设您有每个连接的开始时间和结束时间。使用这些时间制作所有连续的间隔,您将拥有数据集的所有“有趣”间隔。 然后将这些间隔与您的原始数据连接起来,您可以计算每个间隔中有多少个连接。因为你总是有事情发生,所以你一定会抓住ecerything。
首先我选择所有时间:
WITH Times
AS
(SELECT
rd.StartDateTime tm
,rd.Trunk
FROM #raw_data rd
UNION
SELECT
rd.DisconnectDateTime
,rd.Trunk
FROM #raw_data rd)
这只是您数据集的所有时间 - 按主干分组,因为我们希望将它们分开。
然后创建所有区间:
intervals
AS
(SELECT
tm tm1
,LEAD(tm, 1) OVER (PARTITION BY Trunk ORDER BY tm) tm2
,Trunk
FROM Times)
在这里,我们得到一天中的第一个时间,一天中的第二个时间,一天中的第二个时间到一天中的第三个时间等等。也就是说,我们已将所有连接的时间段划分为以 a 开头或结尾的确切时间间隔连接开始或断开。
现在我们只需要结合原始数据来查看每个间隔中有多少连接:
SELECT
i.Trunk
,i.tm1
,i.tm2
,COUNT(*) simultaneous
FROM intervals i
INNER JOIN #raw_data rd
ON rd.DisconnectDateTime >= i.tm1
AND rd.StartDateTime < i.tm2
AND i.Trunk = rd.Trunk
AND i.tm2 IS NOT null
GROUP BY i.Trunk
,i.tm1
,i.tm2
ORDER BY i.trunk,i.tm1
这给出了这个表:
+-------+-----------------------------+-----------------------------+--------------+
| Trunk | tm1 | tm2 | simultaneous |
+-------+-----------------------------+-----------------------------+--------------+
| T1 | 2021-02-24 20:01:00.0000000 | 2021-02-24 20:04:50.0000000 | 1 |
| T1 | 2021-02-24 20:04:50.0000000 | 2021-02-24 20:05:20.6000000 | 2 |
| T1 | 2021-02-24 20:05:20.6000000 | 2021-02-24 20:05:30.0000000 | 3 |
| T1 | 2021-02-24 20:05:30.0000000 | 2021-02-24 20:07:50.1000000 | 3 |
| T1 | 2021-02-24 20:07:50.1000000 | 2021-02-24 20:08:24.0000000 | 2 |
| T1 | 2021-02-24 20:08:24.0000000 | 2021-02-24 20:15:50.0000000 | 1 |
| T1 | 2021-02-24 20:15:50.0000000 | 2021-02-24 20:17:00.0000000 | 1 |
| T1 | 2021-02-24 20:17:00.0000000 | 2021-02-24 20:20:50.0000000 | 1 |
| T1 | 2021-02-24 20:20:50.0000000 | 2021-02-24 20:20:59.0000000 | 1 |
| T1 | 2021-02-24 20:20:59.0000000 | 2021-02-24 20:21:00.0000000 | 2 |
| T1 | 2021-02-24 20:21:00.0000000 | 2021-02-24 20:24:00.0000000 | 2 |
| T1 | 2021-02-24 20:24:00.0000000 | 2021-02-24 20:25:00.0000000 | 1 |
| T1 | 2021-02-24 20:25:00.0000000 | 2021-02-24 20:28:30.0000000 | 1 |
| T1 | 2021-02-24 20:28:30.0000000 | 2021-02-24 22:46:54.2000000 | 1 |
| T1 | 2021-02-24 22:46:54.2000000 | 2021-02-24 22:48:25.2000000 | 1 |
| T2 | 2021-02-24 20:41:49.0000000 | 2021-02-24 20:43:24.0000000 | 1 |
| T2 | 2021-02-24 20:43:24.0000000 | 2021-02-24 22:47:25.1000000 | 1 |
| T2 | 2021-02-24 22:47:25.1000000 | 2021-02-24 22:47:54.4000000 | 1 |
+-------+-----------------------------+-----------------------------+--------------+
现在您可以将其加入到您设定的时间段,记住将时间间隔设为开放式,您可以找到每个时间段内同时连接的最大数量。
完整的查询在这里:
WITH Times
AS
(SELECT
rd.StartDateTime tm
,rd.Trunk
FROM #raw_data rd
UNION
SELECT
rd.DisconnectDateTime
,rd.Trunk
FROM #raw_data rd),
intervals
AS
(SELECT
tm tm1
,LEAD(tm, 1) OVER (PARTITION BY Trunk ORDER BY tm) tm2
,Trunk
FROM Times)
SELECT
i.Trunk
,i.tm1
,i.tm2
,COUNT(*) simultaneous
FROM intervals i
INNER JOIN #raw_data rd
ON rd.DisconnectDateTime >= i.tm1
AND rd.StartDateTime < i.tm2
AND i.Trunk = rd.Trunk
AND i.tm2 IS NOT null
GROUP BY i.Trunk
,i.tm1
,i.tm2
ORDER BY i.trunk,i.tm1
【讨论】:
感谢您将其扩大到比单行更大的范围;我不确定如何存档。我今天将看看这个实现,以确保我完全理解它:)。一旦我尝试一下,您就会收到我的来信。 这项工作非常好,适合我的大部分要求,但是,它对我来说还不是很顺利,目前,正如你提到的,这个时期是开放式的。但是我一直面临着需要将它们分组为块大小(等 30 分钟)以允许 s-s-rS 中的记者从下拉列表中选择时间跨度(30 分钟、1 小时、24 小时等)的挑战你对此有任何指示吗? 如果我还想计算每个时间段的“所有”连接,你会添加这个吗,因为我们在这里合并表格,我似乎得到了重复的计数(在某些值上)当尝试“选择计数(*)作为全部”时,我将使用示例输出更新我的问题 嘿@søren-kongstad,我现在花了几天时间试图弄清楚如何加入生成的输出以匹配我的 30 分钟标准。不幸的是,虽然我无法完成它。我添加了一个更新,显示了当前结果和我用来实现它的代码。您是否有时间详细说明如何存档?以上是关于查找最大同时出现次数,如果满足条件则增加变量的主要内容,如果未能解决你的问题,请参考以下文章