查找最大同时出现次数,如果满足条件则增加变量

Posted

技术标签:

【中文标题】查找最大同时出现次数,如果满足条件则增加变量【英文标题】:Find max simultanius occurences, increasse variable if criteria is meet 【发布时间】:2021-06-19 00:46:47 【问题描述】:

我有一项任务,旨在将进入给定端口(中继)集的几个连接汇总到时隙中并计算结果。问题是我想实现两种类型的计数:

一个计数应计算给定插槽中与Trunk 的连接总数(每个插槽 30 分钟) 其次,我想找出同时发生的最大连接数:例如总共有 10 个连接,但其中只有 3 个同时连接。表中的 cmets 表示“计数组”

表:RAW_DATA

GatewayName StartDateTime               DisconnectDateTime      ConDur  Trunk
GW1         2021-02-24 20:01:00.0       2021-02-24 20:05:30.0   270000  T1  --1, nextRow.Start is before discon   
GW1         2021-02-24 20:04:50.0       2021-02-24 20:08:24.0   214000  T1  --2   
GW1         2021-02-24 20:05:20.6       2021-02-24 20:07:50.1   149500  T1  --3   
GW1         2021-02-24 20:15:50.0       2021-02-24 20:17:00.0   70000   T1  --0   
GW1         2021-02-24 20:20:50.0       2021-02-24 20:21:00.0   10000   T1  --1   
GW1         2021-02-24 20:20:59.0       2021-02-24 20:24:00.0   181000  T1  --2   
GW1         2021-02-24 20:23:59.0       2021-02-24 20:28:30.0   271000  T1  --3   
GW1         2021-02-24 20:26:00.0       2021-02-24 20:29:30.0   210000  T1  --4   
GW1         2021-02-24 20:27:00.0       2021-02-24 20:29:31.0   151000  T1  --5   
GW3         2021-02-24 22:46:54.2       2021-02-24 22:48:25.2   91000   T1  --0  
GW2         2021-02-24 20:41:49.0       2021-02-24 20:43:24.0   95000   T2  --0   
GW99        2021-02-24 22:47:25.1       2021-02-24 22:47:54.4   29300   T2  --0 

目前的结果

我正在运行一个存储过程,该过程创建一个计数表,我用它来生成我的时间段。

另外,我创建了一个在运行时间段排序之前运行的临时表,该表的目的是查看下一行 StartDateTime 以查看它是否在当前行 DisconnectDateTime 之前。该表是作为测试表构建的,以查看是否可以在遇到零 (0) 之前以某种方式对事件进行计数和分组,以便稍后执行 max 以一次获得最大数量的事件。不幸的是,我无法弄清楚这个机制。

GatewayName StartDateTime               DisconnectDateTime      ConDur  Trunk   nrDDT                   sim
GW1         2021-02-24 20:01:00.0       2021-02-24 20:05:30.0   270000  T1      2021-02-24 20:04:50.0   1 -- count row below
GW1         2021-02-24 20:04:50.0       2021-02-24 20:08:24.0   214000  T1      2021-02-24 20:05:20.6   1 -- counted
GW1         2021-02-24 20:05:20.6       2021-02-24 20:07:50.1   149500  T1      2021-02-24 20:15:50.0   0 -- counted
GW1         2021-02-24 20:15:50.0       2021-02-24 20:17:00.0   70000   T1      2021-02-24 20:20:50.0   0 -- jump to else
GW1         2021-02-24 20:20:50.0       2021-02-24 20:21:00.0   10000   T1      2021-02-24 20:20:59.0   1
GW1         2021-02-24 20:20:59.0       2021-02-24 20:24:00.0   181000  T1      2021-02-24 20:23:59.0   1
GW1         2021-02-24 20:23:59.0       2021-02-24 20:28:30.0   271000  T1      2021-02-24 20:26:00.0   1
GW1         2021-02-24 20:26:00.0       2021-02-24 20:29:30.0   210000  T1      2021-02-24 20:27:00.0   1
GW1         2021-02-24 20:27:00.0       2021-02-24 20:29:31.0   151000  T1      2021-02-24 22:46:54.2   0
GW3         2021-02-24 22:46:54.2       2021-02-24 22:48:25.2   91000   T1      NULL                    0
GW2         2021-02-24 20:41:49.0       2021-02-24 20:43:24.0   95000   T2      2021-02-24 22:47:25.1   0
GW99        2021-02-24 22:47:25.1       2021-02-24 22:47:54.4   29300   T2      NULL                    0

问题是,如果满足我的case(sim 列),我想增加计数,我尝试同时使用全局变量和局部变量,但是它正在为每一行重置,我无法强制如果输入了我的else 子句,则返回0

CREATE OR ALTER PROCEDURE GenerateTrunkSum
@date datetime2(7),
@period int
AS
BEGIN

DECLARE @raw_data table
(
GatewayName varchar(23),
StartDateTime datetime2(7),
DisconnectDateTime datetime2(7),
ConnectionDuration int ,
Trunk varchar(10)
);
-- Createing test data
INSERT INTO @raw_data values('GW1', '2021-02-24 20:01:00.0', '2021-02-24 20:05:30.0', DATEDIFF(millisecond, '2021-02-24 20:01:00.0', '2021-02-24 20:05:30.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:05:20.6', '2021-02-24 20:07:50.1', DATEDIFF(millisecond, '2021-02-24 20:05:20.6', '2021-02-24 20:07:50.1'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:04:50.0', '2021-02-24 20:08:24.0', DATEDIFF(millisecond, '2021-02-24 20:04:50.0', '2021-02-24 20:08:24.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:15:50.0', '2021-02-24 20:17:00.0', DATEDIFF(millisecond, '2021-02-24 20:15:50.0', '2021-02-24 20:17:00.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:20:50.0', '2021-02-24 20:21:00.0', DATEDIFF(millisecond, '2021-02-24 20:20:50.0', '2021-02-24 20:21:00.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:20:59.0', '2021-02-24 20:24:00.0', DATEDIFF(millisecond, '2021-02-24 20:20:59.0', '2021-02-24 20:24:00.0'), 'T1')
INSERT INTO @raw_data values('GW1', '2021-02-24 20:25:00.0', '2021-02-24 20:28:30.0', DATEDIFF(millisecond, '2021-02-24 20:25:00.0', '2021-02-24 20:28:30.0'), 'T1')
INSERT INTO @raw_data values('GW2', '2021-02-24 20:41:49.0 ', '2021-02-24 20:43:24.0', DATEDIFF(millisecond, '2021-02-24 20:41:49.0 ', '2021-02-24 20:43:24.0'), 'T2')
INSERT INTO @raw_data values('GW3', '2021-02-24 22:46:54.2', '2021-02-24 22:48:25.2', DATEDIFF(millisecond, '2021-02-24 22:46:54.2', '2021-02-24 22:48:25.2'), 'T1')
INSERT INTO @raw_data values('GW99', '2021-02-24 22:47:25.1', '2021-02-24 22:47:54.4', DATEDIFF(millisecond, '2021-02-24 22:47:25.1', '2021-02-24 22:47:54.4'), 'T2')


-- Variable should be increased if not null 
declare @localvar int 
set @localvar = 0
-- Store value in max if 0 is meet and current @localvar is greater then @max
declare @max int 
set @max = 0

   SELECT GatewayName, StartDateTime, DisconnectDateTime, ConnectionDuration, Trunk, LEAD(StartDateTime, 1, NULL) OVER ( PARTITION BY Trunk ORDER BY  StartDateTime ) as nrDDT,
       CASE 
           WHEN DATEDIFF(MILLISECOND, LEAD(StartDateTime, 1, NULL) OVER ( PARTITION BY Trunk ORDER BY  StartDateTime ), DisconnectDateTime) >= 0 THEN @localvar + 1 -- Add if Match 1 = 1 M3 = 3 etc
           ELSE -- possible update @max and Reset @localvar = 0
       END AS sim
   INTO #Temp
   FROM @raw_data;

       select * from #Temp;
   
   -- Creat timeslotable
       with numbers(val) as 
           (select 1 union all select val + 1 from numbers where val < 48)
       select @date, nbr.val, 
           dateadd(minute, (nbr.val - 1) * 30, @date) as period_start, 
           dateadd(minute, (nbr.val    ) * 30, @date) as period_end 
       from numbers as nbr 
       order by nbr.val;

       --Enummerate
       with numbers(val) as 
           (select 1 union all select val + 1 from numbers where val < 48),
       periods as (
           select @date as [date], nbr.val, 
           dateadd(minute, (nbr.val - 1) * 30, @date) as period_start, 
           dateadd(minute, (nbr.val    ) * 30, @date) as period_end 
           from numbers as nbr)
       select pers.period_start, @period as Period, src.trunk, count(src.GatewayName) as 'all', 
           -- Case Added in update 2
           CASE
               WHEN MAX(src.sim) < 1 THEN 1 -- if max is 0 set 1, defaults to at least one active
               ELSE MAX(src.sim)
           END AS simultaneous
       --from periods as pers left  join HDO.CDR_RAW as src 
       from periods as pers inner join #Temp as src 
       on src.StartDateTime >= pers.period_start and src.StartDateTime < pers.period_end
       group by src.trunk, pers.period_start
       order by src.trunk 
END
GO

EXECUTE GenerateTrunkSum @date = '20210224', @period = 1800; 

所以我的问题是:有人知道如何让这个计数机制起作用吗?我想要这样的原因是能够在我的最后一个选择语句中执行MAX(请参阅CASE

--- Current output
period_start            period  trunk all   simultaneous
2021-02-24 20:00:00.0   1800    T1    9     1
2021-02-24 22:30:00.0   1800    T1    1     1
2021-02-24 20:30:00.0   1800    T2    1     1
2021-02-24 22:30:00.0   1800    T2    1     1

--- What it should be based on input
period_start            period  trunk all   simultaneous
2021-02-24 20:00:00.0   1800    T1    9     5 -- See RAW_table for clarification
2021-02-24 22:30:00.0   1800    T1    1     1
2021-02-24 20:30:00.0   1800    T2    1     1
2021-02-24 22:30:00.0   1800    T2    1     1

示例输出

Trunk  Start                                Period   All  sim
T1       2021:02:24 22:30:0.0     1800      5     2

更新 1

看着 Sørens 的回答,我尝试在 30 分钟内使用 inner join 它。 因此 设置无计数 使用 [dbo_CDR] 去吧

CREATE OR ALTER PROCEDURE [dbo].[GenerateTrunkSumv1]
@date datetime2(1),
@ST datetime2(1),
@DT datetime2(1),
@tn varchar(23),
@period int
AS
BEGIN

WITH TrunkGroup
AS (
    SELECT
        IngressTrunkGroup as Trunk
    ,StartDateTime
    ,DisconnectDateTime
    FROM 
        [dbo].[CDR_RAW]

    UNION ALL

    SELECT 
        EgressTrunkGroup
    ,StartDateTime
    ,DisconnectDateTime
    FROM 
        [dbo].[CDR_RAW]
),
Times AS
(SELECT
        rd.StartDateTime tm
    ,rd.Trunk
    FROM TrunkGroup rd
    UNION
    SELECT
        rd.DisconnectDateTime
    ,rd.Trunk
    FROM TrunkGroup rd),
intervals
AS
(SELECT
        tm tm1
    ,LEAD(tm, 1) OVER (PARTITION BY Trunk ORDER BY tm) tm2
    ,Trunk
    FROM Times)
SELECT
    i.Trunk
,i.tm1
,i.tm2
,COUNT(*) simultaneous
INTO #TEMP
FROM intervals i
INNER JOIN TrunkGroup rd
    ON rd.DisconnectDateTime >= i.tm1
        AND rd.StartDateTime < i.tm2
        AND i.Trunk = rd.Trunk
        AND i.tm2 IS NOT null
GROUP BY i.Trunk
        ,i.tm1
        ,i.tm2
ORDER BY i.Trunk,i.tm1

;

WITH Numbers(val) AS
(
SELECT
    1 
    
    UNION ALL
    
SELECT
    val + 1 
FROM
    numbers 
WHERE
    val < 48
)
SELECT
PeriodSummary.period_start
,PeriodSummary.period_end
,PeriodSummary.Period
,PeriodSummary.Trunk
,PeriodSummary.[all]
,PeriodSummary.simultaneous
FROM
(
    SELECT
    pers.period_start
    ,pers.period_end
    ,@period as [Period]
    ,src.Trunk
    ,src.simultaneous
    ,COUNT(*) as [all]
    FROM
    (
        SELECT
        dateadd(minute, (val - 1) * 30, '20210224') as period_start
        ,dateadd(minute, (val    ) * 30, '20210224') as period_end 
        FROM
        numbers
    ) pers 
INNER JOIN
    #TEMP as src 
    ON  src.tm1 >= pers.period_start
            AND src.tm1 < pers.period_end
GROUP BY
    src.Trunk
    ,pers.period_start
    ,pers.period_end
    ,src.simultaneous
) PeriodSummary

END
GO

EXECUTE [dbo].[GenerateTrunkSumv1] @date = '20210224', @period = 1800, @ST = '2021-02-24 20:00:00.0', @DT = '2021-02-24 22:30:00.0000000', @tn = 'test';
SELECT @@ROWCOUNT as 'Inserted'
GO

此解决方案的问题是,如果时间段使用其中一个值(开始/断开时间)超过 30 分钟标记,我将收到重复的行

电流输出

period_start              period_end                Period  Trunk   all     simultaneous
2021-02-24 20:00:00.0     2021-02-24 20:30:00.0     1800    I1      1       1
2021-02-24 20:00:00.0     2021-02-24 20:30:00.0     1800    I1      1       2
2021-02-24 20:30:00.0     2021-02-24 21:00:00.0     1800    I1      6       1
2021-02-24 20:30:00.0     2021-02-24 21:00:00.0     1800    I1      22      2
2021-02-24 20:30:00.0     2021-02-24 21:00:00.0     1800    I1      16      3
2021-02-24 20:30:00.0     2021-02-24 21:00:00.0     1800    I2      1       1
2021-02-24 20:30:00.0     2021-02-24 21:00:00.0     1800    I2      2       2

正如您在上面看到的,第一行出现了两次,因为它有一个通过标记的条目。

--Surrounding rows causing this issue
I1  2021-02-24 20:23:43.1   2021-02-24 20:24:34.6   1
I1  2021-02-24 20:24:34.6   2021-02-24 20:31:09.5   2
I1  2021-02-24 20:31:09.5   2021-02-24 20:32:32.9   3
I1  2021-02-24 20:32:32.9   2021-02-24 20:32:42.3   3
I1  2021-02-24 20:32:42.3   2021-02-24 20:32:51.4   3
I1  2021-02-24 20:32:51.4   2021-02-24 20:33:05.1   3

有没有人知道一种解决方案,该解决方案可以从必须每 30 分钟(00:00 / 00:30)开始的锁定时段转变为从第一个时间段开始创建 30 分钟。

如果可以更改第二行以反映异常值的开始/断开时间(可选)。

【问题讨论】:

我认为您的逻辑仍然存在故障:您评估每个连接,如果它在前一个连接的时间内。现在想象以下设置:第一个连接从 07:00 开始到 07:15 结束,第二个连接从 07:05 开始到 07:07 结束(在第一个连接内),第三个连接开始在 07:10 到 07:13 结束 - ehich 在第一个连接内,但不在前一个连接内...在这种情况下,您不会将其识别为同时连接,对吗? 这是一个非常好的观点,我已经监督过了。我将看看下面发布的解决方案,女巫不只关注一行。在验证/测试时,我也会考虑这个评论。谢谢! 【参考方案1】:

如果我要找到同时连接,我不会只看下一个条目。

假设您有每个连接的开始时间和结束时间。使用这些时间制作所有连续的间隔,您将拥有数据集的所有“有趣”间隔。 然后将这些间隔与您的原始数据连接起来,您可以计算每个间隔中有多少个连接。因为你总是有事情发生,所以你一定会抓住ecerything。

首先我选择所有时间:

WITH Times
AS
(SELECT
        rd.StartDateTime tm
       ,rd.Trunk
    FROM #raw_data rd
    UNION
    SELECT
        rd.DisconnectDateTime
       ,rd.Trunk
    FROM #raw_data rd)
    

这只是您数据集的所有时间 - 按主干分组,因为我们希望将它们分开。

然后创建所有区间:

intervals
AS
(SELECT
        tm tm1
       ,LEAD(tm, 1) OVER (PARTITION BY Trunk ORDER BY tm) tm2
       ,Trunk
    FROM Times)

在这里,我们得到一天中的第一个时间,一天中的第二个时间,一天中的第二个时间到一天中的第三个时间等等。也就是说,我们已将所有连接的时间段划分为以 a 开头或结尾的确切时间间隔连接开始或断开。

现在我们只需要结合原始数据来查看每个间隔中有多少连接:

SELECT
    i.Trunk
   ,i.tm1
   ,i.tm2
   ,COUNT(*) simultaneous
FROM intervals i
INNER JOIN #raw_data rd
    ON rd.DisconnectDateTime >= i.tm1
        AND rd.StartDateTime < i.tm2
        AND i.Trunk = rd.Trunk
        AND i.tm2 IS NOT null
GROUP BY i.Trunk
        ,i.tm1
        ,i.tm2
ORDER BY i.trunk,i.tm1

这给出了这个表:

+-------+-----------------------------+-----------------------------+--------------+
| Trunk |             tm1             |             tm2             | simultaneous |
+-------+-----------------------------+-----------------------------+--------------+
| T1    | 2021-02-24 20:01:00.0000000 | 2021-02-24 20:04:50.0000000 |            1 |
| T1    | 2021-02-24 20:04:50.0000000 | 2021-02-24 20:05:20.6000000 |            2 |
| T1    | 2021-02-24 20:05:20.6000000 | 2021-02-24 20:05:30.0000000 |            3 |
| T1    | 2021-02-24 20:05:30.0000000 | 2021-02-24 20:07:50.1000000 |            3 |
| T1    | 2021-02-24 20:07:50.1000000 | 2021-02-24 20:08:24.0000000 |            2 |
| T1    | 2021-02-24 20:08:24.0000000 | 2021-02-24 20:15:50.0000000 |            1 |
| T1    | 2021-02-24 20:15:50.0000000 | 2021-02-24 20:17:00.0000000 |            1 |
| T1    | 2021-02-24 20:17:00.0000000 | 2021-02-24 20:20:50.0000000 |            1 |
| T1    | 2021-02-24 20:20:50.0000000 | 2021-02-24 20:20:59.0000000 |            1 |
| T1    | 2021-02-24 20:20:59.0000000 | 2021-02-24 20:21:00.0000000 |            2 |
| T1    | 2021-02-24 20:21:00.0000000 | 2021-02-24 20:24:00.0000000 |            2 |
| T1    | 2021-02-24 20:24:00.0000000 | 2021-02-24 20:25:00.0000000 |            1 |
| T1    | 2021-02-24 20:25:00.0000000 | 2021-02-24 20:28:30.0000000 |            1 |
| T1    | 2021-02-24 20:28:30.0000000 | 2021-02-24 22:46:54.2000000 |            1 |
| T1    | 2021-02-24 22:46:54.2000000 | 2021-02-24 22:48:25.2000000 |            1 |
| T2    | 2021-02-24 20:41:49.0000000 | 2021-02-24 20:43:24.0000000 |            1 |
| T2    | 2021-02-24 20:43:24.0000000 | 2021-02-24 22:47:25.1000000 |            1 |
| T2    | 2021-02-24 22:47:25.1000000 | 2021-02-24 22:47:54.4000000 |            1 |
+-------+-----------------------------+-----------------------------+--------------+

现在您可以将其加入到您设定的时间段,记住将时间间隔设为开放式,您可以找到每个时间段内同时连接的最大数量。

完整的查询在这里:

WITH Times
AS
(SELECT
        rd.StartDateTime tm
       ,rd.Trunk
    FROM #raw_data rd
    UNION
    SELECT
        rd.DisconnectDateTime
       ,rd.Trunk
    FROM #raw_data rd),
intervals
AS
(SELECT
        tm tm1
       ,LEAD(tm, 1) OVER (PARTITION BY Trunk ORDER BY tm) tm2
       ,Trunk
    FROM Times)
SELECT
    i.Trunk
   ,i.tm1
   ,i.tm2
   ,COUNT(*) simultaneous
FROM intervals i
INNER JOIN #raw_data rd
    ON rd.DisconnectDateTime >= i.tm1
        AND rd.StartDateTime < i.tm2
        AND i.Trunk = rd.Trunk
        AND i.tm2 IS NOT null
GROUP BY i.Trunk
        ,i.tm1
        ,i.tm2
ORDER BY i.trunk,i.tm1

【讨论】:

感谢您将其扩大到比单行更大的范围;我不确定如何存档。我今天将看看这个实现,以确保我完全理解它:)。一旦我尝试一下,您就会收到我的来信。 这项工作非常好,适合我的大部分要求,但是,它对我来说还不是很顺利,目前,正如你提到的,这个时期是开放式的。但是我一直面临着需要将它们分组为块大小(等 30 分钟)以允许 s-s-rS 中的记者从下拉列表中选择时间跨度(30 分钟、1 小时、24 小时等)的挑战你对此有任何指示吗? 如果我还想计算每个时间段的“所有”连接,你会添加这个吗,因为我们在这里合并表格,我似乎得到了重复的计数(在某些值上)当尝试“选择计数(*)作为全部”时,我将使用示例输出更新我的问题 嘿@søren-kongstad,我现在花了几天时间试图弄清楚如何加入生成的输出以匹配我的 30 分钟标准。不幸的是,虽然我无法完成它。我添加了一个更新,显示了当前结果和我用来实现它的代码。您是否有时间详细说明如何存档?

以上是关于查找最大同时出现次数,如果满足条件则增加变量的主要内容,如果未能解决你的问题,请参考以下文章

excel 满足一个条件 显示对应行倒数第二行?

数组中出现次数超过数组长度一半的数字

如果不满足最小和最大条件,则阻止表单提交

电源查询:如果满足条件,则查找最小日期

Hive:如果两个表之间满足条件,则查找唯一值

在多线程环境中满足条件(LessThan 常量)时增加变量