Get status count by date, but count only consecutive rows

I have this data:

ID  Name        Status  Date
1   Machine1    Active  2018-01-01
2   Machine2    Fault   2018-01-01
3   Machine3    Active  2018-01-01
4   Machine1    Fault   2018-01-02
5   Machine2    Active  2018-01-02
6   Machine3    Active  2018-01-02
7   Machine2    Active  2018-01-03
8   Machine1    Fault   2018-01-03
9   Machine2    Active  2018-01-04
10  Machine1    Fault   2018-01-04
11  Machine3    Active  2018-01-06

From that input I want to produce the following.

Expected output

Name        Last Status  Count
Machine1    Fault        3
Machine2    Active       3
Machine3    Active       1      (because the dates are not consecutive)

*Count: the number of consecutive, most recent records that have the last status

Answer

I believe this is as simple as:

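WITH cte1 AS (
    SELECT
        Name,
        Status,
        -- Newest row first: adding (row number - 1) days maps every row of an
        -- unbroken run of days with the same status onto the same date.
        DATEADD(DAY, ROW_NUMBER() OVER (PARTITION BY Name, Status ORDER BY Date DESC) - 1, Date) AS GroupingDate
    FROM testdata
), cte2 AS (
    SELECT
        Name,
        Status,
        -- Rank 1 marks the rows that share the latest GroupingDate for each machine.
        RANK() OVER (PARTITION BY Name ORDER BY GroupingDate DESC) AS GroupingNumber
    FROM cte1
)
-- Count the rows in the most recent run for each machine.
SELECT Name, Status AS LastStatus, COUNT(*) AS LastStatusCount
FROM cte2
WHERE GroupingNumber = 1
GROUP BY Name, Status
ORDER BY Name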

Result and DBFiddle

| Name     | LastStatus | LastStatusCount |
|----------|------------|-----------------|
| Machine1 | Fault      | 3               |
| Machine2 | Active     | 3               |
| Machine3 | Active     | 1               |

To understand how it works, look at the intermediate values generated by the CTEs:

| Name     | Status | Date                | RowNumber | GroupingDate        | GroupingNumber |
|----------|--------|---------------------|-----------|---------------------|----------------|
| Machine1 | Fault  | 04/01/2018 00:00:00 | 1         | 04/01/2018 00:00:00 | 1              |
| Machine1 | Fault  | 03/01/2018 00:00:00 | 2         | 04/01/2018 00:00:00 | 1              |
| Machine1 | Fault  | 02/01/2018 00:00:00 | 3         | 04/01/2018 00:00:00 | 1              |
| Machine1 | Active | 01/01/2018 00:00:00 | 1         | 01/01/2018 00:00:00 | 4              |
| Machine2 | Active | 04/01/2018 00:00:00 | 1         | 04/01/2018 00:00:00 | 1              |
| Machine2 | Active | 03/01/2018 00:00:00 | 2         | 04/01/2018 00:00:00 | 1              |
| Machine2 | Active | 02/01/2018 00:00:00 | 3         | 04/01/2018 00:00:00 | 1              |
| Machine2 | Fault  | 01/01/2018 00:00:00 | 1         | 01/01/2018 00:00:00 | 4              |
| Machine3 | Active | 06/01/2018 00:00:00 | 1         | 06/01/2018 00:00:00 | 1              |
| Machine3 | Active | 02/01/2018 00:00:00 | 2         | 03/01/2018 00:00:00 | 2              |
| Machine3 | Active | 01/01/2018 00:00:00 | 3         | 03/01/2018 00:00:00 | 2              |

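The trick here is that if a set of numbers is consecutive, subtracting consecutive numbers from them in order yields the same value each time. For example, if we have 5, 6, 8, 9 and subtract 1, 2, 3, 4 in order, we get 4, 4, 5, 5. The query orders the dates descending, so it adds the row number instead of subtracting it; the effect is the same.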
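The same arithmetic can be tried outside the database. The following is a minimal Python sketch, not part of the original answer, that mirrors the two CTEs on the question's sample rows. The names `ROWS` and `last_status_counts` are made up for illustration, and the sketch assumes at most one row per machine per day.

```python
from datetime import date, timedelta

# Sample rows from the question as (name, status, date) tuples.
ROWS = [
    ("Machine1", "Active", date(2018, 1, 1)),
    ("Machine2", "Fault", date(2018, 1, 1)),
    ("Machine3", "Active", date(2018, 1, 1)),
    ("Machine1", "Fault", date(2018, 1, 2)),
    ("Machine2", "Active", date(2018, 1, 2)),
    ("Machine3", "Active", date(2018, 1, 2)),
    ("Machine2", "Active", date(2018, 1, 3)),
    ("Machine1", "Fault", date(2018, 1, 3)),
    ("Machine2", "Active", date(2018, 1, 4)),
    ("Machine1", "Fault", date(2018, 1, 4)),
    ("Machine3", "Active", date(2018, 1, 6)),
]


def last_status_counts(rows):
    """Hypothetical helper: return {name: (last_status, count)}."""
    # Collect the dates per (name, status), like PARTITION BY Name, Status.
    partitions = {}
    for name, status, day in rows:
        partitions.setdefault((name, status), []).append(day)

    # Newest first, add the 0-based row offset to each date, which plays the
    # role of DATEADD(DAY, ROW_NUMBER() - 1, Date) in the query.
    grouped = {}
    for (name, status), days in partitions.items():
        for offset, day in enumerate(sorted(days, reverse=True)):
            grouping_date = day + timedelta(days=offset)
            grouped.setdefault(name, []).append((grouping_date, status))

    # Keep the rows sharing the latest grouping date per name (RANK() = 1)
    # and count them.
    result = {}
    for name, items in grouped.items():
        top = max(grouping_date for grouping_date, _ in items)
        winners = [status for grouping_date, status in items if grouping_date == top]
        result[name] = (winners[0], len(winners))
    return result


if __name__ == "__main__":
    # The 5, 6, 8, 9 example: subtract 1, 2, 3, 4 in order.
    print([n - i for i, n in enumerate([5, 6, 8, 9], start=1)])
    for name, (status, count) in sorted(last_status_counts(ROWS).items()):
        print(name, status, count)
```

Rows from the same unbroken run collapse onto one grouping date, so counting the rows that share the latest grouping date gives the length of the most recent run.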

Another answer

I think this will work, though SQLFiddle is having a fit at the moment, so I can't test it:

SELECT [Name], [Status], ct as [Count]
FROM (
 SELECT 
  [name], 
  [status], 
  [date],
  1 + (SUM( grp ) OVER (PARTITION BY [name], [status] ORDER BY [date] ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING ) * grp) ct,
  row_number() over(partition by [name] order by [date] desc) rn
 FROM
 (
  SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp
  FROM t
 ) x
) y
WHERE
  rn = 1

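It first uses LAG to compare the current row with the previous row (partitioning the data by machine name and status, ordered by date). If the current date is exactly 1 day after the previous date it records a 1, otherwise a 0.

These 1s and 0s are then summed within each machine name and status (the PARTITION BY of the SUM() OVER()). Note that the frame ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING covers only the previous and the current row, so as written ct cannot exceed 3. That is enough for the sample data, but a run longer than 3 days would be undercounted.

We also want to look at the data per machine name, and we only want the latest record for each machine, so we partition by machine name, order by date descending, and select (using the WHERE clause) the row numbered 1 for each machine.

This will make more sense if you run the queries separately.

For a given status and machine, work out "is the current report consecutive with the previous report?" (1 = yes, 0 = no):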

SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp
  FROM t

Work out "what is the running total for the current block of consecutive reports?":

SELECT 
  [name], 
  [status], 
  [date],
  1 + (SUM( grp ) OVER (PARTITION BY [name], [status] ORDER BY [date] ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING ) * grp) ct,
  row_number() over(partition by [name] order by [date] desc) rn
 FROM
 (
  SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp
  FROM t
 ) x

And of course the whole thing, but without the WHERE clause, so you can see the data we are discarding:

SELECT [Name], [Status], ct as [Count]
FROM (
 SELECT 
  [name], 
  [status], 
  [date],
  1 + (SUM( grp ) OVER (PARTITION BY [name], [status] ORDER BY [date] ROWS BETWEEN 1 PRECEDING AND 0 FOLLOWING ) * grp) ct,
  row_number() over(partition by [name] order by [date] desc) rn
 FROM
 (
  SELECT *, CASE WHEN LAG([Date]) OVER(PARTITION BY [name], [status] ORDER BY [date] ) = DATEADD(day, -1, [date]) THEN 1 ELSE 0 END grp
  FROM t
 ) x
) y

The fiddle finally woke up:

http://www.sqlfiddle.com/#!18/77dae/2
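The "is this report one day after the previous one?" check from this answer can also be sketched in plain Python. This is only an illustration under assumptions, not the answer's query: `trailing_run` and `EXAMPLE` are hypothetical names, the data is invented, and there is assumed to be at most one row per machine per day. It walks backwards from each machine's newest row and stops at the first gap or status change, so it counts runs of any length.

```python
from datetime import date, timedelta

# Hypothetical data: one Active day followed by five consecutive Fault days.
EXAMPLE = [("MachineX", "Active", date(2018, 1, 1))] + [
    ("MachineX", "Fault", date(2018, 1, d)) for d in range(2, 7)
]


def trailing_run(rows):
    """Hypothetical helper: return {name: (last_status, run_length)}."""
    by_name = {}
    for name, status, day in rows:
        by_name.setdefault(name, []).append((day, status))

    result = {}
    for name, items in by_name.items():
        items.sort(reverse=True)  # newest row first
        expected_day, last_status = items[0]
        count = 1
        for day, status in items[1:]:
            expected_day -= timedelta(days=1)
            # Same test as the LAG comparison: same status, exactly one day apart.
            if status != last_status or day != expected_day:
                break
            count += 1
        result[name] = (last_status, count)
    return result


if __name__ == "__main__":
    print(trailing_run(EXAMPLE))
```

Stopping at the first break is what restricts the count to the most recent run rather than to all rows with that status.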
