Count number of consecutive grouped entries in SQL

【Posted】: 2018-11-14 17:07:18 【Question】:

I would like to create and populate the "No. of Entries in Curr.Status" field shown below using SQL (SQL Server):

ID          Sequence    Prev.Status Curr.Status No. of Entries in Curr.Status
9-9999-9    1           Status D    Status A    1
9-9999-9    2           Status A    Status A    2
9-9999-9    3           Status A    Status A    3
9-9999-9    4           Status A    Status A    4
9-9999-9    5           Status A    Status B    1
9-9999-9    6           Status B    Status B    2
9-9999-9    7           Status B    Status B    3
9-9999-9    8           Status B    Status A    1
9-9999-9    9           Status A    Status A    2
9-9999-9    10          Status A    Status C    1
9-9999-9    11          Status C    Status C    2

Is there a quick way to create the field I'm looking for, using something like row_number()? (That function alone doesn't seem to be enough.)

Thanks!

【Comments】:

What have you tried? Please make sure to share it.

Over which fields is this number calculated?

【Answer 1】:

This looks like a gaps-and-islands problem. There are plenty of examples of how to solve these; here is one approach:

WITH VTE AS(
        SELECT *
        FROM (VALUES('9-9999-9',1 ,'Status D','Status A'),
                    ('9-9999-9',2 ,'Status A','Status A'),
                    ('9-9999-9',3 ,'Status A','Status A'),
                    ('9-9999-9',4 ,'Status A','Status A'),
                    ('9-9999-9',5 ,'Status A','Status B'),
                    ('9-9999-9',6 ,'Status B','Status B'),
                    ('9-9999-9',7 ,'Status B','Status B'),
                    ('9-9999-9',8 ,'Status B','Status A'),
                    ('9-9999-9',9 ,'Status A','Status A'),
                    ('9-9999-9',10,'Status A','Status C'),
                    ('9-9999-9',11,'Status C','Status C')) V(ID, Sequence, PrevStatus,CurrStatus)),
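CTE AS(
    SELECT ID,
           [Sequence],
           PrevStatus,
           CurrStatus,
           -- The difference of the two row numbers stays constant within each run of equal CurrStatus
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Sequence]) -
           ROW_NUMBER() OVER (PARTITION BY ID,CurrStatus ORDER BY [Sequence]) AS Grp
    FROM VTE V)
SELECT ID,
       [Sequence],
       PrevStatus,
       CurrStatus,
       -- Grp alone can repeat across different statuses, so partition by ID and CurrStatus as well
       ROW_NUMBER() OVER (PARTITION BY ID, CurrStatus, Grp ORDER BY [Sequence]) AS Entries
FROM CTE
ORDER BY ID, [Sequence];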
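
To see why the difference of two row numbers identifies each run, the computation can be traced outside the database. The following is a minimal Python sketch, not part of the original answer: the function name is made up for illustration, and the statuses are abbreviated to single letters. It also keys each island on (ID, status, Grp), because the Grp value by itself can coincide for runs of different statuses (for the status sequence A, B, A the second and third rows both get Grp = 1).

```python
from collections import defaultdict

# Sample rows from the question as (ID, Sequence, CurrStatus); statuses abbreviated
rows = [("9-9999-9", i + 1, s) for i, s in enumerate("AAAABBBAACC")]

def entries_by_row_number_difference(rows):
    """Hypothetical helper: mimics the two ROW_NUMBER() calls and the final numbering."""
    rn_per_id = defaultdict(int)         # ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Sequence)
    rn_per_id_status = defaultdict(int)  # ROW_NUMBER() OVER (PARTITION BY ID, CurrStatus ORDER BY Sequence)
    rn_per_island = defaultdict(int)     # final ROW_NUMBER() inside each island
    result = []
    for id_, seq, status in sorted(rows, key=lambda r: (r[0], r[1])):
        rn_per_id[id_] += 1
        rn_per_id_status[(id_, status)] += 1
        grp = rn_per_id[id_] - rn_per_id_status[(id_, status)]
        island = (id_, status, grp)      # status is part of the key to avoid Grp collisions
        rn_per_island[island] += 1
        result.append((seq, status, grp, rn_per_island[island]))
    return result

for row in entries_by_row_number_difference(rows):
    print(row)  # (Sequence, CurrStatus, Grp, Entries)
```

For the sample data, Grp comes out as 0, 4, 3 and 9 for the four runs, and the last element of each tuple reproduces the requested column.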

【Comments】:

【Answer 2】:

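You can use the LAG function to flag the rows where the status changes, then use SUM() OVER () as a running total to give each group a unique number. Numbering the rows within each group is then straightforward: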

DECLARE @t TABLE (ID VARCHAR(100), Sequence INT, PrevStatus VARCHAR(100), CurrStatus VARCHAR(100));
INSERT INTO @t VALUES
('9-9999-9',  1, 'Status D', 'Status A'),
('9-9999-9',  2, 'Status A', 'Status A'),
('9-9999-9',  3, 'Status A', 'Status A'),
('9-9999-9',  4, 'Status A', 'Status A'),
('9-9999-9',  5, 'Status A', 'Status B'),
('9-9999-9',  6, 'Status B', 'Status B'),
('9-9999-9',  7, 'Status B', 'Status B'),
('9-9999-9',  8, 'Status B', 'Status A'),
('9-9999-9',  9, 'Status A', 'Status A'),
('9-9999-9', 10, 'Status A', 'Status C'),
('9-9999-9', 11, 'Status C', 'Status C');

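WITH cte1 AS (
    -- chg = 1 on the first row of each run; LAG returns NULL on the first row of an ID, so that row gets 1 too
    SELECT *, CASE WHEN LAG(CurrStatus) OVER(PARTITION BY ID ORDER BY Sequence) = CurrStatus THEN 0 ELSE 1 END AS chg
    FROM @t
), cte2 AS (
    -- The running total of chg gives each run its own group number within the ID
    SELECT *, SUM(chg) OVER(PARTITION BY ID ORDER BY Sequence) AS grp
    FROM cte1
), cte3 AS (
    SELECT *, ROW_NUMBER() OVER(PARTITION BY ID, grp ORDER BY Sequence) AS SeqInGroup
    FROM cte2
)
SELECT *
FROM cte3
ORDER BY ID, Sequence;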
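
The three steps (flag the change, take a running total, number within the group) can also be followed in a single pass over the ordered rows. The Python sketch below is only an illustration under stated assumptions: the function name is hypothetical, statuses are abbreviated to single letters, and the counters restart for every ID, which corresponds to partitioning each window by ID.

```python
# Sample rows from the question as (ID, Sequence, CurrStatus); statuses abbreviated
rows = [("9-9999-9", i + 1, s) for i, s in enumerate("AAAABBBAACC")]

def entries_by_change_flags(rows):
    """Hypothetical helper: returns (Sequence, chg, grp, SeqInGroup) per row."""
    result = []
    prev_id = prev_status = None
    grp = seq_in_group = 0
    for id_, seq, status in sorted(rows, key=lambda r: (r[0], r[1])):
        if id_ != prev_id:
            # New ID: no previous row to compare with (LAG would be NULL), running total restarts
            grp, prev_status = 0, None
        chg = 0 if status == prev_status else 1          # the CASE WHEN LAG(...) flag
        grp += chg                                       # the running SUM(chg)
        seq_in_group = 1 if chg else seq_in_group + 1    # ROW_NUMBER() within the group
        result.append((seq, chg, grp, seq_in_group))
        prev_id, prev_status = id_, status
    return result

for row in entries_by_change_flags(rows):
    print(row)
```

Unlike the row-number-difference approach, the group number here increases by one with every new run, so it is unique per run and needs no extra status column in the final partition.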


【Comments】:

Thanks. With a few quick changes, this solved my problem. It's really helpful to know that 'Partition by r1 - r2' can be used for this kind of operation!

【Answer 3】:

If Sequence is an identity column (this relies on its values being consecutive, with no gaps), then you can do:

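select t.*,
       -- include id and [Curr.Status] so equal differences from different statuses are not merged
       row_number() over (partition by id, [Curr.Status], (Sequence - seq) order by Sequence) as [No. of Entries in Curr.Status]
from (select t.*,
             row_number() over (partition by id, [Curr.Status] order by Sequence) as seq
      from [table] t  -- [table] stands in for your actual table name
     ) t;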

Otherwise you need to generate two row numbers:

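select t.*,
       -- include id and [Curr.Status] so equal differences from different statuses are not merged
       row_number() over (partition by id, [Curr.Status], (seq1 - seq2) order by Sequence) as [No. of Entries in Curr.Status]
from (select t.*,
             row_number() over (partition by id order by Sequence) as seq1,
             row_number() over (partition by id, [Curr.Status] order by Sequence) as seq2
      from [table] t  -- [table] stands in for your actual table name
     ) t;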
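
The difference between the two variants is whether Sequence itself can stand in for the first row number. A small Python sketch, with a made-up function name and abbreviated statuses, suggests why that only works when Sequence has no gaps: a skipped value changes the difference in the middle of a run and splits it.

```python
from collections import defaultdict

def entries_by_sequence_minus_rn(rows):
    """Hypothetical helper: uses Sequence directly instead of a first ROW_NUMBER()."""
    rn = defaultdict(int)       # row_number() over (partition by id, status order by Sequence)
    counter = defaultdict(int)  # final row_number() within each (id, status, Sequence - rn)
    out = []
    for id_, seq, status in sorted(rows, key=lambda r: (r[0], r[1])):
        rn[(id_, status)] += 1
        island = (id_, status, seq - rn[(id_, status)])
        counter[island] += 1
        out.append(counter[island])
    return out

gapless = [("9-9999-9", i + 1, s) for i, s in enumerate("AAABB")]
gappy = [("9-9999-9", seq, "A") for seq in (1, 2, 5, 6)]  # Sequence skips 3 and 4

print(entries_by_sequence_minus_rn(gapless))  # [1, 2, 3, 1, 2]
print(entries_by_sequence_minus_rn(gappy))    # [1, 2, 1, 2], although all four rows form one run
```

When gaps are possible, the two-row-number form is the safer choice, since both numbers are then generated from the row order rather than from the stored values.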

【Comments】:
