SQL to get group number

[Title]: SQL to get group number [Posted]: 2020-06-24 01:39:46 [Question]:

I have some sample data, shown below.

    +---------+------------+--------+-----------+
    | User Id |   Sequence | Action | Object    |
    |---------|------------|--------|-----------|
    | 12345   |    1       | Eat    | Bread     |
    | 12345   |    2       | Eat    | Steak     |
    | 12345   |    3       | Eat    | Bread     |
    | 12345   |    4       | Drink  | Milk tea  |
    | 12345   |    5       | Drink  | Black tea |  
    | 12345   |    6       | Eat    | Cake      |
    | 12345   |    7       | Eat    | Candy     |
    | 12345   |    8       | Drink  | Black tea | 
    | 12345   |    9       | Drink  | Green tea | 
    | 12345   |    10      | Drink  | Water     |
    +---------+------------+--------+-----------+

Now I want to add a column named "Group Id" to the table, so that the result looks like this:

    +---------+------------+--------+-----------+-----------+
    | User Id |   Sequence | Action | Object    | Group Id  |
    |---------|------------|--------|-----------|-----------|
    | 12345   |    1       | Eat    | Bread     |     1     |
    | 12345   |    2       | Eat    | Steak     |     1     |
    | 12345   |    3       | Eat    | Bread     |     1     |
    | 12345   |    4       | Drink  | Milk tea  |     2     |
    | 12345   |    5       | Drink  | Black tea |     2     |
    | 12345   |    6       | Eat    | Cake      |     3     |
    | 12345   |    7       | Eat    | Candy     |     3     |
    | 12345   |    8       | Drink  | Black tea |     4     |
    | 12345   |    9       | Drink  | Green tea |     4     |
    | 12345   |    10      | Drink  | Water     |     4     |
    +---------+------------+--------+-----------+-----------+

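Consecutive rows with the same action should fall into one group, and a new group should start whenever the action changes in sequence order. How can I implement this in SQL (I am using Google BigQuery)?

Thanks a million!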

[Comments on the question]:

[Answer 1]:

Below is for BigQuery Standard SQL

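    #standardSQL
    SELECT * EXCEPT(new_group),
      -- Running count of group starts = group number within each user
      COUNTIF(new_group) OVER(PARTITION BY User_Id ORDER BY Sequence) AS Group_Id
    FROM (
      SELECT *,
        -- TRUE when Action differs from the previous row; the '' default makes the first row start group 1
        Action != LAG(Action, 1, '') OVER(PARTITION BY User_Id ORDER BY Sequence) AS new_group
      FROM `project.dataset.table`
    )
    -- ORDER BY User_Id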

If applied to the sample data from your question, the output is

    Row User_Id Sequence    Action  Object      Group_Id
    1   12345   1           Eat     Bread       1
    2   12345   2           Eat     Steak       1
    3   12345   3           Eat     Bread       1
    4   12345   4           Drink   Milk tea    2
    5   12345   5           Drink   Black tea   2
    6   12345   6           Eat     Cake        3
    7   12345   7           Eat     Candy       3
    8   12345   8           Drink   Black tea   4
    9   12345   9           Drink   Green tea   4
    10  12345   10          Drink   Water       4
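
To try this logic without a BigQuery project, the same idea can be run against an in-memory SQLite database from Python. This is only a sketch under assumptions: it needs an SQLite build with window-function support (3.25 or later), the table name `actions` and the column names `seq`, `action_name`, and `object_name` are made up for the example, and BigQuery's `COUNTIF` is replaced by an equivalent `SUM(CASE ...)`.

```python
import sqlite3

# Sample rows from the question: (user_id, seq, action_name, object_name).
ROWS = [
    (12345, 1, "Eat", "Bread"),
    (12345, 2, "Eat", "Steak"),
    (12345, 3, "Eat", "Bread"),
    (12345, 4, "Drink", "Milk tea"),
    (12345, 5, "Drink", "Black tea"),
    (12345, 6, "Eat", "Cake"),
    (12345, 7, "Eat", "Candy"),
    (12345, 8, "Drink", "Black tea"),
    (12345, 9, "Drink", "Green tea"),
    (12345, 10, "Drink", "Water"),
]

# Hypothetical table and column names; SUM(CASE ...) stands in for COUNTIF.
QUERY = """
SELECT user_id, seq, action_name, object_name,
       SUM(CASE WHEN new_group THEN 1 ELSE 0 END)
         OVER (PARTITION BY user_id ORDER BY seq) AS group_id
FROM (
  SELECT *,
         action_name != LAG(action_name, 1, '')
           OVER (PARTITION BY user_id ORDER BY seq) AS new_group
  FROM actions
)
ORDER BY user_id, seq
"""


def group_ids(rows):
    """Load rows into an in-memory table and return them with a group_id column."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE actions ("
            "user_id INTEGER, seq INTEGER, action_name TEXT, object_name TEXT)"
        )
        conn.executemany("INSERT INTO actions VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in group_ids(ROWS):
        print(row)
```

The inner query flags each row that starts a new run of actions, and the outer running sum turns those flags into group numbers, restarting for each user.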

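[Comments]:

Oh my god, you are awesome!

[Answer 2]:

This is a type of gaps-and-islands problem. A simple approach is to use lag() to determine where a change occurs, and then take a cumulative sum: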

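    select t.*,
           -- count the rows where the action changed; the first row (prev_action is null) adds 0
           1 + sum(case when prev_action <> action then 1 else 0 end) over (order by sequence) as group_id
    from (select t.*,
                 -- with several users, add "partition by user_id" to both window clauses
                 lag(action) over (order by sequence) as prev_action
          from t
         ) t;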

You can also express the outer logic with countif():

    1 + countif( prev_action <> action ) over (order by sequence) as group_id
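
For readers less used to window functions, the lag-then-running-count idea can be traced in plain Python. This is an illustrative sketch rather than part of the answer: the function name `assign_group_ids` and the tuple layout are hypothetical, and it restarts the numbering for each user, which corresponds to adding `partition by user_id` to both window clauses.

```python
from itertools import groupby
from operator import itemgetter


def assign_group_ids(rows):
    """Take (user_id, sequence, action) tuples and return
    (user_id, sequence, action, group_id) tuples ordered by user and sequence."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    # One pass per user, mirroring PARTITION BY user_id.
    for user_id, user_rows in groupby(ordered, key=itemgetter(0)):
        prev_action = None  # lag() yields NULL on the first row
        changes = 0         # running count of rows where prev_action <> action
        for _, sequence, action in user_rows:
            if prev_action is not None and prev_action != action:
                changes += 1
            out.append((user_id, sequence, action, 1 + changes))
            prev_action = action
    return out


if __name__ == "__main__":
    sample = [
        (12345, 1, "Eat"), (12345, 2, "Eat"), (12345, 3, "Eat"),
        (12345, 4, "Drink"), (12345, 5, "Drink"),
        (12345, 6, "Eat"), (12345, 7, "Eat"),
        (12345, 8, "Drink"), (12345, 9, "Drink"), (12345, 10, "Drink"),
    ]
    for row in assign_group_ids(sample):
        print(row)
```

Skipping the comparison on the first row plays the same role as the NULL returned by lag(): the first row never counts as a change, so numbering starts at 1.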

[Comments]:

@LucasLee . . . I guess there is no advantage to being first.
