SQL to get group number
Posted: 2020-06-24 01:39:46
Question: I have sample data as follows.
+---------+------------+--------+-----------+
| User Id | Sequence | Action | Object |
|---------|------------|--------|-----------|
| 12345 | 1 | Eat | Bread |
| 12345 | 2 | Eat | Steak |
| 12345 | 3 | Eat | Bread |
| 12345 | 4 | Drink | Milk tea |
| 12345 | 5 | Drink | Black tea |
| 12345 | 6 | Eat | Cake |
| 12345 | 7 | Eat | Candy |
| 12345 | 8 | Drink | Black tea |
| 12345 | 9 | Drink | Green tea |
| 12345 | 10 | Drink | Water |
+---------+------------+--------+-----------+
Now I want to add a column named "Group Id" to the table, and the result should look like this:
+---------+------------+--------+-----------+-----------+
| User Id | Sequence | Action | Object | Group Id |
|---------|------------|--------|-----------|-----------|
| 12345 | 1 | Eat | Bread | 1 |
| 12345 | 2 | Eat | Steak | 1 |
| 12345 | 3 | Eat | Bread | 1 |
| 12345 | 4 | Drink | Milk tea | 2 |
| 12345 | 5 | Drink | Black tea | 2 |
| 12345 | 6 | Eat | Cake | 3 |
| 12345 | 7 | Eat | Candy | 3 |
| 12345 | 8 | Drink | Black tea | 4 |
| 12345 | 9 | Drink | Green tea | 4 |
| 12345 | 10 | Drink | Water | 4 |
+---------+------------+--------+-----------+-----------+
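Consecutive rows with the same action should be put into one group, but when the same action appears again later in the sequence, it starts a new group. How can I do this in SQL (I'm using Google BigQuery)?
Thanks a million!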
Answer 1: Below is for BigQuery Standard SQL
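#standardSQL
-- Outer query: the running count of "new group" flags per user is the group id
SELECT * EXCEPT(new_group),
  COUNTIF(new_group) OVER(PARTITION BY User_Id ORDER BY Sequence) Group_Id
FROM (
  -- Inner query: TRUE when the action differs from the previous row's action;
  -- the '' default makes the first row of each user start a new group
  SELECT *,
    Action != LAG(Action, 1, '') OVER(PARTITION BY User_Id ORDER BY Sequence) new_group
  FROM `project.dataset.table`
)
-- ORDER BY User_Id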
Applied to the sample data from your question, the output is:
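+-----+---------+----------+--------+-----------+----------+
| Row | User_Id | Sequence | Action | Object    | Group_Id |
|-----|---------|----------|--------|-----------|----------|
| 1   | 12345   | 1        | Eat    | Bread     | 1        |
| 2   | 12345   | 2        | Eat    | Steak     | 1        |
| 3   | 12345   | 3        | Eat    | Bread     | 1        |
| 4   | 12345   | 4        | Drink  | Milk tea  | 2        |
| 5   | 12345   | 5        | Drink  | Black tea | 2        |
| 6   | 12345   | 6        | Eat    | Cake      | 3        |
| 7   | 12345   | 7        | Eat    | Candy     | 3        |
| 8   | 12345   | 8        | Drink  | Black tea | 4        |
| 9   | 12345   | 9        | Drink  | Green tea | 4        |
| 10  | 12345   | 10       | Drink  | Water     | 4        |
+-----+---------+----------+--------+-----------+----------+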
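To make the two steps of this query concrete, here is a minimal plain-Python sketch that mimics them on the question's sample rows. It is only an illustration, not BigQuery code: it assumes the rows are already in memory as a list of tuples (`ROWS`), and the function name `assign_group_ids` is hypothetical. The `prev_action` dictionary plays the role of `LAG(Action, 1, '')` per user, and `group_count` plays the role of the running `COUNTIF(new_group)`.

```python
# Sample rows from the question: (user_id, sequence, action, object)
ROWS = [
    (12345, 1, "Eat", "Bread"),
    (12345, 2, "Eat", "Steak"),
    (12345, 3, "Eat", "Bread"),
    (12345, 4, "Drink", "Milk tea"),
    (12345, 5, "Drink", "Black tea"),
    (12345, 6, "Eat", "Cake"),
    (12345, 7, "Eat", "Candy"),
    (12345, 8, "Drink", "Black tea"),
    (12345, 9, "Drink", "Green tea"),
    (12345, 10, "Drink", "Water"),
]


def assign_group_ids(rows):
    """Hypothetical helper mirroring the query: flag a row when its action
    differs from the previous row's action (per user, ordered by sequence),
    then use the running count of those flags as the group id."""
    result = []
    prev_action = {}  # last action seen per user (acts like LAG)
    group_count = {}  # running COUNTIF(new_group) per user
    for user_id, seq, action, obj in sorted(rows, key=lambda r: (r[0], r[1])):
        # LAG(Action, 1, '') yields '' for a user's first row,
        # so that row always starts a new group.
        new_group = action != prev_action.get(user_id, "")
        group_count[user_id] = group_count.get(user_id, 0) + int(new_group)
        prev_action[user_id] = action
        result.append((user_id, seq, action, obj, group_count[user_id]))
    return result


for row in assign_group_ids(ROWS):
    print(row)
```

The printed group ids follow the same pattern as the Group_Id column above: three 1s, two 2s, two 3s and three 4s. Keying both dictionaries by user is what `PARTITION BY User_Id` does in the SQL.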
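Comments:
OMG, you're awesome!

Answer 2: This is a type of gaps-and-islands problem. A simple method is to use lag() to find where the action changes, and then take a cumulative sum of those changes: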
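select t.*,
       -- no "1 +" needed: the first row's prev_action is NULL, so it already counts as 1
       sum( case when prev_action = action then 0 else 1 end ) over (partition by user_id order by sequence) as group_id
from (select t.*,
             lag(action) over (partition by user_id order by sequence) as prev_action
      from t
     ) t;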
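You can also express the outer logic with countif(). Here the "1 +" is needed, because countif() does not count the NULL comparison on each user's first row:
1 + countif( prev_action <> action ) over (partition by user_id order by sequence) as group_id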
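To check the lag-plus-cumulative-sum idea end to end, the sketch below runs the sum form against the question's sample data using Python's built-in `sqlite3` module. Note the assumptions: this is SQLite, not BigQuery (window functions need SQLite 3.25 or later, and SQLite has no `countif()`, so only the `sum(case ...)` form is shown). The table `t` is trimmed to the three columns the query needs, and user 67890 is a hypothetical second user added only to exercise `partition by user_id`. The sum is written without a leading `1 +` because the first row's NULL `prev_action` already contributes 1.

```python
import sqlite3

# Question's sample data, trimmed to (user_id, sequence, action).
# User 67890 is hypothetical, added to show the effect of PARTITION BY.
ROWS = [
    (12345, 1, "Eat"), (12345, 2, "Eat"), (12345, 3, "Eat"),
    (12345, 4, "Drink"), (12345, 5, "Drink"),
    (12345, 6, "Eat"), (12345, 7, "Eat"),
    (12345, 8, "Drink"), (12345, 9, "Drink"), (12345, 10, "Drink"),
    (67890, 1, "Drink"), (67890, 2, "Drink"), (67890, 3, "Eat"),
]

# lag() marks where the action changes; a running SUM of those marks is the
# group id. On each user's first row prev_action is NULL, so the CASE falls
# through to ELSE 1 and the running sum starts at 1.
QUERY = """
SELECT user_id, sequence, action,
       SUM(CASE WHEN prev_action = action THEN 0 ELSE 1 END)
           OVER (PARTITION BY user_id ORDER BY sequence) AS group_id
FROM (
    SELECT t.*,
           LAG(action) OVER (PARTITION BY user_id ORDER BY sequence) AS prev_action
    FROM t
) AS x
ORDER BY user_id, sequence
"""


def group_ids():
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    conn.execute("CREATE TABLE t (user_id INTEGER, sequence INTEGER, action TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in group_ids():
    print(row)
```

For user 12345 the group ids come out as 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, matching the expected table in the question. Without the partition, the hypothetical second user's rows would continue the first user's numbering instead of restarting at 1.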
Comments:
@LucasLee . . . I don't think there is any advantage to being first.