SQL - Partition restarted based on a column value

Posted: 2020-08-18 19:07:32

[Question]:

I need to create a new column that starts a new group at each 0 value of Repeated Call, with the numbering restarting for each Customer_ID. Given this table:

+-------------+---------+----------------------+---------------+
| Customer_ID | Call_ID | Days Since Last Call | Repeated Call |
+-------------+---------+----------------------+---------------+
|           1 |       1 | Null                 |             0 |
|           1 |       2 | 45                   |             0 |
|           1 |       3 | 0                    |             1 |
|           1 |       4 | 0                    |             1 |
|           1 |       5 | 0                    |             1 |
|           1 |       6 | 48                   |             0 |
|           1 |       7 | 1                    |             1 |
|           2 |       8 | Null                 |             0 |
|           2 |       9 | 1                    |             1 |
+-------------+---------+----------------------+---------------+

the result should look like this:

+-------------+---------+----------------------+---------------+-------------+
| Customer_ID | Call_ID | Days Since Last Call | Repeated Call | Order_Group |
+-------------+---------+----------------------+---------------+-------------+
|           1 |       1 | Null                 |             0 |           1 |
|           1 |       2 | 45                   |             0 |           2 |
|           1 |       3 | 0                    |             1 |           2 |
|           1 |       4 | 0                    |             1 |           2 |
|           1 |       5 | 0                    |             1 |           2 |
|           1 |       6 | 48                   |             0 |           3 |
|           1 |       7 | 1                    |             1 |           3 |
|           2 |       8 | Null                 |             0 |           1 |
|           2 |       9 | 1                    |             1 |           1 |
+-------------+---------+----------------------+---------------+-------------+

Any suggestions are appreciated, thank you!


[Answer 1]:

You can use the SUM() window function as a running total per customer:

select t.*,
  -- running total per customer: adds 1 at every row where Repeated_Call = 0
  sum(case when Repeated_Call = 0 then 1 else 0 end)
  over (partition by Customer_ID order by Call_ID) as Order_Group
from tablename t

See the demo (it runs on MySQL, but the query is standard SQL). Result:

| Customer_ID | Call_ID | Days Since Last Call | Repeated_Call | Order_Group |
| ----------- | ------- | -------------------- | ------------- | ----------- |
| 1           | 1       |                      | 0             | 1           |
| 1           | 2       | 45                   | 0             | 2           |
| 1           | 3       | 0                    | 1             | 2           |
| 1           | 4       | 0                    | 1             | 2           |
| 1           | 5       | 0                    | 1             | 2           |
| 1           | 6       | 48                   | 0             | 3           |
| 1           | 7       | 1                    | 1             | 3           |
| 2           | 8       |                      | 0             | 1           |
| 2           | 9       | 1                    | 1             | 1           |
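
To check the running-sum query against the question's sample data, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite, the column name Days_Since_Last_Call, and the helper order_groups_sum are assumptions made for illustration and are not part of the original answer; window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question:
# (Customer_ID, Call_ID, Days_Since_Last_Call, Repeated_Call)
ROWS = [
    (1, 1, None, 0), (1, 2, 45, 0), (1, 3, 0, 1), (1, 4, 0, 1), (1, 5, 0, 1),
    (1, 6, 48, 0), (1, 7, 1, 1),
    (2, 8, None, 0), (2, 9, 1, 1),
]

# The running sum adds 1 at every row where Repeated_Call = 0,
# and the partition makes it start over for each customer.
QUERY = """
select t.*,
  sum(case when Repeated_Call = 0 then 1 else 0 end)
  over (partition by Customer_ID order by Call_ID) as Order_Group
from tablename t
order by Customer_ID, Call_ID
"""


def order_groups_sum():
    """Load the sample rows (hypothetical schema) and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table tablename ("
        "Customer_ID integer, Call_ID integer, "
        "Days_Since_Last_Call integer, Repeated_Call integer)"
    )
    conn.executemany("insert into tablename values (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in order_groups_sum():
        print(row)
```

The last element of each returned tuple is Order_Group. The outer order by only fixes the display order; the grouping itself comes from the order by inside the over clause.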


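[Answer 2]:

You can use the analytic window function COUNT with ROWS UNBOUNDED PRECEDING to count, for each customer, every 0 value in the Repeated Call column up to the current row: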

SELECT t.*,
       -- COUNT skips NULLs; Repeated_Call and tablename are named as in Answer 1
       COUNT(CASE WHEN Repeated_Call = 0 THEN 1 END)
         OVER (PARTITION BY Customer_ID ORDER BY Call_ID
               ROWS UNBOUNDED PRECEDING) AS Order_Gr
FROM tablename t
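
If the column is really named with a space, as in the question's table header, it has to be quoted; standard SQL uses double quotes for this, though some dialects use other quoting characters. The sketch below runs the COUNT variant that way in an in-memory SQLite database via Python's sqlite3 module. The table name calls, the helper order_groups_count, and the choice of SQLite (3.25 or later for window functions) are assumptions for illustration only.

```python
import sqlite3

# Sample rows from the question:
# (Customer_ID, Call_ID, "Days Since Last Call", "Repeated Call")
ROWS = [
    (1, 1, None, 0), (1, 2, 45, 0), (1, 3, 0, 1), (1, 4, 0, 1), (1, 5, 0, 1),
    (1, 6, 48, 0), (1, 7, 1, 1),
    (2, 8, None, 0), (2, 9, 1, 1),
]

# CASE without ELSE yields NULL for non-matching rows, and COUNT ignores NULLs,
# so only rows where "Repeated Call" = 0 are counted.
QUERY = """
SELECT t.*,
       COUNT(CASE WHEN "Repeated Call" = 0 THEN 1 END)
         OVER (PARTITION BY Customer_ID ORDER BY Call_ID
               ROWS UNBOUNDED PRECEDING) AS Order_Gr
FROM calls t
ORDER BY Customer_ID, Call_ID
"""


def order_groups_count():
    """Load the sample rows (hypothetical table 'calls') and return the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE calls ('
        'Customer_ID INTEGER, Call_ID INTEGER, '
        '"Days Since Last Call" INTEGER, "Repeated Call" INTEGER)'
    )
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in order_groups_count():
        print(row)
```

Because Call_ID is unique within each customer here, the explicit ROWS frame and the default frame give the same counts; the two frames differ only when several rows share the same ORDER BY value.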
