SQL - Partition restarted based on a column value
【Posted】: 2020-08-18 19:07:32
【Question】: I need to create a new column that increments at every 0 value of Repeated Call and restarts for each Customer_ID:
+-------------+---------+----------------------+---------------+
| Customer_ID | Call_ID | Days Since Last Call | Repeated Call |
+-------------+---------+----------------------+---------------+
| 1 | 1 | Null | 0 |
| 1 | 2 | 45 | 0 |
| 1 | 3 | 0 | 1 |
| 1 | 4 | 0 | 1 |
| 1 | 5 | 0 | 1 |
| 1 | 6 | 48 | 0 |
| 1 | 7 | 1 | 1 |
| 2 | 8 | Null | 0 |
| 2 | 9 | 1 | 1 |
+-------------+---------+----------------------+---------------+
so that it looks like this:
+-------------+---------+----------------------+---------------+-------------+
| Customer_ID | Call_ID | Days Since Last Call | Repeated Call | Order_Group |
+-------------+---------+----------------------+---------------+-------------+
| 1 | 1 | Null | 0 | 1 |
| 1 | 2 | 45 | 0 | 2 |
| 1 | 3 | 0 | 1 | 2 |
| 1 | 4 | 0 | 1 | 2 |
| 1 | 5 | 0 | 1 | 2 |
| 1 | 6 | 48 | 0 | 3 |
| 1 | 7 | 1 | 1 | 3 |
| 2 | 8 | Null | 0 | 1 |
| 2 | 9 | 1 | 1 | 1 |
+-------------+---------+----------------------+---------------+-------------+
Any suggestions are appreciated, thanks!
【Answer 1】: You can use the SUM() window function:
select t.*,
       sum(case when Repeated_Call = 0 then 1 else 0 end)
           over (partition by Customer_ID order by Call_Id) as Order_Group
from tablename t
See the demo (for MySQL, but it is standard SQL). Result:
| Customer_ID | Call_ID | Days Since Last Call | Repeated_Call | Order_Group |
| ----------- | ------- | -------------------- | ------------- | ----------- |
| 1 | 1 | | 0 | 1 |
| 1 | 2 | 45 | 0 | 2 |
| 1 | 3 | 0 | 1 | 2 |
| 1 | 4 | 0 | 1 | 2 |
| 1 | 5 | 0 | 1 | 2 |
| 1 | 6 | 48 | 0 | 3 |
| 1 | 7 | 1 | 1 | 3 |
| 2 | 8 | | 0 | 1 |
| 2 | 9 | 1 | 1 | 1 |
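To check the query locally, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database and runs the same SUM() window query. The table name `calls`, the column name `Days_Since_Last_Call`, and the helper `order_groups` are assumptions made for illustration; SQLite 3.25 or newer is assumed, since older versions lack window functions.

```python
import sqlite3

# Sample rows from the question:
# (Customer_ID, Call_ID, Days_Since_Last_Call, Repeated_Call)
ROWS = [
    (1, 1, None, 0),
    (1, 2, 45, 0),
    (1, 3, 0, 1),
    (1, 4, 0, 1),
    (1, 5, 0, 1),
    (1, 6, 48, 0),
    (1, 7, 1, 1),
    (2, 8, None, 0),
    (2, 9, 1, 1),
]

# Same running SUM() as in the answer; "calls" is a hypothetical table name.
QUERY = """
SELECT t.Customer_ID, t.Call_ID,
       SUM(CASE WHEN Repeated_Call = 0 THEN 1 ELSE 0 END)
           OVER (PARTITION BY Customer_ID ORDER BY Call_ID) AS Order_Group
FROM calls t
ORDER BY t.Customer_ID, t.Call_ID
"""


def order_groups(rows):
    """Return (Customer_ID, Call_ID, Order_Group) tuples for the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE calls (Customer_ID INTEGER, Call_ID INTEGER, "
            "Days_Since_Last_Call INTEGER, Repeated_Call INTEGER)"
        )
        conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in order_groups(ROWS):
        print(row)
```

Each row with Repeated_Call = 0 adds 1 to the running total, so rows that follow it keep the same Order_Group until the next 0 appears, and PARTITION BY resets the total for each customer.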
【Answer 2】: You can use the COUNT analytic (window) function with ROWS UNBOUNDED PRECEDING to count each 0 value in the Repeated Call column (per customer):
-- Assumes the column is named Repeated_Call and the table tablename, as in answer 1
SELECT t.*,
       COUNT(CASE WHEN Repeated_Call = 0 THEN 1 END)
           OVER (PARTITION BY Customer_ID
                 ORDER BY Call_ID ROWS UNBOUNDED PRECEDING) AS Order_Gr
FROM tablename t
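For intuition about what this window computes, the following sketch reproduces the same running count in plain Python: within each customer, ordered by Call_ID, it counts how many rows so far have Repeated_Call = 0. The function name `running_zero_count` and the tuple layout are assumptions made for illustration and are not part of the answer.

```python
from itertools import groupby
from operator import itemgetter


def running_zero_count(rows):
    """rows: iterable of (customer_id, call_id, repeated_call) tuples.

    Returns (customer_id, call_id, order_group) tuples, where order_group is
    the number of rows with repeated_call == 0 seen so far for that customer.
    """
    result = []
    # Equivalent of PARTITION BY Customer_ID ORDER BY Call_ID
    ordered = sorted(rows, key=itemgetter(0, 1))
    for customer_id, calls in groupby(ordered, key=itemgetter(0)):
        count = 0  # the counter restarts for every partition
        for _, call_id, repeated_call in calls:
            # ROWS UNBOUNDED PRECEDING: every row up to and including this one
            if repeated_call == 0:
                count += 1
            result.append((customer_id, call_id, count))
    return result


if __name__ == "__main__":
    sample = [(1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 1), (1, 5, 1),
              (1, 6, 0), (1, 7, 1), (2, 8, 0), (2, 9, 1)]
    for row in running_zero_count(sample):
        print(row)
```

COUNT ignores NULLs, so the CASE expression without an ELSE branch counts only the rows where Repeated_Call is 0, which is what the `if` inside the loop mirrors.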