Two Ways to Implement Cumulative SUM Window Statistics, with Examples

Posted by ShenLiang2025


A use case for the cumulative SUM window function

Requirements

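Among the users who took out a loan within the last seven days, find, for each day, the number of users who had already paid off three or more loans at the time of borrowing. The result should contain these fields: loan date, number of borrowing users.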

Solution

WITH CTE AS(
SELECT 1 id,2343432 AS uid,'2020-09-01' load_date,'20d' load_periods,1 load_status,1 replay_status, NULL payoff_date  UNION ALL
SELECT 2 id,12312133 AS uid,'2020-09-02' load_date,'30d' load_periods,2 load_status,2 replay_status,'2020-10-02' payoff_date UNION ALL
SELECT 3 id,2343431 AS uid,'2020-09-03' load_date,'15d' load_periods,2 load_status,2 replay_status, '2020-09-18' payoff_date  UNION ALL
SELECT 4 id,2343431 AS uid,'2020-09-04' load_date,'20d' load_periods,2 load_status,1 replay_status, NULL payoff_date  UNION ALL
SELECT 5 id,2343432 AS uid,'2022-09-01' load_date,'20d' load_periods,2 load_status,2 replay_status, '2022-09-19' payoff_date  UNION ALL
SELECT 6 id,12312133 AS uid,'2022-09-02' load_date,'18d' load_periods,2 load_status,2 replay_status,'2022-09-17' payoff_date UNION ALL
SELECT 7 id,2343431 AS uid,'2022-09-14' load_date,'15d' load_periods,2 load_status,2 replay_status, '2022-09-29' payoff_date  UNION ALL
SELECT 8 id,2343431 AS uid,'2022-09-04' load_date,'20d' load_periods,2 load_status,1 replay_status, NULL payoff_date UNION ALL
SELECT 9 id,2343432 AS uid,'2022-11-06' load_date,'20d' load_periods,2 load_status,2 replay_status, '2022-11-26' payoff_date  UNION ALL
SELECT 10 id,12312133 AS uid,'2022-11-02' load_date,'14d' load_periods,2 load_status,2 replay_status,'2022-11-13' payoff_date UNION ALL
SELECT 11 id,2343431 AS uid,'2022-11-05' load_date,'20d' load_periods,2 load_status,2 replay_status, '2022-11-25' payoff_date  UNION ALL
SELECT 12 id,2343431 AS uid,'2023-11-04' load_date,'20d' load_periods,2 load_status,1 replay_status, NULL payoff_date UNION ALL
SELECT 13 id,9876 AS uid,'2022-10-06' load_date,'20d' load_periods,2 load_status,2 replay_status, '2022-10-21' payoff_date UNION ALL
SELECT 14 id,9876 AS uid,'2023-10-04' load_date,'14d' load_periods,2 load_status,2 replay_status, '2023-10-14' payoff_date UNION ALL
SELECT 15 id,2343432 AS uid,'2023-01-31' load_date,'15d' load_periods,2 load_status,1 replay_status, NULL payoff_date  UNION ALL
SELECT 16 id,12312133 AS uid,'2023-01-31' load_date,'14d' load_periods,2 load_status,1 replay_status, NULL payoff_date UNION ALL
SELECT 17 id,2343431 AS uid,'2023-02-01' load_date,'20d' load_periods,2 load_status,1 replay_status, NULL payoff_date   UNION ALL
SELECT 18 id,12312133 AS uid,'2023-02-01' load_date,'12d' load_periods,2 load_status,1 replay_status, NULL payoff_date UNION ALL
SELECT 19 id,2343432 AS uid,'2023-02-01' load_date,'15d' load_periods,2 load_status,2 replay_status, '2023-02-02' payoff_date UNION ALL
SELECT 20 id,2343432 AS uid,'2023-02-03' load_date,'15d' load_periods,2 load_status,1 replay_status, NULL payoff_date UNION ALL
SELECT 21 id,9876 AS uid,'2023-02-03' load_date,'15d' load_periods,2 load_status,1 replay_status, NULL payoff_date
    )

-- Method 1: use a window function. Note that the current row is excluded from the running SUM.
SELECT load_date,COUNT(uid) repaly3timesUser_cnt,date_gap,DATE_FORMAT(now(),'%Y-%m-%d') current FROM
    (
        SELECT DATEDIFF(now(), load_date)  date_gap,
               -- SUM(CASE WHEN replay_status = '2' THEN 1 ELSE NULL END) OVER (PARTITION BY uid ORDER BY load_date rows between unbounded preceding and 1 following) repalyCurr_cnt,
               SUM(CASE WHEN replay_status = '2' THEN 1 ELSE NULL END)
                   OVER (PARTITION BY uid ORDER BY load_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) repalyBeforeCurr_cnt,
               load_date,uid
        FROM CTE
        -- WHERE uid = '2343432' -- uncomment to check a single user
    )A
WHERE date_gap<=7 AND date_gap>=0 AND repalyBeforeCurr_cnt>=3
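GROUP BY load_date, date_gap -- date_gap is derived from load_date; listing it keeps the query valid under ONLY_FULL_GROUP_BY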
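-- Method 2: use intermediate result sets (extra CTEs). First select the users who borrowed in the last 7 days, then join back to the original table to count each user's previously paid-off loans, and finally count the users who have paid off 3 or more. (Comment out the Method 1 code before running this.)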

,last7day AS (
    SELECT load_date, uid
    FROM CTE
    WHERE DATEDIFF(now(), load_date) <= 7
      AND DATEDIFF(now(), load_date) >= 0
),repalyDetail AS (
    SELECT SUM(CASE WHEN replay_status = '2' THEN 1 ELSE NULL END) repaly_cnt, B.uid, B.load_date
    FROM CTE A
             JOIN last7day B
                  ON A.uid = B.uid AND DATEDIFF(B.load_date, A.load_date) > 0
    GROUP BY B.uid, B.load_date
)
SELECT load_date,COUNT(uid) repaly3timesUser_cnt,DATEDIFF(now(), load_date) date_gap,DATE_FORMAT(now(),'%Y-%m-%d') current
FROM repalyDetail
WHERE repaly_cnt >=3
GROUP BY load_date
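
The key detail in Method 1 is the window frame that stops one row before the current row. The following is a minimal sketch of that behavior, assuming MySQL 8.0 or later. The `demo` CTE, the uid 1001 and its four rows are hypothetical sample data made up for illustration and are not part of the original data set. The aliases `paid_before` and `paid_incl_current` are likewise illustrative names.

```sql
-- Hypothetical sample data: one user with four loans, the first three paid off (replay_status = 2).
WITH demo AS (
    SELECT 1001 AS uid, '2023-01-01' AS load_date, 2 AS replay_status UNION ALL
    SELECT 1001, '2023-01-10', 2 UNION ALL
    SELECT 1001, '2023-01-20', 2 UNION ALL
    SELECT 1001, '2023-02-01', 1
)
SELECT uid,
       load_date,
       -- Frame ends one row before the current row:
       -- counts paid-off loans strictly earlier than this loan.
       SUM(CASE WHEN replay_status = 2 THEN 1 END)
           OVER (PARTITION BY uid ORDER BY load_date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS paid_before,
       -- Frame ends at the current row:
       -- the current loan is included in the count.
       SUM(CASE WHEN replay_status = 2 THEN 1 END)
           OVER (PARTITION BY uid ORDER BY load_date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS paid_incl_current
FROM demo
ORDER BY load_date;
```

With this sample data, `paid_before` is NULL for the first loan (its frame is empty), then 1, 2 and 3, while `paid_incl_current` is 1, 2, 3 and 3. Only the fourth loan passes a filter of `paid_before >= 3`, which matches the requirement of three loans already paid off at the time of borrowing. Filtering on `paid_incl_current` would also flag the third loan, because it counts that loan's own later repayment.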
