Spark SQL: Maximum Number of Consecutive Login Days


Data:

Note:
| 3|2020-09-04|
| 3|2020-09-04|
The data contains duplicate rows like these, so
Step 1 is to deduplicate, keeping one row per user per day:
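
As an illustration of Step 1, here is a minimal Python sketch of what the deduplication does. Only the duplicated row for user 3 on 2020-09-04 comes from the text above; the other rows and the names `logins` and `deduped` are assumptions made up for the example.

```python
# Hypothetical sample rows (user_id, dt). Only the duplicated
# (3, "2020-09-04") row is taken from the text; the rest is assumed.
logins = [
    (3, "2020-09-03"),
    (3, "2020-09-04"),
    (3, "2020-09-04"),  # same user, same day: a duplicate
    (3, "2020-09-06"),
]

# A set drops exact duplicates, which mirrors SELECT DISTINCT user_id, dt.
deduped = sorted(set(logins))

for row in deduped:
    print(row)
```

Without this step the duplicate would receive its own rank in Step 2 and distort the streak lengths computed later.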

Step 2:
Within each user_id, sort the login dates in ascending order and assign each one its rank.
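
The ranking in Step 2 corresponds to a per-user row number. The sketch below imitates it in plain Python, assuming the rows are already deduplicated; the sample rows and the names `deduped` and `ranked` are hypothetical.

```python
from itertools import groupby

# Hypothetical, already deduplicated rows (user_id, dt) in arbitrary order.
deduped = [
    (1, "2020-09-02"),
    (1, "2020-09-01"),
    (3, "2020-09-04"),
    (3, "2020-09-03"),
    (3, "2020-09-06"),
]

# Sorting by (user_id, dt) and numbering inside each user group behaves
# like ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY dt).
ranked = []
for user_id, group in groupby(sorted(deduped), key=lambda row: row[0]):
    for rank, (_, dt) in enumerate(group, start=1):
        ranked.append((user_id, dt, rank))

for row in ranked:
    print(row)
```

ISO-formatted date strings sort in calendar order, which is why a plain `sorted` is enough here.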

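Step 3:
Subtract the rank (as a number of days) from dt. Within a run of consecutive dates, both the date and the rank increase by one per row, so the result is the same for every row in the run.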
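
A small worked example may make the date-minus-rank trick more concrete. The rows below are hypothetical and deliberately cross a month boundary, to show that the subtraction has to be real date arithmetic rather than arithmetic on a number such as 20200901.

```python
from datetime import date, timedelta

# Hypothetical ranked rows (user_id, dt, rank) for a single user.
ranked = [
    (3, date(2020, 8, 31), 1),
    (3, date(2020, 9, 1), 2),   # consecutive with the previous day
    (3, date(2020, 9, 3), 3),   # gap: 2020-09-02 is missing
    (3, date(2020, 9, 4), 4),
]

# dt minus rank days: rows of the same streak share one anchor date.
anchors = [(user_id, dt - timedelta(days=rank)) for user_id, dt, rank in ranked]

for (user_id, dt, rank), (_, anchor) in zip(ranked, anchors):
    print(user_id, dt, rank, anchor)
```

The first two rows both map to 2020-08-30 and the last two to 2020-08-31, so each streak ends up with its own anchor date.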

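Step 4:
Group by user and by the result of the subtraction, and count the rows in each group. Each count is the length of one streak: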
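
The grouping in Step 4 can be sketched with a counter keyed by user and anchor date. The anchor values below are hypothetical outputs of the previous step, and `streaks` is a name chosen for this example.

```python
from collections import Counter
from datetime import date

# Hypothetical (user_id, anchor) pairs produced by "dt minus rank".
anchors = [
    (1, date(2020, 8, 31)),
    (1, date(2020, 8, 31)),
    (1, date(2020, 9, 1)),
    (3, date(2020, 9, 1)),
    (3, date(2020, 9, 1)),
    (3, date(2020, 9, 1)),
    (3, date(2020, 9, 2)),
]

# Counting per (user_id, anchor) mirrors
# GROUP BY user_id, anchor with COUNT(*): one entry per streak.
streaks = Counter(anchors)

for (user_id, anchor), days in sorted(streaks.items()):
    print(user_id, anchor, days)
```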

Step 5:
For each user, take the maximum streak length. This is the maximum number of consecutive login days:

The complete SQL:
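
The following is a sketch of how the five steps might fit into one query; it is not the original author's SQL. To keep it runnable without a cluster it uses Python's built-in `sqlite3` module (SQLite 3.25 or later is needed for window functions) as a stand-in for Spark SQL. The table name `login`, the column names, and the sample rows are assumptions. In Spark SQL the date subtraction would typically be written as `date_sub(dt, rn)` instead of SQLite's `date(dt, '-N days')`.

```python
import sqlite3

# Hypothetical table and data; user 3 has a duplicate on 2020-09-04.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login (user_id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO login VALUES (?, ?)",
    [
        (1, "2020-09-01"), (1, "2020-09-02"), (1, "2020-09-04"),
        (2, "2020-09-01"),
        (3, "2020-09-02"), (3, "2020-09-03"), (3, "2020-09-04"),
        (3, "2020-09-04"), (3, "2020-09-06"),
    ],
)

QUERY = """
WITH dedup AS (                       -- Step 1: one row per user per day
    SELECT DISTINCT user_id, dt FROM login
),
ranked AS (                           -- Step 2: rank dates within each user
    SELECT user_id, dt,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY dt) AS rn
    FROM dedup
),
anchored AS (                         -- Step 3: dt minus rn days (SQLite syntax)
    SELECT user_id, date(dt, '-' || rn || ' days') AS grp
    FROM ranked
),
streaks AS (                          -- Step 4: rows per streak
    SELECT user_id, grp, COUNT(*) AS days
    FROM anchored
    GROUP BY user_id, grp
)
SELECT user_id, MAX(days) AS max_days -- Step 5: longest streak per user
FROM streaks
GROUP BY user_id
ORDER BY user_id
"""

result = conn.execute(QUERY).fetchall()
for user_id, max_days in result:
    print(user_id, max_days)
```

Writing each step as its own common table expression keeps the query aligned with the step-by-step description and makes intermediate results easy to inspect one at a time.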

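SQL (Part 2): Querying the "Longest Consecutive Login Days"

0 - Create the table and insert test data

Note: this article mainly shares how to query the "longest consecutive login days". Feel free to add your own test data.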

CREATE TABLE USER_LOGINFO(
USER_ID INTEGER ,
LOGIN_DATE DATE,
IS_SIGN_IN VARCHAR2(1) DEFAULT '1'
);
COMMENT ON TABLE USER_LOGINFO IS 'User login information table';
COMMENT ON COLUMN USER_LOGINFO.USER_ID IS 'User ID';
COMMENT ON COLUMN USER_LOGINFO.LOGIN_DATE IS 'User login date';
COMMENT ON COLUMN USER_LOGINFO.IS_SIGN_IN IS 'Whether the user signed in';
-- Insert test data
INSERT INTO  USER_LOGINFO (USER_ID,LOGIN_DATE)
SELECT A.USER_ID, B.FORMAT  FROM (
SELECT  1 USER_ID FROM DUAL 
UNION ALL 
SELECT  2 USER_ID FROM DUAL 
UNION ALL 
SELECT  3 USER_ID FROM DUAL 
UNION ALL 
SELECT  4 USER_ID FROM DUAL 
UNION ALL 
SELECT  5 USER_ID FROM DUAL 
)A,
( 
SELECT TO_DATE('2021-05-17 09:00:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL
UNION ALL
SELECT TO_DATE('2021-05-17 12:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
UNION ALL
SELECT TO_DATE('2021-05-17 08:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
UNION ALL
SELECT TO_DATE('2021-05-18 09:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
UNION ALL
SELECT TO_DATE('2021-05-18 10:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
UNION ALL
SELECT TO_DATE('2021-05-19 13:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
UNION ALL
SELECT TO_DATE('2021-05-19 09:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
UNION ALL
SELECT TO_DATE('2021-05-20 21:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
UNION ALL
SELECT TO_DATE('2021-05-20 23:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
UNION ALL
SELECT TO_DATE('2021-05-20 07:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
UNION ALL
SELECT TO_DATE('2021-05-21 05:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
UNION ALL
SELECT TO_DATE('2021-05-21 06:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
UNION ALL
SELECT TO_DATE('2021-05-22 07:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
UNION ALL
SELECT TO_DATE('2021-05-22 08:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
UNION ALL
SELECT TO_DATE('2021-05-23 09:05:00', 'yyyy-mm-dd hh24:mi:ss') AS FORMAT  FROM DUAL 
)B;
COMMIT ; 

DELETE FROM  USER_LOGINFO T WHERE T.USER_ID=2 AND TO_CHAR(T.LOGIN_DATE,'yyyy-mm-dd') IN('2021-05-17','2021-05-22','2021-05-23');
DELETE FROM  USER_LOGINFO T WHERE T.USER_ID=3 AND TO_CHAR(T.LOGIN_DATE,'yyyy-mm-dd') IN('2021-05-19','2021-05-21');
DELETE FROM  USER_LOGINFO T WHERE T.USER_ID=4 AND TO_CHAR(T.LOGIN_DATE,'yyyy-mm-dd') IN('2021-05-18','2021-05-22');
DELETE FROM  USER_LOGINFO T WHERE T.USER_ID=5 AND TO_CHAR(T.LOGIN_DATE,'yyyy-mm-dd') IN('2021-05-17','2021-05-19');
COMMIT ; 

--1A. Number of active days per user

SELECT T.USER_ID, COUNT(DISTINCT TO_CHAR(LOGIN_DATE,'yyyymmdd')) ACTIVE_DAYS FROM USER_LOGINFO T 
GROUP BY T.USER_ID  ORDER BY 1 ;

--1B. Number of active days per user within the last week

SELECT T.USER_ID, COUNT(DISTINCT TO_CHAR(LOGIN_DATE,'yyyymmdd')) WEEKLY_ACTIVE_DAYS FROM USER_LOGINFO T 
WHERE
TRUNC(SYSDATE) - TRUNC(T.LOGIN_DATE) BETWEEN 0 AND 7 -- date difference in days; subtracting yyyymmdd numbers gives wrong results across month boundaries
GROUP BY T.USER_ID  ORDER BY 1 ;

--2A. Longest consecutive-login streak per user

SELECT C.USER_ID, MAX(C.CONTINUOUS_LOGIN_DAY) MAX_CONTINUOUS_LOGIN_DAYS
  FROM (SELECT B.USER_ID, B.LOGIN_DAY - B.RN AS GRP, COUNT(1) CONTINUOUS_LOGIN_DAY
          FROM (SELECT A.USER_ID,
                       A.LOGIN_DAY,
                       ROW_NUMBER() OVER(PARTITION BY A.USER_ID ORDER BY A.LOGIN_DAY) AS RN -- partition by user and number the login days in ascending order
                  FROM (SELECT DISTINCT T.USER_ID,
                                        TRUNC(T.LOGIN_DATE) LOGIN_DAY -- deduplicate: one row per user per day; kept as a DATE so LOGIN_DAY - RN is real date arithmetic
                          FROM USER_LOGINFO T) A) B
         GROUP BY B.USER_ID, B.LOGIN_DAY - B.RN) C -- rows of one streak share the same LOGIN_DAY - RN
 GROUP BY C.USER_ID
 ORDER BY 1;

--2B. Longest consecutive-login streak per user within the last week

SELECT C.USER_ID, MAX(C.CONTINUOUS_LOGIN_DAY) MAX_CONTINUOUS_LOGIN_DAYS
  FROM (SELECT B.USER_ID, B.LOGIN_DAY - B.RN AS GRP, COUNT(1) CONTINUOUS_LOGIN_DAY
          FROM (SELECT A.USER_ID,
                       A.LOGIN_DAY,
                       ROW_NUMBER() OVER(PARTITION BY A.USER_ID ORDER BY A.LOGIN_DAY) AS RN -- partition by user and number the login days in ascending order
                  FROM (SELECT DISTINCT T.USER_ID,
                                        TRUNC(T.LOGIN_DATE) LOGIN_DAY -- deduplicate: one row per user per day; kept as a DATE so LOGIN_DAY - RN is real date arithmetic
                          FROM USER_LOGINFO T
                         WHERE TRUNC(SYSDATE) - TRUNC(T.LOGIN_DATE) BETWEEN 0 AND 7 -- keep only logins from the last week
                        ) A) B
         GROUP BY B.USER_ID, B.LOGIN_DAY - B.RN) C -- rows of one streak share the same LOGIN_DAY - RN
 GROUP BY C.USER_ID
 ORDER BY 1;

 
