间隔连续问题
Posted 一只懒得睁眼的猫
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了间隔连续问题相关的知识,希望对你有一定的参考价值。
输入数据:
CREATE TABLE lianxu
(
uid int,
dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ‘|’
LINES TERMINATED BY ‘\\n’;
id dt
1001|2021-12-12
1002|2021-12-12
1001|2021-12-13
1001|2021-12-14
1001|2021-12-16
1002|2021-12-16
1001|2021-12-19
1002|2021-12-17
1001|2021-12-20
代码:
思路:分组
1)将上一行时间数据下移
select
uid,
dt,
lag(dt,1,‘1970-01-01’) over (distribute by uid sort by dt asc) as lag_dt
from lianxu; t1
2)将当前行时间减去上一行时间数据
select
uid,
dt,
datediff(dt,lag_dt) as diff_date
from (
select
uid,
dt,
lag(dt,1,‘1970-01-01’) over (distribute by uid sort by dt asc) as lag_dt
from lianxu
)t1; t2
uid dt diff_date
1001 2021-12-12 18973
1001 2021-12-13 1
1001 2021-12-14 1
1001 2021-12-16 2
1001 2021-12-19 3
1001 2021-12-20 1
1002 2021-12-12 18973
1002 2021-12-16 4
1002 2021-12-17 1
3)按照用户分组,同时按照时间排序,计算从第一行到当前行大于2的数据的总条数(sum(if(flag>2,1,0)))
select
uid,
dt,
diff_date,
sum(if (diff_date > 2,1,0)) over (distribute by uid sort by dt asc) as groupid
from (
select
uid,
dt,
datediff(dt,lag_dt) as diff_date
from (
select
uid,
dt,
lag(dt,1,‘1970-01-01’) over (distribute by uid sort by dt asc) as lag_dt
from lianxu
)t1
)t2; t3
uid dt diff_date groupid
1001 2021-12-12 18973 1
1001 2021-12-13 1 1
1001 2021-12-14 1 1
1001 2021-12-16 2 1
1001 2021-12-19 3 2
1001 2021-12-20 1 2
1002 2021-12-12 18973 1
1002 2021-12-16 4 2
1002 2021-12-17 1 2
4)按照用户和flag分组,求最大时间减去最小时间并加上1
select
uid,
groupid,
datediff(max(dt),min(dt)) + 1 as days
from (
select
uid,
dt,
diff_date,
sum(if (diff_date > 2,1,0)) over (distribute by uid sort by dt asc) as groupid
from (
select
uid,
dt,
datediff(dt,lag_dt) as diff_date
from (
select
uid,
dt,
lag(dt,1,‘1970-01-01’) over (distribute by uid sort by dt asc) as lag_dt
from lianxu
)t1
)t2
)t3
group by uid,groupid; t4
uid groupid days
1001 1 5
1001 2 2
1002 1 1
1002 2 2
5)取连续登录天数的最大值
select
uid,
max(days) as max_dt
from (
select
uid,
groupid,
datediff(max(dt),min(dt)) + 1 as days
from (
select
uid,
dt,
diff_date,
sum(if (diff_date > 2,1,0)) over (distribute by uid sort by dt asc) as groupid
from (
select
uid,
dt,
datediff(dt,lag_dt) as diff_date
from (
select
uid,
dt,
lag(dt,1,‘1970-01-01’) over (distribute by uid sort by dt asc) as lag_dt
from lianxu
)t1
)t2
)t3
group by uid,groupid
)t4
group by uid;
结果:
OK
uid max_dt
1001 5
1002 2
Time taken: 3.842 seconds, Fetched: 2 row(s)
以上是关于间隔连续问题的主要内容,如果未能解决你的问题,请参考以下文章