Differences between rank(), dense_rank(), and row_number()
Posted by 逃跑的沙丁鱼
1 Create the test table
CREATE TABLE `user_login`(
`brandid` int,
`userid` string,
`logindate` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS ORC;
2 Insert test data
insert into user_login (brandid,userid,logindate) VALUES
(429,'1001','2021-05-01'),
(429,'1001','2021-05-02'),
(429,'1001','2021-05-02'),
(429,'1001','2021-05-03'),
(429,'1002','2021-05-01'),
(429,'1002','2021-05-02'),
(429,'1002','2021-05-03'),
(429,'1002','2021-05-04'),
(429,'1002','2021-05-06')
;
9 rows in total.
3 rank()
partition by xx order by xx
The default sort order is asc (ascending).
select userid,logindate, rank() over(partition by userid order by logindate) rn from user_login;
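The user with userid=1001 logged in twice on 2021-05-02, so rank() produces a tie. The numbers are 1, 2, 2, 4: the tied rows share rank 2, and the next row skips to 4.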
userid logindate rn
1001 2021-05-01 1
1001 2021-05-02 2
1001 2021-05-02 2
1001 2021-05-03 4
1002 2021-05-01 1
1002 2021-05-02 2
1002 2021-05-03 3
1002 2021-05-04 4
1002 2021-05-06 5
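To make the numbering rule concrete, here is a minimal Python sketch that mimics the query above. The helper names `rank_values` and `rank_partitioned` are made up for illustration and are not part of Hive; the sketch assumes dates are ISO-formatted strings, so plain string sorting matches date order.

```python
from itertools import groupby

def rank_values(sorted_keys):
    """Assign rank() numbers to keys that are already sorted."""
    ranks = []
    for position, key in enumerate(sorted_keys, start=1):
        if position > 1 and key == sorted_keys[position - 2]:
            ranks.append(ranks[-1])   # tie: reuse the previous rank
        else:
            ranks.append(position)    # no tie: rank equals the row position
    return ranks

def rank_partitioned(rows):
    """Mimic rank() over(partition by userid order by logindate)."""
    result = []
    ordered = sorted(rows)  # sorts by userid first, then by logindate
    for userid, group in groupby(ordered, key=lambda row: row[0]):
        dates = [logindate for _, logindate in group]
        for logindate, rn in zip(dates, rank_values(dates)):
            result.append((userid, logindate, rn))
    return result

# Same rows as the test table (brandid omitted, it plays no role here).
logins = [
    ("1001", "2021-05-01"), ("1001", "2021-05-02"), ("1001", "2021-05-02"),
    ("1001", "2021-05-03"), ("1002", "2021-05-01"), ("1002", "2021-05-02"),
    ("1002", "2021-05-03"), ("1002", "2021-05-04"), ("1002", "2021-05-06"),
]

for row in rank_partitioned(logins):
    print(*row)
```

Because a non-tied row simply takes its position in the partition, the gap after a tie (2, 2, then 4) falls out naturally.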
Descending order also works:
select userid,logindate, rank() over(partition by userid order by logindate desc) rn from user_login;
userid logindate rn
1001 2021-05-03 1
1001 2021-05-02 2
1001 2021-05-02 2
1001 2021-05-01 4
1002 2021-05-06 1
1002 2021-05-04 2
1002 2021-05-03 3
1002 2021-05-02 4
1002 2021-05-01 5
Ranking without partition by
select userid,logindate, rank() over(order by logindate desc) rn from user_login;
userid logindate rn
1002 2021-05-06 1
1002 2021-05-04 2
1002 2021-05-03 3
1001 2021-05-03 3
1002 2021-05-02 5
1001 2021-05-02 5
1001 2021-05-02 5
1002 2021-05-01 8
1001 2021-05-01 8
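Without a partition, all 9 rows are ranked as a single group. Another way to read the result is that a row's rank is 1 plus the number of rows that sort strictly ahead of it. The sketch below uses that definition; `rank_desc` is a hypothetical helper name, and this is only an illustration of the rule, not how Hive evaluates the query.

```python
def rank_desc(values):
    """Mimic rank() over(order by value desc) with no partition."""
    ordered = sorted(values, reverse=True)
    # rank = 1 + count of values strictly greater than this one
    return [(v, 1 + sum(1 for other in values if other > v)) for v in ordered]

# The logindate column of all 9 test rows.
login_dates = [
    "2021-05-01", "2021-05-02", "2021-05-02", "2021-05-03",
    "2021-05-01", "2021-05-02", "2021-05-03", "2021-05-04", "2021-05-06",
]

ranked = rank_desc(login_dates)
for logindate, rn in ranked:
    print(logindate, rn)
```

Three rows share 2021-05-02 and four rows sort ahead of them, which is why they all get 5 and the next rank is 8.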
distribute by xx sort by xx
Ascending
select userid,logindate, rank() over(distribute by userid sort by logindate asc) rn from user_login;
userid logindate rn
1001 2021-05-01 1
1001 2021-05-02 2
1001 2021-05-02 2
1001 2021-05-03 4
1002 2021-05-01 1
1002 2021-05-02 2
1002 2021-05-03 3
1002 2021-05-04 4
1002 2021-05-06 5
Descending
select userid,logindate, rank() over(distribute by userid sort by logindate desc) rn from user_login;
Ranking without distribute by
select userid,logindate, rank() over(sort by logindate desc) rn from user_login;
userid logindate rn
1002 2021-05-06 1
1002 2021-05-04 2
1002 2021-05-03 3
1001 2021-05-03 3
1002 2021-05-02 5
1001 2021-05-02 5
1001 2021-05-02 5
1002 2021-05-01 8
1001 2021-05-01 8
4 dense_rank()
partition by xx order by xx
select userid,logindate, dense_rank() over(partition by userid order by logindate asc) rn from user_login;
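Result:
The user with userid=1001 logged in twice on 2021-05-02, so dense_rank() produces a tie. The numbers are 1, 2, 2, 3: the tied rows share rank 2, and the next row continues with 3 without a gap.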
userid logindate rn
1001 2021-05-01 1
1001 2021-05-02 2
1001 2021-05-02 2
1001 2021-05-03 3
1002 2021-05-01 1
1002 2021-05-02 2
1002 2021-05-03 3
1002 2021-05-04 4
1002 2021-05-06 5
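The dense_rank() rule can be sketched in a few lines of Python: the counter only advances when the sort key changes, so no number is ever skipped. `dense_rank_values` is a hypothetical helper for illustration and expects keys that are already sorted.

```python
def dense_rank_values(sorted_keys):
    """Assign dense_rank() numbers to keys that are already sorted."""
    ranks = []
    current = 0
    previous = object()  # sentinel that never equals a real key
    for key in sorted_keys:
        if key != previous:
            current += 1   # new distinct value: advance by exactly one
            previous = key
        ranks.append(current)
    return ranks

# Login dates of userid=1001, already in ascending order.
dates_1001 = ["2021-05-01", "2021-05-02", "2021-05-02", "2021-05-03"]
print(dense_rank_values(dates_1001))
```

Put differently, the dense rank of a row equals the number of distinct values up to and including its own.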
Ranking without partition by
select userid,logindate, dense_rank() over(order by logindate asc) rn from user_login;
userid logindate rn
1002 2021-05-01 1
1001 2021-05-01 1
1002 2021-05-02 2
1001 2021-05-02 2
1001 2021-05-02 2
1002 2021-05-03 3
1001 2021-05-03 3
1002 2021-05-04 4
1002 2021-05-06 5
distribute by xx sort by xx
select userid,logindate, dense_rank() over(distribute by userid sort by logindate asc) rn from user_login;
userid logindate rn
1001 2021-05-01 1
1001 2021-05-02 2
1001 2021-05-02 2
1001 2021-05-03 3
1002 2021-05-01 1
1002 2021-05-02 2
1002 2021-05-03 3
1002 2021-05-04 4
1002 2021-05-06 5
Ranking without distribute by
select userid,logindate, dense_rank() over(sort by logindate asc) rn from user_login;
userid logindate rn
1002 2021-05-01 1
1001 2021-05-01 1
1002 2021-05-02 2
1001 2021-05-02 2
1001 2021-05-02 2
1002 2021-05-03 3
1001 2021-05-03 3
1002 2021-05-04 4
1002 2021-05-06 5
5 row_number()
partition by xx order by xx
select userid,logindate, row_number() over(partition by userid order by logindate asc) rn from user_login;
Result:
The user with userid=1001 logged in twice on 2021-05-02, but row_number() never produces ties. The numbers are 1, 2, 3, 4.
userid logindate rn
1001 2021-05-01 1
1001 2021-05-02 2
1001 2021-05-02 3
1001 2021-05-03 4
1002 2021-05-01 1
1002 2021-05-02 2
1002 2021-05-03 3
1002 2021-05-04 4
1002 2021-05-06 5
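row_number() is the simplest of the three: sort, then count. A minimal Python sketch follows, with `row_number_values` as a hypothetical helper name. One caveat worth keeping in mind: Python's `sorted` is stable, so tied rows keep their input order, whereas a SQL engine is free to number tied rows in any order unless the order by clause includes a tie-breaking column.

```python
def row_number_values(rows, key):
    """Mimic row_number() over(order by key): sort, then number from 1."""
    ordered = sorted(rows, key=key)
    return [(row, n) for n, row in enumerate(ordered, start=1)]

# Login dates of userid=1001; the duplicate date still gets two distinct numbers.
dates_1001 = ["2021-05-02", "2021-05-01", "2021-05-03", "2021-05-02"]

numbered = row_number_values(dates_1001, key=lambda logindate: logindate)
for logindate, rn in numbered:
    print(logindate, rn)
```

The two 2021-05-02 rows receive 2 and 3, matching the query result above.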
Numbering without partition by
select userid,logindate, row_number() over(order by logindate asc) rn from user_login;
userid logindate rn
1002 2021-05-01 1
1001 2021-05-01 2
1002 2021-05-02 3
1001 2021-05-02 4
1001 2021-05-02 5
1002 2021-05-03 6
1001 2021-05-03 7
1002 2021-05-04 8
1002 2021-05-06 9
distribute by xx sort by xx
select userid,logindate, row_number() over(distribute by userid sort by logindate asc) rn from user_login;
userid logindate rn
1001 2021-05-01 1
1001 2021-05-02 2
1001 2021-05-02 3
1001 2021-05-03 4
1002 2021-05-01 1
1002 2021-05-02 2
1002 2021-05-03 3
1002 2021-05-04 4
1002 2021-05-06 5
Numbering without distribute by
select userid,logindate, row_number() over(sort by logindate asc) rn from user_login;
userid logindate rn
1002 2021-05-01 1
1001 2021-05-01 2
1002 2021-05-02 3
1001 2021-05-02 4
1001 2021-05-02 5
1002 2021-05-03 6
1001 2021-05-03 7
1002 2021-05-04 8
1002 2021-05-06 9
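6 Summary
① When there are no ties, the three functions return identical results.
② When there are ties, they differ.
For example, with 4 people where the second and third are tied: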
rank(): 1,2,2,4
dense_rank(): 1,2,2,3
row_number(): 1,2,3,4
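③ Inside over(), all three functions accept either of the following forms, and as the results above show, both give the same output: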
partition by xx order by xx
distribute by xx sort by xx
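The summary can be checked with one small Python sketch that computes all three numberings in a single pass. `number_rows` and the sample keys are hypothetical, chosen only to reproduce the 4-person example; the input must already be sorted.

```python
def number_rows(sorted_keys):
    """Return rank, dense_rank and row_number lists for already-sorted keys."""
    rank, dense, row = [], [], []
    for i, key in enumerate(sorted_keys):
        row.append(i + 1)                 # row_number: always the position
        if i > 0 and key == sorted_keys[i - 1]:
            rank.append(rank[-1])         # tie: both ranks repeat
            dense.append(dense[-1])
        else:
            rank.append(i + 1)            # rank: jump to the position
            dense.append(dense[-1] + 1 if dense else 1)  # dense_rank: +1
    return {"rank": rank, "dense_rank": dense, "row_number": row}

# 4 people, second and third tied.
with_ties = number_rows(["A", "B", "B", "C"])
for name, numbers in with_ties.items():
    print(name, numbers)

# No ties: all three numberings coincide.
print(number_rows(["A", "B", "C"]))
```

With no ties the three lists are identical, which is point ① above.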