Differences between rank(), dense_rank(), and row_number()

Posted by 逃跑的沙丁鱼


Contents

1 Create the test table

2 Insert test data

3 rank()

4 dense_rank()

5 row_number()

6 Summary


1 Create the test table

-- ORC is a binary columnar format, so no field delimiter clause is needed.
CREATE TABLE `user_login`(
  `brandid` int,
  `userid` string,
  `logindate` string)
STORED AS ORC;

2 Insert test data

insert into user_login (brandid,userid,logindate) VALUES 
(429,'1001','2021-05-01'),
(429,'1001','2021-05-02'),
(429,'1001','2021-05-02'),
(429,'1001','2021-05-03'),
(429,'1002','2021-05-01'),
(429,'1002','2021-05-02'),
(429,'1002','2021-05-03'),
(429,'1002','2021-05-04'),
(429,'1002','2021-05-06')
;

That is nine rows in total.

3 rank()

partition by xx  order by xx

The default sort order is asc.

select userid,logindate, rank() over(partition by userid order by logindate) rn from user_login;

User 1001 logged in twice on 2021-05-02, so rank() produces a tie and then skips the next rank: the ranks are 1, 2, 2, 4.

userid  logindate       rn
1001    2021-05-01      1
1001    2021-05-02      2
1001    2021-05-02      2
1001    2021-05-03      4
1002    2021-05-01      1
1002    2021-05-02      2
1002    2021-05-03      3
1002    2021-05-04      4
1002    2021-05-06      5
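To make the tie handling concrete, here is a minimal Python sketch that emulates rank() over(partition by userid order by logindate) on the same nine rows. The function name rank_over and the in-memory rows list are illustrative assumptions, not part of Hive; the sketch only mirrors the semantics shown in the result above.

```python
from itertools import groupby

# Same (userid, logindate) pairs as the test data; brandid is omitted.
rows = [
    ("1001", "2021-05-01"), ("1001", "2021-05-02"), ("1001", "2021-05-02"),
    ("1001", "2021-05-03"), ("1002", "2021-05-01"), ("1002", "2021-05-02"),
    ("1002", "2021-05-03"), ("1002", "2021-05-04"), ("1002", "2021-05-06"),
]

def rank_over(rows, descending=False):
    """Hypothetical helper: rank() partitioned by userid, ordered by logindate."""
    out = []
    ordered = sorted(rows, key=lambda r: r[0])  # bring each partition together
    for userid, part in groupby(ordered, key=lambda r: r[0]):
        part = sorted(part, key=lambda r: r[1], reverse=descending)
        prev_date, rank = None, 0
        for position, (_, logindate) in enumerate(part, start=1):
            if logindate != prev_date:
                rank = position  # a new value takes its row position as rank
                prev_date = logindate
            out.append((userid, logindate, rank))  # tied rows reuse the rank
    return out

result = rank_over(rows)
for row in result:
    print(*row)
```

Because a new value always takes its row position, the rank after a tie jumps ahead by the size of the tie, which is where the gap at 3 comes from.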

Descending order works as well:

select userid,logindate, rank() over(partition by userid order by logindate desc) rn from user_login;
userid  logindate       rn
1001    2021-05-03      1
1001    2021-05-02      2
1001    2021-05-02      2
1001    2021-05-01      4
1002    2021-05-06      1
1002    2021-05-04      2
1002    2021-05-03      3
1002    2021-05-02      4
1002    2021-05-01      5

Ranking without a partition (the whole result set is treated as one window):

select userid,logindate, rank() over(order by logindate desc) rn from user_login;
userid  logindate       rn
1002    2021-05-06      1
1002    2021-05-04      2
1002    2021-05-03      3
1001    2021-05-03      3
1002    2021-05-02      5
1001    2021-05-02      5
1001    2021-05-02      5
1002    2021-05-01      8
1001    2021-05-01      8
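Another way to read the unpartitioned result: a row's rank is one plus the number of rows that sort strictly before it. The sketch below checks that definition in Python for order by logindate desc; global_rank_desc and the rows list are hypothetical names used only for illustration.

```python
# Same (userid, logindate) pairs as the test data; brandid is omitted.
rows = [
    ("1001", "2021-05-01"), ("1001", "2021-05-02"), ("1001", "2021-05-02"),
    ("1001", "2021-05-03"), ("1002", "2021-05-01"), ("1002", "2021-05-02"),
    ("1002", "2021-05-03"), ("1002", "2021-05-04"), ("1002", "2021-05-06"),
]

def global_rank_desc(rows):
    """Hypothetical helper: rank() over(order by logindate desc), no partition."""
    ranked = []
    for userid, logindate in rows:
        # Rows with a later date sort strictly before this one in desc order.
        later = sum(1 for _, other in rows if other > logindate)
        ranked.append((userid, logindate, later + 1))
    return sorted(ranked, key=lambda r: r[2])

ranked = global_rank_desc(rows)
for row in ranked:
    print(*row)
```

The order of rows that share a rank is arbitrary here, just as it is in the query output.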

distribute  by xx sort by xx

Ascending:

select userid,logindate, rank() over(distribute  by userid sort by logindate asc) rn from user_login;
userid  logindate       rn
1001    2021-05-01      1
1001    2021-05-02      2
1001    2021-05-02      2
1001    2021-05-03      4
1002    2021-05-01      1
1002    2021-05-02      2
1002    2021-05-03      3
1002    2021-05-04      4
1002    2021-05-06      5

Descending:

select userid,logindate, rank() over(distribute  by userid sort by logindate desc) rn from user_login;

Ranking without distribute by:

select userid,logindate, rank() over(sort by logindate desc) rn from user_login;
userid  logindate       rn
1002    2021-05-06      1
1002    2021-05-04      2
1002    2021-05-03      3
1001    2021-05-03      3
1002    2021-05-02      5
1001    2021-05-02      5
1001    2021-05-02      5
1002    2021-05-01      8
1001    2021-05-01      8

4 dense_rank()

partition by xx  order by xx

select userid,logindate, dense_rank() over(partition by userid order by logindate asc) rn from user_login;

Result:

User 1001 logged in twice on 2021-05-02, so dense_rank() also produces a tie, but the next rank is not skipped: the ranks are 1, 2, 2, 3.

userid  logindate       rn
1001    2021-05-01      1
1001    2021-05-02      2
1001    2021-05-02      2
1001    2021-05-03      3
1002    2021-05-01      1
1002    2021-05-02      2
1002    2021-05-03      3
1002    2021-05-04      4
1002    2021-05-06      5
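One way to think about dense_rank() is that a row's rank is the position of its value among the distinct values in its partition. The Python sketch below follows that idea; dense_rank_over and the rows list are hypothetical names, and the code is only an emulation of the semantics, not how Hive computes it.

```python
from collections import defaultdict

# Same (userid, logindate) pairs as the test data; brandid is omitted.
rows = [
    ("1001", "2021-05-01"), ("1001", "2021-05-02"), ("1001", "2021-05-02"),
    ("1001", "2021-05-03"), ("1002", "2021-05-01"), ("1002", "2021-05-02"),
    ("1002", "2021-05-03"), ("1002", "2021-05-04"), ("1002", "2021-05-06"),
]

def dense_rank_over(rows):
    """Hypothetical helper: dense_rank() partitioned by userid, ordered by logindate."""
    by_user = defaultdict(list)
    for userid, logindate in rows:
        by_user[userid].append(logindate)
    out = []
    for userid in sorted(by_user):
        dates = sorted(by_user[userid])
        # Number the distinct dates 1, 2, 3, ... with no gaps.
        position = {d: i for i, d in enumerate(sorted(set(dates)), start=1)}
        for d in dates:
            out.append((userid, d, position[d]))
    return out

dense = dense_rank_over(rows)
for row in dense:
    print(*row)
```

Since duplicates collapse into one distinct value, the numbering never has gaps.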

Ranking without a partition:

select userid,logindate, dense_rank() over(order by logindate asc) rn from user_login;
userid  logindate       rn
1002    2021-05-01      1
1001    2021-05-01      1
1002    2021-05-02      2
1001    2021-05-02      2
1001    2021-05-02      2
1002    2021-05-03      3
1001    2021-05-03      3
1002    2021-05-04      4
1002    2021-05-06      5

distribute  by xx sort by xx

select userid,logindate, dense_rank() over(distribute  by userid sort by logindate asc) rn from user_login;
userid  logindate       rn
1001    2021-05-01      1
1001    2021-05-02      2
1001    2021-05-02      2
1001    2021-05-03      3
1002    2021-05-01      1
1002    2021-05-02      2
1002    2021-05-03      3
1002    2021-05-04      4
1002    2021-05-06      5

Ranking without distribute by:

select userid,logindate, dense_rank() over(sort by logindate asc) rn from user_login;
userid  logindate       rn
1002    2021-05-01      1
1001    2021-05-01      1
1002    2021-05-02      2
1001    2021-05-02      2
1001    2021-05-02      2
1002    2021-05-03      3
1001    2021-05-03      3
1002    2021-05-04      4
1002    2021-05-06      5

5 row_number()

partition by xx  order by xx

select userid,logindate, row_number() over(partition by userid order by logindate asc) rn from user_login;

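Result:

User 1001 logged in twice on 2021-05-02, but row_number() never produces ties: the numbers are 1, 2, 3, 4. Which of the two tied rows gets 2 and which gets 3 is not determined by the order by clause.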

userid  logindate       rn
1001    2021-05-01      1
1001    2021-05-02      2
1001    2021-05-02      3
1001    2021-05-03      4
1002    2021-05-01      1
1002    2021-05-02      2
1002    2021-05-03      3
1002    2021-05-04      4
1002    2021-05-06      5
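For comparison, row_number() simply counts rows in sorted order within each partition. A minimal Python sketch, with row_number_over and the rows list as hypothetical names:

```python
# Same (userid, logindate) pairs as the test data; brandid is omitted.
rows = [
    ("1001", "2021-05-01"), ("1001", "2021-05-02"), ("1001", "2021-05-02"),
    ("1001", "2021-05-03"), ("1002", "2021-05-01"), ("1002", "2021-05-02"),
    ("1002", "2021-05-03"), ("1002", "2021-05-04"), ("1002", "2021-05-06"),
]

def row_number_over(rows):
    """Hypothetical helper: row_number() partitioned by userid, ordered by logindate."""
    out = []
    for userid in sorted({r[0] for r in rows}):
        part = sorted((r for r in rows if r[0] == userid), key=lambda r: r[1])
        # Every row gets the next integer, whether or not its value repeats.
        for n, (_, logindate) in enumerate(part, start=1):
            out.append((userid, logindate, n))
    return out

numbered = row_number_over(rows)
for row in numbered:
    print(*row)
```

Python's sort is stable, so tied rows keep their input order in this sketch. A SQL engine makes no such promise, so if the numbering among ties matters, add another column to the ordering to break them.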

Numbering without a partition:

select userid,logindate, row_number() over(order by logindate asc) rn from user_login;
userid  logindate       rn
1002    2021-05-01      1
1001    2021-05-01      2
1002    2021-05-02      3
1001    2021-05-02      4
1001    2021-05-02      5
1002    2021-05-03      6
1001    2021-05-03      7
1002    2021-05-04      8
1002    2021-05-06      9

distribute  by xx sort by xx

select userid,logindate, row_number() over(distribute  by userid sort by logindate asc) rn from user_login;
userid  logindate       rn
1001    2021-05-01      1
1001    2021-05-02      2
1001    2021-05-02      3
1001    2021-05-03      4
1002    2021-05-01      1
1002    2021-05-02      2
1002    2021-05-03      3
1002    2021-05-04      4
1002    2021-05-06      5

Numbering without distribute by:

select userid,logindate, row_number() over(sort by logindate asc) rn from user_login;
userid  logindate       rn
1002    2021-05-01      1
1001    2021-05-01      2
1002    2021-05-02      3
1001    2021-05-02      4
1001    2021-05-02      5
1002    2021-05-03      6
1001    2021-05-03      7
1002    2021-05-04      8
1002    2021-05-06      9

6 Summary

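When there are no ties, the three functions produce identical results.

They differ only when there are ties.

For example, with four people where the second and third are tied: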

rank():         1,2,2,4

dense_rank():    1,2,2,3

row_number():    1,2,3,4

The over() clause of all three functions accepts either form:

partition by xx order by xx

distribute by xx sort by xx
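The four-person case from the summary can be reproduced side by side with a short Python sketch. The scores are made-up values chosen so that the second and third entries tie, and number_rows is a hypothetical helper that expects its input already sorted.

```python
def number_rows(sorted_values):
    """Hypothetical helper: return (rank, dense_rank, row_number) lists for sorted input."""
    rank, dense, rownum = [], [], []
    for i, value in enumerate(sorted_values):
        is_tie = i > 0 and value == sorted_values[i - 1]
        rownum.append(i + 1)                         # always the row position
        rank.append(rank[-1] if is_tie else i + 1)   # ties share, then skip
        if is_tie:
            dense.append(dense[-1])                  # ties share
        else:
            dense.append(dense[-1] + 1 if dense else 1)  # no gaps
    return rank, dense, rownum

scores = [90, 80, 80, 70]  # illustrative values, already sorted descending
rank, dense, rownum = number_rows(scores)
print("rank():      ", rank)
print("dense_rank():", dense)
print("row_number():", rownum)
```

With no ties in the input, all three lists come out identical, which matches the first point of the summary.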

 

 

 

 
