hive row_number等窗口分析函数
Posted kimbo
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了hive row_number等窗口分析函数相关的知识,希望对你有一定的参考价值。
一、排序&去重分析
row_number() over(partititon by col1 order by col2) as rn
结果:1,2,3,4
rank() over(partititon by col1 order by col2) as rk
结果:1,2,2,4,5
dense_rank() over(partititon by col1 order by col2) as ds_rk
结果:1,2,2,3,4
select order_id, departure_date, row_number() over(partition by order_id order by departure_date) as rn, -- 直排 rank() over(partition by order_id order by departure_date) as rk, -- 并列的,下一个数字会跳过 dense_rank() over(partition by order_id order by departure_date) as d_rk -- 并列的,下一个数据不会跳过 from ord_test where order_id=410341346 ;
运行结果:
二、跨行获取
lag(col1,n,DEFAULT) over(partition by col1 order by col2) as up
用于统计窗口内往上第n行值,第一个参数为列名,第二个参数为往上第n行(可选,默认为1),第三个参数为默认值(当往上第n行为NULL时候,取默认值,如不指定,则为NULL)
lead(col1,n,DEFAULT) over(partition by col1 order by col2) as down
用于统计窗口内往下第n行值,第一个参数为列名,第二个参数为往下第n行(可选,默认为1),第三个参数为默认值(当往下第n行为NULL时候,取默认值,如不指定,则为NULL)
first_value() over(partition by col1 order by col2) as fv
取分组内排序后,截止到当前行,第一个值
last_value() over(partition by col1 order by col2) as lv
取分组内排序后,截止到当前行,第一个值
select order_id, departure_date, first_value(departure_date) over(partition by order_id order by add_time)as fv, -- 取分组内第一条 last_value(departure_date) over(partition by order_id order by add_time)as lv -- 取分组内最后一条 from ord_test where order_id=410341346 ; select order_id, departure_date, lead(departure_date,1) over(partition by order_id order by departure_date)as down_1, -- 向下取一级 lag(departure_date,1) over(partition by order_id order by departure_date)as up_1 -- 向上取一级 from ord_test where order_id=410341346 ;
结果截图:
以上是关于hive row_number等窗口分析函数的主要内容,如果未能解决你的问题,请参考以下文章