Hive 窗口函数
Posted noyouth
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Hive 窗口函数相关的知识,希望对你有一定的参考价值。
表数据如下
select * from business; business.name business.orderdate business.cost jack 2017-01-01 10 tony 2017-01-02 15 jack 2017-02-03 23 tony 2017-01-04 29 jack 2017-01-05 46 jack 2017-04-06 42 tony 2017-01-07 50 jack 2017-01-08 55 mart 2017-04-08 62 mart 2017-04-09 68 neil 2017-05-10 12 mart 2017-04-11 75 neil 2017-06-12 80 mart 2017-04-13 94
一、聚合函数开窗
1.全局范围
查询顾客姓名及总人数(多次购买只算一人)
select name,count(*) over() from business group by name; name count_window_0 jack 4 mart 4 neil 4 tony 4
2.排序后的范围
对所有人的消费明细,将 cost 按照日期进行累加
select *,sum(cost) over(order by orderdate) from business; business.name business.orderdate business.cost sum_window_0 jack 2017-01-01 10 10 tony 2017-01-02 15 25 tony 2017-01-04 29 54 jack 2017-01-05 46 100 tony 2017-01-07 50 150 jack 2017-01-08 55 205 jack 2017-02-03 23 228 jack 2017-04-06 42 270 mart 2017-04-08 62 332 mart 2017-04-09 68 400 mart 2017-04-11 75 475 mart 2017-04-13 94 569 neil 2017-05-10 12 581 neil 2017-06-12 80 661
3.分区加排序后的范围。分区排序搭配:distribute by + sort by或者partition by + order by。
将不同人的消费明细,按日期累加
select *,sum(cost) over(distribute by name sort by orderdate) from business; business.name business.orderdate business.cost sum_window_0 jack 2017-01-01 10 10 jack 2017-01-05 46 56 jack 2017-01-08 55 111 jack 2017-02-03 23 134 jack 2017-04-06 42 176 mart 2017-04-08 62 62 mart 2017-04-09 68 130 mart 2017-04-11 75 205 mart 2017-04-13 94 299 neil 2017-05-10 12 12 neil 2017-06-12 80 92 tony 2017-01-02 15 15 tony 2017-01-04 29 44 tony 2017-01-07 50 94
4.其他范围
①开始到当前行
over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row )
②当前行到最后
over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING)
③前一行、当前行和下一行
over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING)
二、其他函数开窗
1.lag()函数
查询每个顾客的购买明细及上一次购买时间
select *,lag(orderdate,1,‘1970-01-01‘) over(distribute by name sort by orderdate) from business; business.name business.orderdate business.cost lag_window_0 jack 2017-01-01 10 1970-01-01 jack 2017-01-05 46 2017-01-01 jack 2017-01-08 55 2017-01-05 jack 2017-02-03 23 2017-01-08 jack 2017-04-06 42 2017-02-03 mart 2017-04-08 62 1970-01-01 mart 2017-04-09 68 2017-04-08 mart 2017-04-11 75 2017-04-09 mart 2017-04-13 94 2017-04-11 neil 2017-05-10 12 1970-01-01 neil 2017-06-12 80 2017-05-10 tony 2017-01-02 15 1970-01-01 tony 2017-01-04 29 2017-01-02 tony 2017-01-07 50 2017-01-04
2.lead()函数
查询每个顾客的购买明细及下一次购买时间
select *,lead(orderdate,1,‘9999-99-99‘) over(distribute by name sort by orderdate) from business; jack 2017-01-01 10 2017-01-05 jack 2017-01-05 46 2017-01-08 jack 2017-01-08 55 2017-02-03 jack 2017-02-03 23 2017-04-06 jack 2017-04-06 42 9999-99-99 mart 2017-04-08 62 2017-04-09 mart 2017-04-09 68 2017-04-11 mart 2017-04-11 75 2017-04-13 mart 2017-04-13 94 9999-99-99 neil 2017-05-10 12 2017-06-12 neil 2017-06-12 80 9999-99-99 tony 2017-01-02 15 2017-01-04 tony 2017-01-04 29 2017-01-07 tony 2017-01-07 50 9999-99-99
3.ntile()函数
查询前20%时间的订单信息
select * from (select *,ntile(5) over(order by orderdate) nt from business) t1 where t1.nt = 1; t1.name t1.orderdate t1.cost t1.nt jack 2017-01-01 10 1 tony 2017-01-02 15 1 tony 2017-01-04 29 1
以上是关于Hive 窗口函数的主要内容,如果未能解决你的问题,请参考以下文章