Hive (18) -- Global Sorting

Posted by 默一鸣

Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers global sorting in Hive (part 18 of the Hive series), and we hope you find it useful.

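Sorting without distributing the data, using a single reducer. ORDER BY produces a total order by routing every row through one reducer (Hive does this for ORDER BY even without the setting below), so it can be slow on large inputs. The LIMIT keeps the output manageable.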

-- Force a single reduce task (mapred.reduce.tasks is the older name of mapreduce.job.reduces)
set mapred.reduce.tasks=1;

select * 
from dw.dw_app 
where 
dt>='2016-09-01' 
and dt <='2016-09-18' 
order by stime
limit 30000;
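The effect of a single reducer can be sketched locally. The following Python simulation is only an illustration: the function name `single_reducer_order_by` and the sample rows are hypothetical, and it models the semantics of ORDER BY ... LIMIT, not Hive's execution engine. Because every mapper's output goes to the same reducer, one sort over the combined rows yields a total order.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def single_reducer_order_by(mapper_outputs: List[List[Row]], key: str, limit: int) -> List[Row]:
    """Simulate ORDER BY key LIMIT n executed by a single reducer (hypothetical helper)."""
    # Shuffle: with one reducer, every mapper's rows go to the same place.
    reducer_input: List[Row] = []
    for output in mapper_outputs:
        reducer_input.extend(output)
    # The lone reducer sorts the complete data set, so the order is total.
    reducer_input.sort(key=lambda row: row[key])
    # LIMIT keeps only the first `limit` rows of the sorted result.
    return reducer_input[:limit]


if __name__ == "__main__":
    mappers = [
        [{"stime": 30}, {"stime": 10}],
        [{"stime": 20}, {"stime": 5}],
    ]
    top = single_reducer_order_by(mappers, "stime", limit=3)
    print([row["stime"] for row in top])  # [5, 10, 20]
```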

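Wrapping the query in one more layer and using ORDER BY: the inner query does the filtering and the outer query applies ORDER BY ... LIMIT. The final sort still runs in a single reducer.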

select t.* from 
(
select *
from dw.dw_app 
where 
dt>='2016-09-01' 
and dt <='2016-09-18' 
and app_id='16099'
and msgtype = 'role.recharge' 
) t
order by t.stime 
limit 5000;
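The nested query filters first and sorts afterwards. Here is a minimal Python sketch of that pipeline, assuming rows are dicts with hypothetical `app_id`, `msgtype` and `stime` keys (the helper `filtered_top_n` is likewise made up for illustration). `heapq.nsmallest` is documented as equivalent to `sorted(iterable, key=key)[:n]`, which matches ORDER BY followed by LIMIT.

```python
import heapq
from typing import Any, Dict, List

Row = Dict[str, Any]


def filtered_top_n(rows: List[Row], app_id: str, msgtype: str, n: int) -> List[Row]:
    """Mimic an inner WHERE filter followed by an outer ORDER BY stime LIMIT n."""
    # Inner query: keep only the rows that match the filter.
    filtered = [r for r in rows if r["app_id"] == app_id and r["msgtype"] == msgtype]
    # Outer query: the n smallest rows by stime, same result as sorted(...)[:n].
    return heapq.nsmallest(n, filtered, key=lambda r: r["stime"])


if __name__ == "__main__":
    rows = [
        {"app_id": "16099", "msgtype": "role.recharge", "stime": 5},
        {"app_id": "16099", "msgtype": "role.login", "stime": 1},
        {"app_id": "16098", "msgtype": "role.recharge", "stime": 0},
        {"app_id": "16099", "msgtype": "role.recharge", "stime": 3},
    ]
    print([r["stime"] for r in filtered_top_n(rows, "16099", "role.recharge", 5)])  # [3, 5]
```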

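Using CLUSTER BY: all rows with the same column value end up in the same reducer partition, and each reducer then sorts its own rows. CLUSTER BY column is shorthand for DISTRIBUTE BY column + SORT BY column, and it sorts in ascending order only. Because the sort happens inside each reducer, the combined output is not globally ordered when more than one reducer runs; it is a total order only with a single reducer.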

select * 
from dw.dw_app 
where 
dt>='2016-09-01' 
and dt <='2016-09-18' 
and app_id='16099'
and msgtype = 'role.recharge' 
cluster by dt
limit 30000;
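To see why CLUSTER BY is not a global sort when several reducers run, here is a hypothetical Python simulation of DISTRIBUTE BY + SORT BY. Python's built-in `hash` stands in for Hive's partitioner (an assumption; Hive uses its own hash function), and small integer keys replace the date strings because CPython hashes small non-negative integers to themselves, which keeps the output reproducible. The helper name `cluster_by` is made up for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def cluster_by(rows: List[Row], key: str, num_reducers: int) -> List[List[Row]]:
    """Simulate CLUSTER BY key = DISTRIBUTE BY key + SORT BY key (hypothetical helper)."""
    partitions: List[List[Row]] = [[] for _ in range(num_reducers)]
    for row in rows:
        # DISTRIBUTE BY: the key's hash picks the reducer, so equal keys
        # always land in the same partition (Python's hash stands in for Hive's).
        partitions[hash(row[key]) % num_reducers].append(row)
    for part in partitions:
        # SORT BY: each reducer sorts only its own rows.
        part.sort(key=lambda r: r[key])
    return partitions


if __name__ == "__main__":
    data = [{"dt": d} for d in [3, 1, 4, 1, 5, 9, 2, 6]]
    parts = cluster_by(data, "dt", num_reducers=2)
    for i, part in enumerate(parts):
        print(i, [r["dt"] for r in part])
    # 0 [2, 4, 6]
    # 1 [1, 1, 3, 5, 9]
    print("concatenated:", [r["dt"] for p in parts for r in p])
    # concatenated: [2, 4, 6, 1, 1, 3, 5, 9]
```

Each reducer's output is sorted and every key lives in exactly one reducer, yet reading the reducers one after another does not give a globally sorted sequence.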

Querying the top 10 recharging users for each day, together with their total recharge amounts:

select t3.*
  from (select t2.*
          from (select dt,
                       account_id,
                       sum(recharge_money) as total_money,
                       -- number accounts within each day by total recharge, highest first
                       row_number() over(partition by dt order by sum(recharge_money) desc) as rn
                  from (select dt, account_id, recharge_money
                          from dw.dw_app
                         where dt >= '2016-09-01'
                           and dt <= '2016-09-18'
                           and app_id = '16099'
                           and msgtype = 'role.recharge'
                        -- not required for correctness: the GROUP BY below redistributes rows by (dt, account_id) anyway
                        cluster by dt, account_id) t
                 group by dt, account_id) t2
         where t2.rn <= 10) t3
 order by t3.dt asc, t3.rn asc
 limit 300;
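The same logic can be sketched in Python to check the query's intent on a few hand-made rows. The function `top_n_per_day` and the tuple layout `(dt, account_id, recharge_money)` are assumptions for illustration. One deliberate difference: Hive's ROW_NUMBER() breaks ties in an unspecified order, while this sketch breaks them by account_id so its output is deterministic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def top_n_per_day(
    records: List[Tuple[str, str, float]], n: int
) -> List[Tuple[str, str, float, int]]:
    """Mimic SUM ... GROUP BY dt, account_id, then
    ROW_NUMBER() OVER (PARTITION BY dt ORDER BY total DESC), keeping rn <= n.

    Returns (dt, account_id, total_money, rn) ordered by dt, then rn.
    """
    # GROUP BY dt, account_id with SUM(recharge_money).
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for dt, account_id, money in records:
        totals[(dt, account_id)] += money

    # PARTITION BY dt: collect each day's per-account totals.
    by_day: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (dt, account_id), total in totals.items():
        by_day[dt].append((account_id, total))

    result: List[Tuple[str, str, float, int]] = []
    for dt in sorted(by_day):  # final ORDER BY dt ASC
        # ORDER BY total DESC within the day; account_id breaks ties (sketch only).
        ranked = sorted(by_day[dt], key=lambda x: (-x[1], x[0]))
        # ROW_NUMBER() starting at 1, filtered to rn <= n.
        for rn, (account_id, total) in enumerate(ranked[:n], start=1):
            result.append((dt, account_id, total, rn))
    return result
```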
