Hive (18): Global Sorting

Posted 2022-12-06 默一鸣

Do not distribute the data: use a single reducer, so that every row is sorted in one place.

-- run the job with exactly one reducer
set mapred.reduce.tasks=1;

-- order by yields one globally sorted result
select *
from dw.dw_app
where dt >= '2016-09-01'
  and dt <= '2016-09-18'
order by stime
limit 30000;
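
The single-reducer setting matters most for sort by, which only orders rows within each reducer. The following is a minimal sketch, assuming the same table and columns as the query above. It uses mapreduce.job.reduces, the Hadoop 2 name for mapred.reduce.tasks. With one reducer, sort by should return the same total order as order by.

```sql
-- Hadoop 2+ name for the same setting (mapred.reduce.tasks is the older alias)
set mapreduce.job.reduces=1;

-- with a single reducer there is only one sorted run,
-- so sort by behaves like a global sort
select *
from dw.dw_app
where dt >= '2016-09-01'
  and dt <= '2016-09-18'
sort by stime
limit 30000;
```

With more than one reducer, the same sort by query would only guarantee order inside each reducer's output, which is why order by is the safer choice for a global sort.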

Wrap the filtered query in a subquery, then apply order by in the outer query.

select t.*
from (
  select *
  from dw.dw_app
  where dt >= '2016-09-01'
    and dt <= '2016-09-18'
    and app_id = '16099'
    and msgtype = 'role.recharge'
) t
order by t.stime
limit 5000;

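cluster by sends all rows with the same value of the column to the same reducer, and then sorts the rows within each reducer: cluster by column = distribute by column + sort by column. The ordering holds within each reducer only, so the output is globally sorted only when there is a single reducer.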

select *
from dw.dw_app
where dt >= '2016-09-01'
  and dt <= '2016-09-18'
  and app_id = '16099'
  and msgtype = 'role.recharge'
cluster by dt
limit 30000;
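
The cluster by shorthand can also be written in its long form. The following is a minimal sketch, reusing the table and filters from the query above. The first statement should behave like cluster by dt. The second shows what the long form adds: a sort key that differs from the distribution key, and a descending order. Using stime as the secondary sort key is an assumption borrowed from the earlier queries.

```sql
-- long form of "cluster by dt"
select *
from dw.dw_app
where dt >= '2016-09-01'
  and dt <= '2016-09-18'
  and app_id = '16099'
  and msgtype = 'role.recharge'
distribute by dt
sort by dt
limit 30000;

-- same distribution, but each reducer sorts by dt and then by newest stime first;
-- cluster by cannot express this
select *
from dw.dw_app
where dt >= '2016-09-01'
  and dt <= '2016-09-18'
  and app_id = '16099'
  and msgtype = 'role.recharge'
distribute by dt
sort by dt, stime desc
limit 30000;
```

Neither statement guarantees a global order across reducers. They only guarantee that rows with the same dt land in the same reducer and are ordered there.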

Query the top ten recharging users for each day, together with their total recharge amount.

select t3.*
  from (select t2.*
          from (select dt,
                       account_id,
                       sum(recharge_money) as total_money,
                       row_number() over(partition by dt order by sum(recharge_money) desc) rank
                  from (select dt, account_id, recharge_money
                          from dw.dw_app
                         where dt >= '2016-09-01'
                           and dt <= '2016-09-18'
                           and app_id = '16099'
                           and msgtype = 'role.recharge' 
                       cluster by dt, account_id) t
                 group by dt, account_id) t2
         where t2.rank <= 10) t3
 order by t3.dt asc, rank asc limit 300;
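
The same top-ten-per-day result can probably be obtained with a flatter query. The following is a sketch under assumptions, not the author's method. It drops the inner cluster by, since group by already brings rows with the same dt and account_id together. It also aggregates first and ranks second, and names the rank column rn to avoid sharing a name with the rank() function. The aliases agg, ranked, and rn are introduced here for illustration.

```sql
select dt, account_id, total_money, rn
from (
  -- step 2: number the accounts within each day, largest total first
  select dt,
         account_id,
         total_money,
         row_number() over (partition by dt order by total_money desc) as rn
  from (
    -- step 1: total recharge per account per day
    select dt,
           account_id,
           sum(recharge_money) as total_money
    from dw.dw_app
    where dt >= '2016-09-01'
      and dt <= '2016-09-18'
      and app_id = '16099'
      and msgtype = 'role.recharge'
    group by dt, account_id
  ) agg
) ranked
-- step 3: keep the top ten per day and sort the final result globally
where rn <= 10
order by dt asc, rn asc
limit 300;
```

The date range covers 18 days, so at most 180 rows pass the rn <= 10 filter and limit 300 does not cut anything off. row_number() breaks ties arbitrarily. If tied accounts should share a position, rank() or dense_rank() can be used in its place.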
