Generating Cumulative Reports in Hive
Posted by zhangchenchuan
1. Raw data
Each row holds a user ID, a visit date, and a visit count, separated by a single space.
u01 2019/1/21 5
u02 2019/1/23 6
u03 2019/1/22 8
u04 2019/1/20 3
u01 2019/1/23 6
u01 2019/2/21 8
u02 2019/1/23 6
u01 2019/2/22 4
2. Create a table that maps the data above
create table action (userId string, visitDate string, visitCount int) row format delimited fields terminated by " ";
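The table stays empty until the raw file is loaded into it. A minimal sketch, assuming the rows above are saved as /tmp/action.txt on the machine running the Hive client (the path is hypothetical, not from the original post):

```sql
-- Hypothetical path; replace it with wherever the raw rows are stored.
load data local inpath '/tmp/action.txt' into table action;

-- Quick sanity check: the sample data above has 8 rows.
select count(*) from action;
```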
3. Group by user and month to get each user's total visit count per month
create table action_amount
as
select tmp.userid, tmp.month, sum(tmp.visitcount) as amount
from (
  -- 'MM' is the month pattern; lowercase 'mm' means minutes
  select userid, from_unixtime(unix_timestamp(visitdate, 'yyyy/MM/dd'), 'yyyy-MM') as month, visitcount
  from action
) tmp
group by tmp.userid, tmp.month;
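To check the result of this step, the monthly totals can be listed directly. The sketch below assumes action_amount was built from the sample data in step 1 and that the Hive version in use parses single-digit months such as 2019/1/21 leniently. The values in the comments are worked out by hand from that sample data.

```sql
select userid, month, amount
from action_amount
order by userid, month;

-- Expected monthly totals for the sample data:
-- u01  2019-01  11   (5 + 6)
-- u01  2019-02  12   (8 + 4)
-- u02  2019-01  12   (6 + 6)
-- u03  2019-01   8
-- u04  2019-01   3
```

If the month column comes back as NULL, the date pattern most likely does not match the visitDate strings on that Hive version.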
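4. Self-join the table to build an intermediate table
Each monthly row b is paired with every row a of the same user whose month is the same or earlier, so the a_amount values attached to one (userid, month) are exactly the amounts that make up its running total.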
create table action_tmp
as
select a.amount as a_amount,b.*
from action_amount a join action_amount b on a.userid=b.userid
where a.month <= b.month;
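To see what the self-join produces, it helps to look at a single user. A small sketch, assuming action_tmp was built from the sample data in step 1. The rows in the comments are derived by hand.

```sql
select a_amount, userid, month, amount
from action_tmp
where userid = 'u01'
order by month, a_amount;

-- Expected rows for u01:
-- 11  u01  2019-01  11   (January paired with itself)
-- 11  u01  2019-02  12   (February paired with January)
-- 12  u01  2019-02  12   (February paired with itself)
```

February appears twice because two months (January and February) satisfy a.month <= b.month. Summing a_amount over those two rows yields the cumulative value.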
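5. Group the table above by userid and month
Within each group, amount is the same on every row, so max(amount) simply returns that month's total, while sum(a_amount) adds up the current and all earlier months to give the cumulative total.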
select userid,month,max(amount) as amount,sum(a_amount) as accumulate
from action_tmp
group by userid,month;
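As an alternative that the steps above do not use, Hive's windowing functions can compute the same running total without the self-join or the intermediate table. A sketch, assuming action_amount from step 3 and the sample data from step 1:

```sql
select userid,
       month,
       amount,
       -- running sum per user, ordered by month
       sum(amount) over (partition by userid
                         order by month
                         rows between unbounded preceding and current row) as accumulate
from action_amount;

-- Both this query and the step 5 query should return (row order may vary):
-- u01  2019-01  11  11
-- u01  2019-02  12  23
-- u02  2019-01  12  12
-- u03  2019-01   8   8
-- u04  2019-01   3   3
```

Both approaches rely on month being a zero-padded yyyy-MM string, so that string comparison and ordering match chronological order.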