Hive--笔试题05_2--求TopN
Posted 中琦2513
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Hive--笔试题05_2--求TopN相关的知识,希望对你有一定的参考价值。
现在有这样一份数据:
1,huangxiaoming,45,a-c-d-f
2,huangzitao,36,b-c-d-e
3,huanglei,41,c-d-e
4,liushishi,22,a-d-e
5,liudehua,39,e-f-d
6,liuyifei,35,a-d-e
字段的意义:
id,name,age,favors
id,姓名,年龄,爱好
其中需要注意的是:每一条记录中的爱好有多个值,以"-"分隔
需求:
求出每种爱好中,年龄最大的两个人(爱好,年龄,姓名)
注意思考一个问题:如果某个爱好中的第二大年龄有多个相同的怎么办?
解题:
建表准备:
create database if not exists exercise;
use exercise;
drop table if exists exercise5;
create table exercise5(id int, name string, age int, favors string) row format delimited fields terminated by ",";
load data local inpath "/home/hadoop/exercise5.txt" into table exercise5;
select * from exercise5;
desc exercise5;
思路分析
需要把这种数据:
6,liuyifei,35,a-d-e
变成:
6,liuyifei,35,a
6,liuyifei,35,d
6,liuyifei,35,e
SQL实现的结果:
select explode(split("a-d-e", "-")); √√√√√√√√
select id,name,age, explode(split(fovors), "-") from exercise5; xxxxxxx
必须要借助于虚拟视图技术:
leteral view
改写:
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor;
求出每种爱好的最大的年龄的最终的SQL:
select aa.favor, max(aa.age) as maxage
from
(
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
) aa
group by aa.favor;
结果:
a 45
b 36
c 45
d 45
e 41
f 45
但是,如果需求扩展:
两个需求:
1、你如何帮我把这个年龄的姓名拿出来呢?
2、如果要去每种爱好中的前2名呢?
拓展一下:如果能给每一组中的每个人按照年龄降序排序,然后分配一个组内的序号。那么将来查询数据的时候根据序号去查询将变的非常容易
select aa.favor, aa.age
from
(
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
) aa order by aa.favor, aa.age desc;
// 这是需求数据格式,但是上面的SQL语句实现不了
id age rank
a 45 1
a 35 2
a 22 3
b 36 1
c 45 1
c 41 2
c 36 3
d 45 1
d 41 2
d 39 3
d 36 4
d 35 5
d 22 6
e 41 1
e 39 2
e 36 3
e 35 4
e 22 5
f 45 1
f 39 2
如果有上面的数据了,那么要筛选出每种爱好中的年龄前2名的人的信息,就容易了。
select id, name from table where rank <= 2;
// 利用窗口分析函数,添加序号:
select aa.id, aa.name, aa.age, aa.favor,
row_number() over (distribute by aa.favor sort by aa.age desc) as index
from
(
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
) aa ;
结果数据:
1 huangbo 45 a 1
6 liuyifei 35 a 2
4 liushishi 22 a 3
2 xuzheng 36 b 1
1 huangbo 45 c 1
3 huanglei 41 c 2
2 xuzheng 36 c 3
1 huangbo 45 d 1
3 huanglei 41 d 2
5 liudehua 39 d 3
2 xuzheng 36 d 4
6 liuyifei 35 d 5
4 liushishi 22 d 6
3 huanglei 41 e 1
5 liudehua 39 e 2
2 xuzheng 36 e 3
6 liuyifei 35 e 4
4 liushishi 22 e 5
1 huangbo 45 f 1
5 liudehua 39 f 2
// 最终SQL的具体实现:利用窗口分析函数去做
select c.favor, c.name, c.age from
(
select aa.id, aa.name, aa.age, aa.favor,
row_number() over (distribute by aa.favor sort by aa.age desc) as index
from
(
select a.id as id, a.name as name, a.age as age, favor_view.favor
from exercise5 a
LATERAL VIEW explode(split(a.favors, "-")) favor_view as favor
) aa
) c
where c.index <= 2;
每种爱好中,年龄最大的前2个人:
a huangbo 45
a liuyifei 35
b xuzheng 36
c huangbo 45
c huanglei 41
d huangbo 45
d huanglei 41
e huanglei 41
e liudehua 39
f huangbo 45
f liudehua 39
至此结束!!!!希望大家有所收获
以上是关于Hive--笔试题05_2--求TopN的主要内容,如果未能解决你的问题,请参考以下文章