大数据之hive：行列转换系列总结

Posted 2021-09-08 浊酒南街

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了大数据之hive：行列转换系列总结相关的知识，希望对你有一定的参考价值。

1、行转列（一）

主要使用：

CONCAT(string A/col, string B/col…)：返回输入字符串连接后的结果，支持任意个输入字符串;
CONCAT_WS(separator, str1, str2,…)：它是一个特殊形式的 CONCAT()。第一个参数是剩余其他参数间的分隔符。
COLLECT_SET(col)：函数只接受基本数据类型，它的主要作用是将某字段的值进行去重汇总，产生array类型字段。

示例

原始数据：

name	constellation	blood_type
宋江	白羊座	A
鲁智深	射手	A
武松	白羊座	B
潘金莲	白羊座	A
西门庆	射手	A

期望输出结果：

constell_blood	name_list
射手座,A	鲁智深&西门庆
白羊座,A	宋江&潘金莲
白羊座,B	武松

实现：

select
    t1.base,
    concat_ws('&', collect_set(t1.name)) name
from
    (select
        name,
        concat(constellation, ",", blood_type) constell_blood
    from
        person_info) t1
group by
    t1.constell_blood;

2、列转行（一）

主要使用
EXPLODE(col)：将hive一列中复杂的array或者map结构拆分成多行。
LATERAL VIEW
用法：LATERAL VIEW udtf(expression) tableAlias AS columnAlias
解释：用于和split, explode等UDTF一起使用，它能够将一列数据拆成多行数据，在此基础上可以对拆分后的数据进行聚合。

示例

原始数据：

movie	category
《疑犯追踪》	悬疑&动作
《Lie to me》	悬疑&警匪

期望输出结果：

movie	category_name
《疑犯追踪》	悬疑
《疑犯追踪》	动作
《Lie to me》	悬疑
《Lie to me》	警匪

实现：

select
    movie,
    category_name
from 
 movie_info 
lateral view explode(split(category, "\\\\&")) table_tmp as category_name;

3、行转列（二）

主要使用sum(case when )

示例

原始数据：

stu_id	name	course	score
01	zhangsan	math	90
01	zhangsan	chinese	88
01	zhangsan	english	88
02	lisi	math	66
02	lisi	chinese	77
02	lisi	english	80

期望输出结果：

stu_id	name	mat_score	chi_score	eng_score
01	zhangsan	90	88	88
02	lisi	66	77	80

实现1：

select 
stu_id,
name,
sum(case when course='math' then score else 0 end) mat_score,
sum(case when course='chinese' then score else 0 end) chi_score,
sum(case when course='english' then score else 0 end) eng_score
from stu001 
group by stu_id,name

实现2：

select 
a.stu_id,
a.name,
(select score from stu001 m where course='math'  and a.stu_id=m.stu_id and m.name=a.name ) mat_score,
(select score from stu001 m where course='chinese' and a.stu_id=m.stu_id and m.name=a.name ) chi_score,
(select score from stu001 m where course='english'  and a.stu_id=m.stu_id and m.name=a.name ) eng_score
from stu001 a
group by stu_id,name;

4、列转行（二）

示例

主要使用union all
原始数据：

stu_id	name	mat_score	chi_score	eng_score
01	zhangsan	90	88	88
02	lisi	66	77	80

期望输出结果：

stu_id	name	course	score
01	zhangsan	math	90
01	zhangsan	chinese	88
01	zhangsan	english	88
02	lisi	math	66
02	lisi	chinese	77
02	lisi	english	80

实现：

select
stu_id,
name,
if(mat_score is not null ,'match',null) as course,
mat_score as score
from 
(select 
a.stu_id,
a.name,
(select score from stu001 m where course='math'  and a.stu_id=m.stu_id and m.name=a.name ) mat_score,
(select score from stu001 m where course='chinese' and a.stu_id=m.stu_id and m.name=a.name ) chi_score,
(select score from stu001 m where course='english'  and a.stu_id=m.stu_id and m.name=a.name ) eng_score
from stu001 a
group by stu_id,name) tmp0
union all
select
stu_id,
name,
if(chi_score is not null ,'chinese',null) as course,
chi_score as score
from 
(select 
a.stu_id,
a.name,
(select score from stu001 m where course='math'  and a.stu_id=m.stu_id and m.name=a.name ) mat_score,
(select score from stu001 m where course='chinese' and a.stu_id=m.stu_id and m.name=a.name ) chi_score,
(select score from stu001 m where course='english'  and a.stu_id=m.stu_id and m.name=a.name ) eng_score
from stu001 a
group by stu_id,name) tmp1
union all
select
stu_id,
name,
if(eng_score is not null ,'english',null) as course,
eng_score as score
from 
(select 
a.stu_id,
a.name,
(select score from stu001 m where course='math'  and a.stu_id=m.stu_id and m.name=a.name ) mat_score,
(select score from stu001 m where course='chinese' and a.stu_id=m.stu_id and m.name=a.name ) chi_score,
(select score from stu001 m where course='english'  and a.stu_id=m.stu_id and m.name=a.name ) eng_score
from stu001 a
group by stu_id,name) tmp2
order by stu_id,name;

以上是关于大数据之hive：行列转换系列总结的主要内容，如果未能解决你的问题，请参考以下文章

大数据之hive：行列转换系列总结

目录

1、行转列（一）

主要使用：

示例

2、列转行（一）

示例

3、行转列（二）

示例

4、列转行（二）

示例