Advanced Hive Tips

Posted 2021-12-24 allen-rg


Preface: this article was compiled by the editors of cha138.com and covers some advanced Hive tips. We hope you find it useful.

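1. Date format conversion (yyyyMMdd to yyyy-MM-dd)

The format patterns follow Java's SimpleDateFormat. In those patterns MM means month and lowercase mm means minutes, so the month must be written as MM:

select from_unixtime(unix_timestamp('20180905','yyyyMMdd'),'yyyy-MM-dd');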
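To check the conversion logic outside Hive, here is a minimal Python sketch of the same yyyyMMdd to yyyy-MM-dd conversion. It is only an illustration. Python's strptime uses %-directives instead of Java SimpleDateFormat letters, and reformat_date is a hypothetical helper name.

```python
from datetime import datetime


def reformat_date(raw: str) -> str:
    """Convert a yyyyMMdd string such as '20180905' into yyyy-MM-dd."""
    # In Python %m is the month; in Hive's Java-style pattern the month is MM (mm = minutes)
    return datetime.strptime(raw, "%Y%m%d").strftime("%Y-%m-%d")


print(reformat_date("20180905"))  # 2018-09-05
```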

2. Removing every character except letters and digits from a field in Hive

select regexp_replace(a, '[^0-9a-zA-Z]', '') from tbl_name;
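For comparison, here is a Python sketch of the same replacement. It uses the identical character class, so anything outside ASCII digits and letters is deleted, including spaces, underscores and Chinese characters. keep_alnum is a hypothetical helper name.

```python
import re


def keep_alnum(value: str) -> str:
    # Same character class as the Hive query: drop everything except 0-9, a-z, A-Z
    return re.sub(r"[^0-9a-zA-Z]", "", value)


print(keep_alnum("ab-12_C d!"))  # ab12Cd
```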

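3. Parsing a JSON field in Hive
Suppose the content field stores the JSON {"score":"100","name":"zhou","class":"math"}. You can parse it in either of the following ways.

--- Extract fields one at a time with get_json_object
select get_json_object(content,'$.score'),
       get_json_object(content,'$.name'),
       get_json_object(content,'$.class')
  from tbl_name;
--- To extract several fields, json_tuple is more efficient because the JSON string is parsed only once
select a.*
      ,b.score
      ,b.name
      ,b.class
  from tbl_name a
lateral view outer json_tuple(a.content, 'score', 'name', 'class') b as score, name, class;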
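The following Python sketch models what the two Hive functions return for the sample record. It is a toy model under stated assumptions: the helpers are hypothetical stand-ins, only top-level '$.key' paths are handled, and malformed JSON or a missing key is treated as NULL (None).

```python
import json
from typing import Any, Optional, Tuple


def _as_string(value: Any) -> Optional[str]:
    # Missing key or JSON null -> None (SQL NULL); strings are returned bare,
    # other JSON values are re-serialized as JSON text
    if value is None:
        return None
    return value if isinstance(value, str) else json.dumps(value)


def _parse(content: str) -> dict:
    try:
        obj = json.loads(content)
    except json.JSONDecodeError:
        return {}  # malformed JSON behaves like "no keys found"
    return obj if isinstance(obj, dict) else {}


def get_json_object(content: str, path: str) -> Optional[str]:
    """Toy stand-in for get_json_object: only top-level '$.key' paths."""
    if not path.startswith("$."):
        raise ValueError("this sketch only supports '$.key' paths")
    return _as_string(_parse(content).get(path[2:]))


def json_tuple(content: str, *keys: str) -> Tuple[Optional[str], ...]:
    """Toy stand-in for json_tuple: parse once, then read several top-level keys."""
    obj = _parse(content)
    return tuple(_as_string(obj.get(k)) for k in keys)


content = '{"score":"100","name":"zhou","class":"math"}'
print(get_json_object(content, "$.name"))             # zhou
print(json_tuple(content, "score", "name", "class"))  # ('100', 'zhou', 'math')
```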

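4. Loading data into Hive
To load a file from the local file system, add the LOCAL keyword. To load a file that is already on HDFS, omit LOCAL. Without LOCAL the file is moved (not copied) into the table's directory. OVERWRITE replaces the table's existing data.

load data [local] inpath '/data/monthcard.csv' overwrite into table tbl_name;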

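5. Avoiding scientific notation in Hive
Large DOUBLE values are displayed in scientific notation, for example 3.428777027500007E7. Formatting them with printf and a fixed-point pattern avoids this:

select printf("%.2f",3.428777027500007E7);
--- returns 34287770.28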
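The same idea in Python is a fixed-point format specifier. This sketch only mirrors the intent of printf("%.2f", ...). Hive formats through Java, so rounding corner cases are not guaranteed to match. fixed_point is a hypothetical helper name.

```python
def fixed_point(value: float, decimals: int = 2) -> str:
    # Fixed-point text with the requested number of decimals, never an exponent
    return f"{value:.{decimals}f}"


x = 3.428777027500007e7
print(fixed_point(x))  # 34287770.28
```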

6. Using collect_set and lateral view explode in Hive
Raw data:

id1    id2    name
1       1       A
1       1       B
1       1       C
1       2       X
1       2       Y

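(1) collect_set
collect_set gathers the distinct non-NULL values of each group into an array, in no guaranteed order. The CASE expression yields NULL when id2 = 1, so new_name2 is an empty array for that group.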

select id1,id2,
collect_set(name) as new_name1,
collect_set(case when id2>1 then name end) as new_name2,
count(name) as cnt
from default.zql_test
group by id1,id2;
--- Output
OK
id1     id2     new_name1       new_name2       cnt
1       1       ["C","A","B"]   []      3
1       2       ["X","Y"]       ["X","Y"]       2
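To see why new_name2 is empty for id2 = 1, here is a Python sketch of the same aggregation over the raw data. The helper is hypothetical. The result lists are sorted only to make the output deterministic; Hive's collect_set does not promise any order.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

rows = [(1, 1, "A"), (1, 1, "B"), (1, 1, "C"), (1, 2, "X"), (1, 2, "Y")]


def collect_set(values: List[Optional[str]]) -> List[str]:
    # Distinct non-NULL values; sorted here only for a stable result
    return sorted({v for v in values if v is not None})


groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
for id1, id2, name in rows:
    groups[(id1, id2)].append(name)

result = {}
for (id1, id2), names in groups.items():
    new_name1 = collect_set(names)
    # CASE WHEN id2 > 1 THEN name END gives NULL (None) otherwise
    new_name2 = collect_set([n if id2 > 1 else None for n in names])
    cnt = sum(n is not None for n in names)  # count(name) skips NULLs
    result[(id1, id2)] = (new_name1, new_name2, cnt)

for key, value in sorted(result.items()):
    print(key, value)
```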

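(2) lateral view explode
explode turns each array element into its own row. Two chained lateral views produce the cross product of the two arrays. A row whose array is empty produces no rows at all, which is why the id2 = 1 group is missing from the output below.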

select * 
from 
(
select id1,id2,
collect_set(name) as new_name1,
collect_set(case when id2>1 then name end) as new_name2,
count(name) as cnt
from default.zql_test
group by id1,id2
)t
lateral view explode(new_name1) t as new_type1 
lateral view explode(new_name2) t as new_type2
---- Output
OK
t.id1   t.id2   t.new_name1     t.new_name2     t.cnt   t.new_type1     t.new_type2
1       2       ["Y","X"]       ["Y","X"]       2       Y       Y
1       2       ["Y","X"]       ["Y","X"]       2       Y       X
1       2       ["Y","X"]       ["Y","X"]       2       X       Y
1       2       ["Y","X"]       ["Y","X"]       2       X       X

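(3) lateral view outer explode
Adding OUTER keeps every input row. When the array is empty, the generated column is NULL and the row is not dropped. For more on the difference between the two, see the earlier post on this topic.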

select * 
from 
(
select id1,id2,
collect_set(name) as new_name1,
collect_set(case when id2>1 then name end) as new_name2,
count(name) as cnt
from default.zql_test
group by id1,id2
)t
lateral view outer explode(new_name1) t as new_type1 
lateral view outer explode(new_name2) t as new_type2
;

---- Output
OK
t.id1   t.id2   t.new_name1     t.new_name2     t.cnt   t.new_type1     t.new_type2
1       1       ["B","A","C"]   []      3       B       NULL
1       1       ["B","A","C"]   []      3       A       NULL
1       1       ["B","A","C"]   []      3       C       NULL
1       2       ["X","Y"]       ["X","Y"]       2       X       X
1       2       ["X","Y"]       ["X","Y"]       2       X       Y
1       2       ["X","Y"]       ["X","Y"]       2       Y       X
1       2       ["X","Y"]       ["X","Y"]       2       Y       Y
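The row counts in the two outputs (4 without OUTER, 7 with it) follow from a simple rule, modelled in this Python sketch. The input rows are the collect_set results above, and the helper names are hypothetical. Plain explode emits nothing for an empty array. OUTER explode emits a single NULL for an empty array. Two chained lateral views take the cross product of the two arrays.

```python
from itertools import product
from typing import List, Optional, Sequence

# Rows of subquery t: (id1, id2, new_name1, new_name2, cnt)
t = [
    (1, 1, ["A", "B", "C"], [], 3),
    (1, 2, ["X", "Y"], ["X", "Y"], 2),
]


def explode(arr: Sequence[str], outer: bool) -> List[Optional[str]]:
    # One value per element; for an empty array, OUTER yields [None], plain yields []
    if arr:
        return list(arr)
    return [None] if outer else []


def lateral_views(rows, outer: bool):
    out = []
    for row in rows:
        _, _, name1, name2, _ = row
        # Chained lateral views: cross product of the two exploded arrays
        for v1, v2 in product(explode(name1, outer), explode(name2, outer)):
            out.append(row + (v1, v2))
    return out


inner = lateral_views(t, outer=False)
outer_rows = lateral_views(t, outer=True)
print(len(inner), len(outer_rows))  # 4 7
```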

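7. Taking the top N percent of rows in Hive
ntile(n) splits the ordered rows of each partition into n buckets of nearly equal size, numbered 1 to n. Keeping the rows in bucket 1 returns the first 1/n of each group.

--- Split the rows of each group into two buckets (bucket 1 = first 50% by create_tm)
ntile(2) over (partition by id order by create_tm)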
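This Python sketch shows how rows are assigned to buckets. It assumes Hive follows the standard SQL NTILE rule: when the rows do not divide evenly, the lower-numbered buckets get one extra row each. The ntile helper is hypothetical.

```python
from typing import List


def ntile(n_buckets: int, n_rows: int) -> List[int]:
    """Bucket number (1..n_buckets) for each of n_rows already-ordered rows."""
    base, extra = divmod(n_rows, n_buckets)
    buckets: List[int] = []
    for b in range(1, n_buckets + 1):
        # The first `extra` buckets receive one additional row
        size = base + (1 if b <= extra else 0)
        buckets.extend([b] * size)
    return buckets


ordered = ["r1", "r2", "r3", "r4", "r5"]  # rows already sorted by create_tm
first_half = [r for r, b in zip(ordered, ntile(2, len(ordered))) if b == 1]
print(first_half)  # ['r1', 'r2', 'r3']
```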

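8. Getting the day of the week in Hive
The query counts the days since a known Sunday and takes the result modulo 7. pmod (positive modulo) keeps the result in 0-6 even for dates before 2012-01-01.

--- 2012-01-01 happens to be a Sunday
select pmod(datediff(from_unixtime(unix_timestamp()),'2012-01-01'),7) from default.dual;

-- Returns 0-6, where 0 means Sunday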
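The same arithmetic in Python makes the pmod point concrete. Python's % already returns a non-negative result for a positive divisor, which is what pmod guarantees. In Hive, plain % follows Java and can return a negative value. day_of_week is a hypothetical helper name.

```python
from datetime import date

ANCHOR_SUNDAY = date(2012, 1, 1)  # the same reference Sunday as the Hive query


def day_of_week(d: date) -> int:
    # Days since the anchor Sunday, reduced to 0..6 (0 = Sunday)
    return (d - ANCHOR_SUNDAY).days % 7


print(day_of_week(date(2021, 12, 24)))  # 5 (Friday)
```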

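9. Generating a UUID in Hive
reflect calls the static Java method java.util.UUID.randomUUID(), and regexp_replace strips the dashes from the result: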

select regexp_replace(reflect("java.util.UUID", "randomUUID"), "-", "");
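For comparison, Python's standard library produces the same kind of value. uuid4() generates a random (version 4) UUID, like java.util.UUID.randomUUID(), and .hex gives its 32 lowercase hex digits without dashes. uuid_without_dashes is a hypothetical helper name.

```python
import uuid


def uuid_without_dashes() -> str:
    # Random version-4 UUID rendered as 32 hex digits, no dashes
    return uuid.uuid4().hex


print(uuid_without_dashes())
```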

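10. Matching Chinese characters in Hive
REGEXP returns true if any part of the string matches the pattern. The following therefore tests whether a string contains at least one Chinese character:

select 'hive进阶' regexp '[\\u4e00-\\u9fa5]';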
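This Python sketch uses the same U+4E00..U+9FA5 range with search(), which succeeds when any substring matches, like REGEXP. Full-width punctuation such as '，' falls outside this range and is not counted. contains_chinese is a hypothetical helper name.

```python
import re

# Same ideograph range as the Hive pattern
HAS_CHINESE = re.compile(r"[\u4e00-\u9fa5]")


def contains_chinese(text: str) -> bool:
    # True if at least one character falls in the range
    return HAS_CHINESE.search(text) is not None


print(contains_chinese("hive进阶"))   # True
print(contains_chinese("hive tips"))  # False
```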

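11. Using regexp_extract in Hive
regexp_extract(string subject, string regex_pattern, int index)
Description: finds the first match of regex_pattern in subject and returns the part of it selected by index.

First argument: the field to process
Second argument: the regular expression to match
Third argument:
0 returns the entire matched string
1 returns the text captured by the first pair of parentheses
2 returns the text captured by the second pair of parentheses, and so on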

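Example:
-- Extract a run of exactly 17 consecutive digits with a non-digit character on each side.
-- The sample text reads: "1. Not an order number (20 digits): ...; 2. Order number (17 digits): ...; 3. Other text".
-- With index 0 the match includes the two boundary characters, so s2 trims them with substr; s3 uses index 1 to return only the captured digits.

select regexp_extract('1、非订单号(20位):00123456789876543210；
                      2、订单号(17位):12345678987654321；
                      3、其它文字','[^\\d](\\d{17})[^\\d]',0) as s1
, substr(regexp_extract('1、非订单号(20位):01234567898765432100；
                      2、订单号(17位):12345678987654321；
                      3、其它文字','[^\\d](\\d{17})[^\\d]',0),2,17) as s2
,regexp_extract('1、非订单号(20位):00123456789876543210；
                      2、订单号(17位):12345678987654321；
                      3、其它文字','[^\\d](\\d{17})[^\\d]',1) as s3;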
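The index rules can be checked in Python with an English version of the sample text. This is a sketch under stated assumptions: regexp_extract here is a hypothetical stand-in built on re.search, which, like Hive, looks only at the first match. It returns an empty string when nothing matches.

```python
import re


def regexp_extract(subject: str, pattern: str, index: int) -> str:
    # index 0 = whole match, 1 = first capture group, ...; '' if no match
    m = re.search(pattern, subject)
    return m.group(index) if m else ""


text = ("1. Not an order number (20 digits):00123456789876543210;\n"
        "2. Order number (17 digits):12345678987654321;\n"
        "3. Other text")
pattern = r"[^\d](\d{17})[^\d]"

s1 = regexp_extract(text, pattern, 0)  # includes the ':' and ';' boundaries
s2 = s1[1:18]  # same as Hive substr(s1, 2, 17): 1-based start, length 17
s3 = regexp_extract(text, pattern, 1)  # captured digits only
print(repr(s1), s2, s3)
```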

Link: https://www.jianshu.com/p/fe1cdd06f5f8
