2018-10-29#regexp_extract+get_json_object
Posted 大数据-大道至简
Hive/LanguageManual+UDF
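Regular-expression extraction function: regexp_extract
Syntax: regexp_extract(string subject, string pattern, int index)
Return type: string
Description: matches the string subject against the regular expression pattern and returns the capture group selected by index. Index 0 returns the entire match, index 1 the first parenthesized group, and so on. Note that backslashes in the pattern sometimes have to be escaped, because the Hive string literal consumes one level of backslashes before the regex engine sees the pattern.
Examples: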
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 1);
the
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 2);
bar
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 0);
foothebar
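-- An example I ran into recently. With index 0 the whole match is returned, key included:
--   result_content": "2`bikerjacket`1",
-- Use index 1 to get only the captured value: 2`bikerjacket`1
select regexp_extract('{"search_content": "bikerjacket","result_content": "2`bikerjacket`1","abtest": ""}', 'result_content": "(.*?)",', 0);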
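The escaping note in the description can be illustrated with a predefined character class. This is a minimal sketch, assuming a Hive version that allows SELECT without a FROM clause. The input string 'order_149' is borrowed from the sample data later in this post.

```sql
-- The backslash is doubled: the Hive string literal turns '\\d' into \d,
-- which is what the regex engine needs in order to match a digit.
select regexp_extract('order_149', '(\\d+)', 1);
-- returns 149
```

With a single backslash ('\d') the string literal consumes the backslash, so the regex engine sees a plain letter d instead of the digit class.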
JSON string parsing: get_json_object
select get_json_object('{"search_content": "bikerjacket","result_content": "2`bikerjacket`1","abtest": ""}', '$.result_content');
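The path argument is a limited form of JSONPath: $ is the root object, . selects a child field, and [] is an array subscript. A short sketch of these cases follows. The JSON literal is made up for illustration and does not come from the data in this post.

```sql
-- Nested field: returns alice
select get_json_object('{"user": {"name": "alice"}, "tags": ["a", "b"]}', '$.user.name');

-- Array element by zero-based subscript: returns b
select get_json_object('{"user": {"name": "alice"}, "tags": ["a", "b"]}', '$.tags[1]');

-- A key that does not exist: returns NULL
select get_json_object('{"user": {"name": "alice"}, "tags": ["a", "b"]}', '$.missing');
```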
Parsing a JSON array string
Data to parse:
[{"orderPromotionId":"order_149","orderPromotionTag":"日亚美妆专题-2件8折","orderPromotionType":"10","orderPromotionValue":"110.60"}]
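Parsing the JSON array string (regexp_extract strips the outer brackets, then get_json_object reads the field; this works when the array holds a single object; the backslashes are doubled so that the regex engine receives \[ and \]):
select get_json_object(regexp_extract('[{"orderPromotionId":"order_149","orderPromotionTag":"日亚美妆专题-2件8折","orderPromotionType":"10","orderPromotionValue":"110.60"}]','^\\[(.+)\\]$',1), '$.orderPromotionId');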
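Row/column conversion: one output row per array element
Reference example (json_array_col is assumed to be an array<string> column of table jt, since explode() accepts only an array or a map):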
SELECT get_json_object(single_json_table.single_json, '$.ts') AS ts,
get_json_object(single_json_table.single_json, '$.id') AS id,
get_json_object(single_json_table.single_json, '$.log') AS log
FROM (
SELECT explode(json_array_col) as single_json FROM jt
) single_json_table ;
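-- explode() accepts only an array or a map, so the JSON array string is first split into one string per object.
-- This assumes the objects are separated by exactly "},{" and that no value contains "},{" or "||".
SELECT get_json_object(single_json_table.single_json, '$.orderPromotionId') AS order_promotion_id,
get_json_object(single_json_table.single_json, '$.orderPromotionTag') AS order_promotion_tag,
get_json_object(single_json_table.single_json, '$.orderPromotionType') AS order_promotion_type
FROM (
SELECT explode(split(regexp_replace(regexp_extract('[{"orderPromotionId":"order_149","orderPromotionTag":"日亚美妆专题-2件8折","orderPromotionType":"10","orderPromotionValue":"110.60"}]', '^\\[(.+)\\]$', 1), '\\},\\{', '}||{'), '\\|\\|')) as single_json
-- FROM jt
) single_json_table ;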
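To run the same kind of expansion against a table column while keeping the other columns of each row, LATERAL VIEW is the usual Hive construct. The sketch below assumes a hypothetical table orders(order_id STRING, promotions STRING), where promotions holds a JSON array string shaped like the sample data above. The table and column names are illustrative only.

```sql
-- Hypothetical table: orders(order_id STRING, promotions STRING)
SELECT o.order_id,
       get_json_object(p.single_json, '$.orderPromotionId')    AS order_promotion_id,
       get_json_object(p.single_json, '$.orderPromotionValue') AS order_promotion_value
FROM orders o
LATERAL VIEW explode(
  split(
    regexp_replace(
      -- strip the outer [ and ] of the array string
      regexp_extract(o.promotions, '^\\[(.+)\\]$', 1),
      -- mark the boundary between adjacent objects with ||
      '\\},\\s*\\{', '}||{'),
    -- cut at the markers: one string per JSON object
    '\\|\\|')
) p AS single_json;
```

As with any string-splitting approach, this assumes flat objects whose values contain neither "},{" nor "||". LATERAL VIEW OUTER can be used instead when rows for which explode() produces nothing should still appear in the result.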
Reference: CSDN, "hive中解析json数组" (Parsing JSON arrays in Hive)