Hive Basics: Lateral View Drops the Row When the Exploded Column Is Empty

Posted by javartisan


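Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers why a row disappears under Hive's Lateral View when its column is empty, and we hope it serves as a useful reference.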

Pitfall: with lateral view explode(arr), if arr is an empty collection, the source row is dropped from the result entirely, so this case needs special handling.
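To make the behavior concrete, here is a minimal Python model of a query like SELECT id, x FROM t LATERAL VIEW explode(arr) tmp AS x. The table t, its columns id and arr, and the aliases tmp and x are hypothetical, and rows are modeled as Python dicts; this is a sketch of the semantics, not Hive's implementation. Because explode produces no rows for an empty array, there is nothing to join the source row with, so it never reaches the output.

```python
# Minimal Python model of:
#   SELECT id, x FROM t LATERAL VIEW explode(arr) tmp AS x
# Assumptions (hypothetical): table t has columns id and arr; rows are dicts.

def explode(arr):
    """UDTF analogue: yield one output value per array element."""
    yield from arr


def lateral_view_explode(rows, column, alias):
    """Join each source row with every value that explode(row[column]) yields."""
    result = []
    for row in rows:
        for value in explode(row[column]):
            result.append({**row, alias: value})
    # A row whose array is empty yields nothing above, so it is never appended.
    return result


t = [
    {"id": 1, "arr": ["a", "b"]},
    {"id": 2, "arr": []},  # empty array
    {"id": 3, "arr": ["c"]},
]

output = lateral_view_explode(t, "arr", "x")
for r in output:
    print(r["id"], r["x"])
# prints:
# 1 a
# 1 b
# 3 c
```

Row 2 is missing from the output even though it exists in t, which is exactly the disappearing-row effect described above.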

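Strictly speaking, this is not really a pitfall; I just didn't know the behavior well enough. When I have time, I will expand the reading notes on the documentation below.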

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
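The Lateral View page linked above documents one way to handle this: the optional OUTER keyword (introduced in Hive 0.12.0), as in LATERAL VIEW OUTER explode(arr) tmp AS x. When the UDTF generates no rows, OUTER still emits the source row, with NULL in the columns that come from the UDTF. The Python sketch below models that rule, assuming a hypothetical table t with columns id and arr, rows modeled as dicts, and None standing in for NULL; it illustrates the semantics rather than Hive's code.

```python
# Minimal Python model of:
#   SELECT id, x FROM t LATERAL VIEW OUTER explode(arr) tmp AS x
# Assumptions (hypothetical): table t has columns id and arr; rows are dicts;
# None stands in for SQL NULL.

def lateral_view_outer_explode(rows, column, alias):
    """Like LATERAL VIEW explode, but keep rows whose array yields no values."""
    result = []
    for row in rows:
        values = list(row[column])
        if not values:
            # OUTER: emit the source row once, with NULL in the UDTF column.
            result.append({**row, alias: None})
            continue
        for value in values:
            result.append({**row, alias: value})
    return result


t = [
    {"id": 1, "arr": ["a", "b"]},
    {"id": 2, "arr": []},
    {"id": 3, "arr": ["c"]},
]

for r in lateral_view_outer_explode(t, "arr", "x"):
    print(r["id"], r["x"])
# prints:
# 1 a
# 1 b
# 2 None
# 3 c
```

Rows with non-empty arrays are expanded exactly as with a plain lateral view; only the empty case changes, turning a vanished row into a single row with NULL.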

 

Lateral View is used together with a UDTF (user-defined table-generating function).
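Per the Lateral View documentation, a lateral view first applies the UDTF to each row of the base table and then joins the resulting output rows to that input row. The same page also shows several lateral views chained in one FROM clause, where each later view works on the output of the earlier one, so row counts multiply per source row, and an empty array in any chained column still removes the row. The Python sketch below models that chain; the table, the columns col1 and col2, and the aliases c1 and c2 are hypothetical, and explode is modeled as a Python generator.

```python
# Sketch: chaining two lateral views, modeled in Python.
# Assumption: hypothetical table t with columns col1 (array<int>) and
# col2 (array<string>); roughly:
#   SELECT id, c1, c2 FROM t
#   LATERAL VIEW explode(col1) v1 AS c1
#   LATERAL VIEW explode(col2) v2 AS c2

def lateral_view(rows, udtf, column, alias):
    """Apply udtf to row[column] and join the row with each output value."""
    return [{**row, alias: value} for row in rows for value in udtf(row[column])]


def explode(arr):
    """UDTF analogue: one output value per array element."""
    yield from arr


table = [
    {"id": 1, "col1": [1, 2], "col2": ["a", "b", "c"]},
    {"id": 2, "col1": [3], "col2": []},  # empty col2: row 2 vanishes
]

step1 = lateral_view(table, explode, "col1", "c1")
step2 = lateral_view(step1, explode, "col2", "c2")
print(len(step1), len(step2))
# prints: 3 6
```

Row 1 contributes 2 × 3 = 6 rows, while row 2 survives the first view but is dropped by the second because its col2 is empty.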

UDTF: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inTable-GeneratingFunctions(UDTF)

 

Built-in Table-Generating Functions (UDTF)

Normal user-defined functions, such as concat(), take in a single input row and output a single output row. In contrast, table-generating functions transform a single input row to multiple output rows.
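A quick way to see the one-row-in, one-row-out versus one-row-in, many-rows-out distinction is to model both kinds of function in Python. These are illustrative analogues only, not Hive's Java UDF or GenericUDTF APIs. Note that a UDTF may also return zero rows, which is exactly what happens when explode receives an empty array.

```python
# Contrast a scalar UDF with a UDTF, modeled in Python.
# Assumption: illustrative analogues, not Hive's Java UDF/GenericUDTF APIs.

def concat_udf(*parts):
    """UDF: one input row -> exactly one output value."""
    return "".join(parts)


def explode_udtf(arr):
    """UDTF: one input row -> zero or more output rows (one tuple per row)."""
    for element in arr:
        yield (element,)


print(concat_udf("ab", "cd"))            # abcd
print(list(explode_udtf([10, 20, 30])))  # [(10,), (20,), (30,)]
print(list(explode_udtf([])))            # []
```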

The built-in UDTFs are listed below:

Each entry gives the function signature, the column types of the row set it returns, and a description.

explode(ARRAY<T> a)
Row-set column types: T
Explodes an array to multiple rows. Returns a row-set with a single column (col), one row for each element from the array.

explode(MAP<Tkey,Tvalue> m)
Row-set column types: Tkey,Tvalue
Explodes a map to multiple rows. Returns a row-set with two columns (key,value), one row for each key-value pair from the input map. (As of Hive 0.8.0.)

posexplode(ARRAY<T> a)
Row-set column types: int,T
Explodes an array to multiple rows with an additional positional column of int type (position of items in the original array, starting with 0). Returns a row-set with two columns (pos,val), one row for each element from the array.

inline(ARRAY<STRUCT<f1:T1,...,fn:Tn>> a)
Row-set column types: T1,...,Tn
Explodes an array of structs to multiple rows. Returns a row-set with N columns (N = number of top-level elements in the struct), one row per struct from the array. (As of Hive 0.10.)

stack(int r,T1 V1,...,Tn/r Vn)
Row-set column types: T1,...,Tn/r
Breaks up n values V1,...,Vn into r rows. Each row will have n/r columns. r must be constant.

json_tuple(string jsonStr,string k1,...,string kn)
Row-set column types: string1,...,stringn
Takes a JSON string and a set of n keys, and returns a tuple of n values. This is a more efficient version of the get_json_object UDF because it can get multiple keys with just one call.

parse_url_tuple(string urlStr,string p1,...,string pn)
Row-set column types: string1,...,stringn
Takes a URL string and a set of n URL parts, and returns a tuple of n values. This is similar to the parse_url() UDF but can extract multiple parts at once out of a URL. Valid part names are: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:<KEY>.
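The following Python sketch models the row sets produced by explode (array and map), posexplode, inline and stack, following the descriptions above. Lists stand in for ARRAY, dicts for MAP, and tuples for STRUCT values and output rows. These are illustrations rather than Hive's implementations, the stack model is simplified to require n to be a multiple of r, and json_tuple and parse_url_tuple are left out. The array- and map-based models return an empty row set for an empty input collection, which is why the source row vanishes under a plain lateral view.

```python
# Python models of the row sets produced by several built-in Hive UDTFs.
# Assumptions: lists stand in for ARRAY, dicts for MAP, tuples for STRUCT
# and for output rows; these are illustrations, not Hive's implementations.

def explode_array(a):
    """explode(ARRAY<T>): one row (col) per element."""
    return [(x,) for x in a]


def explode_map(m):
    """explode(MAP<K,V>): one row (key, value) per entry."""
    return [(k, v) for k, v in m.items()]


def posexplode(a):
    """posexplode(ARRAY<T>): one row (pos, val) per element, pos starts at 0."""
    return [(i, x) for i, x in enumerate(a)]


def inline(structs):
    """inline(ARRAY<STRUCT<...>>): one row per struct, one column per field."""
    return [tuple(s) for s in structs]


def stack(r, *values):
    """stack(r, V1..Vn): split n values into r rows of n/r columns each.

    Simplification for this sketch: n must be a multiple of r.
    """
    n = len(values)
    if r <= 0 or n % r != 0:
        raise ValueError("this sketch requires n to be a multiple of r")
    width = n // r
    return [tuple(values[i * width:(i + 1) * width]) for i in range(r)]


print(explode_array(["a", "b"]))        # [('a',), ('b',)]
print(explode_map({"k1": 1, "k2": 2}))  # [('k1', 1), ('k2', 2)]
print(posexplode(["a", "b"]))           # [(0, 'a'), (1, 'b')]
print(inline([("x", 1), ("y", 2)]))     # [('x', 1), ('y', 2)]
print(stack(2, "A", 10, "B", 20))       # [('A', 10), ('B', 20)]
print(explode_array([]))                # []
```

The map output order here follows Python's insertion-ordered dicts, which is a property of this sketch, not something the Hive description above promises.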

 
