Built-in System Functions in Hive 3.1.2 and Auto-Registering UDFs with the System
Posted by 虎鲸不是鱼
Preface
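I previously wrote an article on how to use UDFs: https://lizhiyong.blog.csdn.net/article/details/126186377
One of the key classes there is GenericUDF. That article walked through the overall workflow: extend the class and implement the algorithm, build a Jar, load the Jar into Hive, register the function in Hive, and call it from HQL. User-written functions need this whole series of steps, so how is it that Hive's built-in functions need no registration and can be used by tenants directly?
Once this is understood, the most frequently used UDFs can be registered in Hive automatically, which avoids the tedium of repeatedly loading Jars and registering functions. In particular, a self-registered UDF appears to be visible by default only in the current database; calling it from another database requires the db_name.function_name form, which is not very convenient.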
Finding Hive's Built-in Functions
In IDEA, pressing Shift twice opens a search box for Java classes. I use the RPAD function as the example.
package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.hadoop.hive.ql.exec.Description;

/**
 * UDFRpad.
 *
 */
@Description(name = "rpad", value = "_FUNC_(str, len, pad) - " +
    "Returns str, right-padded with pad to a length of len",
    extended = "If str is longer than len, the return value is shortened to "
    + "len characters.\n"
    + "In case of empty pad string, the return value is null.\n"
    + "Example:\n"
    + "  > SELECT _FUNC_('hi', 5, '??') FROM src LIMIT 1;\n"
    + "  'hi???'\n"
    + "  > SELECT _FUNC_('hi', 1, '??') FROM src LIMIT 1;\n"
    + "  'h'\n"
    + "  > SELECT _FUNC_('hi', 5, '') FROM src LIMIT 1;\n"
    + "  null")
public class GenericUDFRpad extends GenericUDFBasePad {
  public GenericUDFRpad() {
    super("rpad");
  }

  @Override
  protected void performOp(
      StringBuilder builder, int len, String str, String pad) {
    int pos = str.length();
    // Copy the text
    builder.append(str, 0, pos);

    // Copy the padding
    while (pos < len) {
      builder.append(pad);
      pos += pad.length();
    }
    builder.setLength(len);
  }
}
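This is the class. It extends GenericUDFBasePad, and a quick read of the Java source shows that it appends padding characters to the right side of the string and then cuts the result to len characters.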
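To make the padding loop easier to follow, here is a minimal standalone sketch in plain Java with no Hive dependency. The class RpadSketch and its rpad helper are hypothetical names made up for this illustration; the null and empty-pad check imitates what the parent class does before calling performOp, and the sketch assumes len is not negative.

```java
// Standalone sketch of the right-padding loop; plain Java, no Hive classes involved.
final class RpadSketch {

    /** Hypothetical helper imitating rpad(str, len, pad); this is not a Hive API. */
    static String rpad(String str, int len, String pad) {
        // The parent class returns NULL for a null argument or an empty pad string.
        // Without this check, an empty pad would make the loop below run forever.
        if (str == null || pad == null || pad.isEmpty()) {
            return null;
        }
        StringBuilder builder = new StringBuilder();
        int pos = str.length();
        // Copy the text.
        builder.append(str, 0, pos);
        // Append whole copies of pad until at least len characters are present.
        while (pos < len) {
            builder.append(pad);
            pos += pad.length();
        }
        // Cut off any overshoot, or shorten str itself when it is longer than len.
        // Assumption: len >= 0.
        builder.setLength(len);
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(rpad("hi", 5, "??")); // hi???
        System.out.println(rpad("hi", 1, "??")); // h
        System.out.println(rpad("hi", 5, ""));   // null
    }
}
```

The three calls in main correspond to the three examples in the @Description text of GenericUDFRpad.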
Its parent class:
package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public abstract class GenericUDFBasePad extends GenericUDF {
  private transient Converter converter1;
  private transient Converter converter2;
  private transient Converter converter3;
  private Text result = new Text();
  private String udfName;
  private StringBuilder builder;

  public GenericUDFBasePad(String _udfName) {
    this.udfName = _udfName;
    this.builder = new StringBuilder();
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 3) {
      throw new UDFArgumentException(udfName + " requires three arguments. Found :"
          + arguments.length);
    }
    converter1 = checkTextArguments(arguments, 0);
    converter2 = checkIntArguments(arguments, 1);
    converter3 = checkTextArguments(arguments, 2);
    return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object valObject1 = arguments[0].get();
    Object valObject2 = arguments[1].get();
    Object valObject3 = arguments[2].get();
    // Any NULL input yields NULL.
    if (valObject1 == null || valObject2 == null || valObject3 == null) {
      return null;
    }
    Text str = (Text) converter1.convert(valObject1);
    IntWritable lenW = (IntWritable) converter2.convert(valObject2);
    Text pad = (Text) converter3.convert(valObject3);
    // An empty pad string also yields NULL.
    if (str == null || pad == null || lenW == null || pad.toString().isEmpty()) {
      return null;
    }
    int len = lenW.get();

    builder.setLength(0);

    // The subclass (lpad or rpad) supplies the actual padding logic.
    performOp(builder, len, str.toString(), pad.toString());
    result.set(builder.toString());
    return result;
  }

  @Override
  public String getDisplayString(String[] children) {
    return getStandardDisplayString(udfName, children);
  }

  protected abstract void performOp(
      StringBuilder builder, int len, String str, String pad);

  // Convert input arguments to Text, if necessary.
  private Converter checkTextArguments(ObjectInspector[] arguments, int i)
      throws UDFArgumentException {
    if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE) {
      throw new UDFArgumentTypeException(i, "Only primitive type arguments are accepted but "
          + arguments[i].getTypeName() + " is passed.");
    }

    Converter converter = ObjectInspectorConverters.getConverter((PrimitiveObjectInspector) arguments[i],
        PrimitiveObjectInspectorFactory.writableStringObjectInspector);

    return converter;
  }

  // Accept only INT/SHORT/BYTE for the length argument and convert it to IntWritable.
  private Converter checkIntArguments(ObjectInspector[] arguments, int i)
      throws UDFArgumentException {
    if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE) {
      throw new UDFArgumentTypeException(i, "Only primitive type arguments are accepted but "
          + arguments[i].getTypeName() + " is passed.");
    }
    PrimitiveCategory inputType = ((PrimitiveObjectInspector) arguments[i]).getPrimitiveCategory();
    Converter converter;
    switch (inputType) {
    case INT:
    case SHORT:
    case BYTE:
      converter = ObjectInspectorConverters.getConverter((PrimitiveObjectInspector) arguments[i],
          PrimitiveObjectInspectorFactory.writableIntObjectInspector);
      break;
    default:
      throw new UDFArgumentTypeException(i + 1, udfName
          + " only takes INT/SHORT/BYTE types as " + (i + 1) + "-ths argument, got "
          + inputType);
    }
    return converter;
  }
}
Just like an ordinary UDF, it extends the GenericUDF class, which I will not go over again here.
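The split between the parent class and its subclasses is a template-method arrangement: the parent handles NULLs and the empty pad string, then delegates to a performOp hook. The sketch below imitates only that structure in plain Java. Every class name in it is hypothetical, and the left-padding loop is my own illustration, not a copy of Hive's GenericUDFLpad source. It assumes a non-negative length.

```java
// Sketch of the template-method split between a shared base class and its subclasses.
// All names are hypothetical; nothing here is a Hive API.
abstract class BasePadSketch {
    private final String name;
    private final StringBuilder builder = new StringBuilder();

    BasePadSketch(String name) {
        this.name = name;
    }

    /** Shared flow: NULL handling, empty-pad handling, then the subclass hook. */
    final String evaluate(String str, Integer len, String pad) {
        if (str == null || len == null || pad == null || pad.isEmpty()) {
            return null;
        }
        builder.setLength(0); // reuse one buffer across calls
        performOp(builder, len, str, pad);
        return builder.toString();
    }

    String displayName() {
        return name;
    }

    /** The only part that differs between left and right padding. */
    protected abstract void performOp(StringBuilder builder, int len, String str, String pad);
}

final class RightPadSketch extends BasePadSketch {
    RightPadSketch() {
        super("rpad");
    }

    @Override
    protected void performOp(StringBuilder builder, int len, String str, String pad) {
        builder.append(str);
        while (builder.length() < len) {
            builder.append(pad);
        }
        builder.setLength(len); // assumption: len >= 0
    }
}

final class LeftPadSketch extends BasePadSketch {
    LeftPadSketch() {
        super("lpad");
    }

    @Override
    protected void performOp(StringBuilder builder, int len, String str, String pad) {
        int fill = Math.max(len - str.length(), 0);
        // Emit exactly `fill` characters, cycling through the pad string.
        for (int i = 0; i < fill; i++) {
            builder.append(pad.charAt(i % pad.length()));
        }
        builder.append(str);
        builder.setLength(len); // assumption: len >= 0
    }
}

final class PadDemo {
    public static void main(String[] args) {
        System.out.println(new RightPadSketch().evaluate("hi", 5, "??")); // hi???
        System.out.println(new LeftPadSketch().evaluate("hi", 5, "ab"));  // abahi
    }
}
```

With this layout, adding another padding variant only means writing one more small subclass; the argument checks stay in one place.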
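Following this trail, one finds that Hive's built-in functions based on GenericUDF are concentrated in the org.apache.hadoop.hive.ql.udf.generic package. (Older built-ins that extend the plain UDF class, such as UDFSubstr, sit in the parent package org.apache.hadoop.hive.ql.udf.)
The name of each Java class indicates which function it implements.
Take the Trim function, for example: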
package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedExpressions;
import org.apache.hadoop.hive.ql.exec.vector.expressions.StringTrim;

/**
 * UDFTrim.
 *
 */
@Description(name = "trim",
    value = "_FUNC_(str) - Removes the leading and trailing space characters from str ",
    extended = "Example:\n"
    + "  > SELECT _FUNC_('   facebook  ') FROM src LIMIT 1;\n" + "  'facebook'")
@VectorizedExpressions({ StringTrim.class })
public class GenericUDFTrim extends GenericUDFBaseTrim {
  public GenericUDFTrim() {
    super("trim");
  }

  @Override
  protected String performOp(String val) {
    return StringUtils.strip(val, " ");
  }
}
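Needless to say, this is the trim function that strips spaces. Clearly, Hive's built-in functions are not very different from user-defined UDFs: underneath, they extend the same classes. The open-source community has simply written the commonly needed functions in advance.
At this point we have found the package where Hive's built-in functions live.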
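One detail of the trim listing is worth spelling out: StringUtils.strip(val, " ") removes only the space character, while Java's own String.trim() removes every leading and trailing character up to U+0020, tabs and newlines included. The sketch below reimplements the space-only behaviour in plain Java so the difference can be seen without Commons Lang on the classpath. TrimSketch and stripSpaces are hypothetical names for this illustration.

```java
// Sketch contrasting space-only stripping with Java's String.trim().
final class TrimSketch {

    /** Hypothetical helper: strips only ' ' (U+0020) from both ends of val. */
    static String stripSpaces(String val) {
        if (val == null) {
            return null;
        }
        int start = 0;
        int end = val.length();
        while (start < end && val.charAt(start) == ' ') {
            start++;
        }
        while (end > start && val.charAt(end - 1) == ' ') {
            end--;
        }
        return val.substring(start, end);
    }

    public static void main(String[] args) {
        // Spaces on both sides are removed.
        System.out.println("[" + stripSpaces("   facebook  ") + "]"); // [facebook]
        // A tab at either end stops the stripping, so this input comes back unchanged,
        // whereas String.trim() would reduce it to "ab".
        String tabbed = "\tab \t";
        System.out.println(stripSpaces(tabbed).equals(tabbed)); // true
        System.out.println(tabbed.trim().equals("ab"));         // true
    }
}
```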
Finding How Hive Registers Functions Automatically
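Again using the RPAD function as the example: in IDEA, Alt+F7 (Find Usages) shows where the class is referenced.
The GenericUDFRpad class turns out to be passed as an argument to a method named registerGenericUDF. Judging by that name, this method is almost certainly where function registration happens.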
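Before looking at the real class, the general idea of binding a function name to an implementing class inside a static initializer can be sketched in plain Java. This is only an illustration of the pattern under my own assumptions: RegistrySketch, ScalarFunctionSketch, and the two toy functions are hypothetical, and Hive's actual registry stores considerably more per function than a class reference.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a function implementation; not a Hive interface.
interface ScalarFunctionSketch {
    String apply(String input);
}

final class UpperSketch implements ScalarFunctionSketch {
    public String apply(String input) {
        return input.toUpperCase(Locale.ROOT);
    }
}

final class ReverseSketch implements ScalarFunctionSketch {
    public String apply(String input) {
        return new StringBuilder(input).reverse().toString();
    }
}

final class RegistrySketch {
    // Function name -> implementing class, filled once when the class is loaded.
    private static final Map<String, Class<? extends ScalarFunctionSketch>> SYSTEM = new HashMap<>();

    static {
        // Runs at class-load time, so callers never register these names themselves.
        register("upper", UpperSketch.class);
        register("reverse", ReverseSketch.class);
    }

    static void register(String name, Class<? extends ScalarFunctionSketch> impl) {
        SYSTEM.put(name.toLowerCase(Locale.ROOT), impl);
    }

    /** Looks a function up by case-insensitive name and creates a fresh instance. */
    static ScalarFunctionSketch lookup(String name) {
        Class<? extends ScalarFunctionSketch> impl = SYSTEM.get(name.toLowerCase(Locale.ROOT));
        if (impl == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        try {
            return impl.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + impl.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup("UPPER").apply("hive"));  // HIVE
        System.out.println(lookup("reverse").apply("abc")); // cba
    }
}
```

Because the map is populated in a static block, every name in it is available as soon as the registry class is loaded, with no per-session registration step.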
Jumping to the class that makes this call:
package org.apache.hadoop.hive.ql.exec;

/**
 * FunctionRegistry.
 */
public final class FunctionRegistry {

  private static final Logger LOG = LoggerFactory.getLogger(FunctionRegistry.class);

  /*
   * PTF variables
   * */

  public static final String LEAD_FUNC_NAME = "lead";
  public static final String LAG_FUNC_NAME = "lag";
  public static final String LAST_VALUE_FUNC_NAME = "last_value";

  public static final String UNARY_PLUS_FUNC_NAME = "positive";
  public static final String UNARY_MINUS_FUNC_NAME = "negative";

  public static final String WINDOWING_TABLE_FUNCTION = "windowingtablefunction";
  private static final String NOOP_TABLE_FUNCTION = "noop";
  private static final String NOOP_MAP_TABLE_FUNCTION = "noopwithmap";
  private static final String NOOP_STREAMING_TABLE_FUNCTION = "noopstreaming";
  private static final String NOOP_STREAMING_MAP_TABLE_FUNCTION = "noopwithmapstreaming";
  private static final String MATCH_PATH_TABLE_FUNCTION = "matchpath";

  public static final Set<String> HIVE_OPERATORS = new HashSet<String>();

  static {
    HIVE_OPERATORS.addAll(Arrays.asList(
        "+", "-", "*", "/", "%", "div", "&", "|", "^", "~",
        "and", "or", "not", "!",
        "=", "==", "<=>", "!=", "<>", "<", "<=", ">", ">=",
        "index"));
  }

  // registry for system functions
  private static final Registry system = new Registry(true);

  static {
    system.registerGenericUDF("concat", GenericUDFConcat.class);
    system.registerUDF("substr", UDFSubstr.class, false);
    system.registerUDF("substring", UDFSubstr.class, false);
    system.registerGenericUDF("substring_index", GenericUDFSubstringIndex.class);
    system.registerUDF("space", UDFSpace.class, false);
    system.registerUDF("repeat", UDFRepeat.class, false);
    system.registerUDF("ascii", UDFAscii.class, false);
    system.registerGenericUDF("lpad", GenericUDFLpad.class);
    system.registerGenericUDF("rpad", GenericUDFRpad.class);
    system.registerGenericUDF("levenshtein", GenericUDFLevenshtein.class);
    system.registerGenericUDF("soundex", GenericUDFSoundex.class);

    system.registerGenericUDF("size", GenericUDFSize.class);

    system.registerGenericUDF("round", GenericUDFRound.class);
    system.registerGenericUDF("bround", GenericUDFBRound.class);
    system.registerGenericUDF("floor", GenericUDFFloor.class);
    system.registerUDF("sqrt", UDFSqrt.class, false);
    system.registerGenericUDF("cbrt", GenericUDFCbrt.class);
    system.registerGenericUDF("ceil", GenericUDFCeil.class);
    system.registerGenericUDF("ceiling", GenericUDFCeil.class);
    system.registerUDF("rand", UDFRand.class, false);
    system.registerGenericUDF("abs", GenericUDFAbs.class);
    system.registerGenericUDF("sq_count_check", GenericUDFSQCountCheck.class);
    system.registerGenericUDF("enforce_constraint", GenericUDFEnforceConstraint.class);
    system.registerGenericUDF("pmod", GenericUDFPosMod.class);

    system.registerUDF("ln", UDFLn.class, false);
    system.registerUDF("log2", UDFLog2.class, false);
    system.registerUDF("sin", UDFSin.class, false);
    system.registerUDF("asin", UDFAsin.class, false);
    system.registerUDF("cos", UDFCos.class, false);
    system.registerUDF("acos", UDFAcos.class, false);
    system.registerUDF("log10", UDFLog10.class, false);
    system.registerUDF("log", UDFLog.class, false);
    system.registerUDF("exp", UDFExp.class, false);
    system.registerGenericUDF("power", GenericUDFPower.class);
    system.registerGenericUDF("pow", GenericUDFPower.class);
    system.registerUDF("sign", UDFSign.class, false);
    system.registerUDF("pi", UDFPI.class, false);
    system.registerUDF("degrees", UDFDegrees.class, false);
    system.registerUDF("radians", UDFRadians.class, false);
    system.registerUDF("atan", UDFAtan.class, false);
    system.registerUDF("tan", UDFTan.class, false);
    system.registerUDF("e", UDFE.class, false);
    system.registerGeneri