Hive 3.1.2's Built-in System Functions and Automatic Registration of UDFs with the System

Posted by 虎鲸不是鱼


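Preface

I previously wrote a post on how to use UDFs: https://lizhiyong.blog.csdn.net/article/details/126186377

One of the more important classes there is GenericUDF. That post walked through the overall workflow: extend the class and implement your own algorithm, build a Jar, load the Jar into Hive, register the function in Hive, and call it from HQL. User-written functions have to go through this whole series of hoops, so how is it that Hive's built-in functions are available to tenants directly, with no registration at all?

Once this is understood, the most commonly used UDFs can be registered with Hive automatically, which avoids the tedious routine of repeatedly loading Jars and registering functions. In particular, a self-registered UDF appears to take effect only in the current database by default; to use it from another database you have to call it as database_name.udf_name, which is not very convenient.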

Finding Hive's Built-in Functions

In IDEA, pressing Shift twice opens a search for Java classes. I'll use the RPAD function as the example.

package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.hadoop.hive.ql.exec.Description;

/**
 * UDFRpad.
 *
 */
@Description(name = "rpad", value = "_FUNC_(str, len, pad) - " +
    "Returns str, right-padded with pad to a length of len",
    extended = "If str is longer than len, the return value is shortened to "
    + "len characters.\n"
    + "In case of empty pad string, the return value is null.\n"
    + "Example:\n"
    + "  > SELECT _FUNC_('hi', 5, '??') FROM src LIMIT 1;\n"
    + "  'hi???'\n"
    + "  > SELECT _FUNC_('hi', 1, '??') FROM src LIMIT 1;\n"
    + "  'h'\n"
    + "  > SELECT _FUNC_('hi', 5, '') FROM src LIMIT 1;\n"
    + "  null")
public class GenericUDFRpad extends GenericUDFBasePad {
  public GenericUDFRpad() {
    super("rpad");
  }

  @Override
  protected void performOp(
      StringBuilder builder, int len, String str, String pad) {
    int pos = str.length();
    // Copy the text
    builder.append(str, 0, pos);

    // Copy the padding
    while (pos < len) {
      builder.append(pad);
      pos += pad.length();
    }
    // Cut back to exactly len characters (also truncates an over-long str)
    builder.setLength(len);
  }
}

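This is the class that turns up. It extends GenericUDFBasePad, and a quick read of the Java source shows that it appends padding characters to the right of a string.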
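To make the padding loop concrete, here is a minimal stand-alone sketch of the same logic with no Hive dependencies. The class name RpadSketch and the static rpad helper are made up for illustration. Like performOp, it assumes the caller has already rejected an empty pad string (which would loop forever) and a negative len.

```java
// Stand-alone sketch of the right-padding loop; not the Hive class itself.
final class RpadSketch {

    // Same steps as performOp: copy str, append pad until len is reached, cut to len.
    // Assumes pad is non-empty and len >= 0.
    static String rpad(String str, int len, String pad) {
        StringBuilder builder = new StringBuilder();
        int pos = str.length();
        builder.append(str, 0, pos);
        while (pos < len) {
            builder.append(pad);
            pos += pad.length();
        }
        // The last append may overshoot, and str itself may be longer than len.
        builder.setLength(len);
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(rpad("hi", 5, "??")); // hi???
        System.out.println(rpad("hi", 1, "??")); // h
    }
}
```

The two calls reproduce the first two examples from the @Description text. The third example (empty pad returns null) is not handled here because that check lives in the parent class.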

Its parent class:

package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public abstract class GenericUDFBasePad extends GenericUDF {
  private transient Converter converter1;
  private transient Converter converter2;
  private transient Converter converter3;
  private Text result = new Text();
  private String udfName;
  private StringBuilder builder;

  public GenericUDFBasePad(String _udfName) {
    this.udfName = _udfName;
    this.builder = new StringBuilder();
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 3) {
      throw new UDFArgumentException(udfName + " requires three arguments. Found :"
        + arguments.length);
    }
    converter1 = checkTextArguments(arguments, 0);
    converter2 = checkIntArguments(arguments, 1);
    converter3 = checkTextArguments(arguments, 2);
    return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object valObject1 = arguments[0].get();
    Object valObject2 = arguments[1].get();
    Object valObject3 = arguments[2].get();
    // Any NULL input yields NULL
    if (valObject1 == null || valObject2 == null || valObject3 == null) {
      return null;
    }
    Text str = (Text) converter1.convert(valObject1);
    IntWritable lenW = (IntWritable) converter2.convert(valObject2);
    Text pad = (Text) converter3.convert(valObject3);
    // An empty pad string also yields NULL
    if (str == null || pad == null || lenW == null || pad.toString().isEmpty()) {
      return null;
    }
    int len = lenW.get();
    // Reuse the same buffer for every row
    builder.setLength(0);

    performOp(builder, len, str.toString(), pad.toString());
    result.set(builder.toString());
    return result;
  }

  @Override
  public String getDisplayString(String[] children) {
    return getStandardDisplayString(udfName, children);
  }

  // Subclasses (lpad, rpad) supply the actual padding logic
  protected abstract void performOp(
      StringBuilder builder, int len, String str, String pad);

  // Convert input arguments to Text, if necessary.
  private Converter checkTextArguments(ObjectInspector[] arguments, int i)
    throws UDFArgumentException {
    if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE) {
      throw new UDFArgumentTypeException(i, "Only primitive type arguments are accepted but "
      + arguments[i].getTypeName() + " is passed.");
    }

    Converter converter = ObjectInspectorConverters.getConverter((PrimitiveObjectInspector) arguments[i],
          PrimitiveObjectInspectorFactory.writableStringObjectInspector);

    return converter;
  }

  private Converter checkIntArguments(ObjectInspector[] arguments, int i)
    throws UDFArgumentException {
    if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE) {
      throw new UDFArgumentTypeException(i, "Only primitive type arguments are accepted but "
      + arguments[i].getTypeName() + " is passed.");
    }
    PrimitiveCategory inputType = ((PrimitiveObjectInspector) arguments[i]).getPrimitiveCategory();
    Converter converter;
    switch (inputType) {
    case INT:
    case SHORT:
    case BYTE:
      converter = ObjectInspectorConverters.getConverter((PrimitiveObjectInspector) arguments[i],
      PrimitiveObjectInspectorFactory.writableIntObjectInspector);
      break;
    default:
      throw new UDFArgumentTypeException(i + 1, udfName
      + " only takes INT/SHORT/BYTE types as " + (i + 1) + "-ths argument, got "
      + inputType);
    }
    return converter;
  }
}

Like any ordinary UDF, it extends the GenericUDF class. That class is not covered again here.
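The split of work between parent and subclass is a template method: the parent checks the arguments and owns the buffer, and the subclass only fills it. A rough sketch of that shape using plain Java types follows. Everything here (PadTemplateSketch, BasePad, LeftPad, and the left-padding loop itself) is hypothetical and written for this example, not copied from Hive, and it assumes len is non-negative.

```java
// Template-method sketch modeled loosely on the parent/subclass split; not Hive code.
final class PadTemplateSketch {

    // Plays the role of the abstract parent: shared guards, one reusable buffer.
    abstract static class BasePad {
        private final StringBuilder builder = new StringBuilder();

        final String evaluate(String str, Integer len, String pad) {
            // Any null argument, or an empty pad string, yields null.
            if (str == null || len == null || pad == null || pad.isEmpty()) {
                return null;
            }
            builder.setLength(0); // reset the shared buffer before each call
            performOp(builder, len, str, pad);
            return builder.toString();
        }

        protected abstract void performOp(StringBuilder builder, int len, String str, String pad);
    }

    // Hypothetical subclass: pads on the left and truncates str to len if needed.
    static final class LeftPad extends BasePad {
        @Override
        protected void performOp(StringBuilder builder, int len, String str, String pad) {
            int keep = Math.min(str.length(), len); // characters of str that fit
            int padLen = len - keep;                // characters of padding needed
            while (builder.length() < padLen) {
                builder.append(pad);
            }
            builder.setLength(padLen);              // drop any overshoot from the last append
            builder.append(str, 0, keep);
        }
    }

    public static void main(String[] args) {
        BasePad lpad = new LeftPad();
        System.out.println(lpad.evaluate("hi", 5, "??")); // ???hi
        System.out.println(lpad.evaluate("hi", 1, "??")); // h
        System.out.println(lpad.evaluate("hi", 5, ""));   // null
    }
}
```

Keeping the null and empty-pad checks in one place means each new padding variant only has to implement performOp.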

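Following the trail, you find that Hive's built-in functions are concentrated in the org.apache.hadoop.hive.ql.udf.generic package. (The older UDF-based classes that appear in the registry further down, such as UDFSubstr, sit in the neighbouring org.apache.hadoop.hive.ql.udf package.)

The Java class names make it clear which function each class implements.

Take the trim function, for example: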

package org.apache.hadoop.hive.ql.udf.generic;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedExpressions;
import org.apache.hadoop.hive.ql.exec.vector.expressions.StringTrim;

/**
 * UDFTrim.
 *
 */
@Description(name = "trim",
    value = "_FUNC_(str) - Removes the leading and trailing space characters from str ",
    extended = "Example:\n"
    + "  > SELECT _FUNC_('   facebook  ') FROM src LIMIT 1;\n" + "  'facebook'")
@VectorizedExpressions({ StringTrim.class })
public class GenericUDFTrim extends GenericUDFBaseTrim {
  public GenericUDFTrim() {
    super("trim");
  }

  @Override
  protected String performOp(String val) {
    return StringUtils.strip(val, " ");
  }
}

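No explanation needed: this is the familiar trim function that strips spaces. Clearly, Hive's built-in functions are not much different from user-defined UDFs; underneath, they extend the same classes. The open-source community has simply written Hive's commonly used functions ahead of time.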
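One detail worth spelling out: StringUtils.strip(val, " ") removes only the characters listed in its second argument, here the plain space, so it is narrower than Java's String.trim(), which also drops tabs and newlines. The sketch below mimics that behaviour with the standard library only. TrimSketch and stripSpaces are illustrative names, and the commons-lang semantics are stated as I understand them.

```java
// Stand-alone sketch: strip only U+0020 spaces from both ends, unlike String.trim().
final class TrimSketch {

    static String stripSpaces(String val) {
        if (val == null) {
            return null;
        }
        int start = 0;
        int end = val.length();
        while (start < end && val.charAt(start) == ' ') {
            start++;
        }
        while (end > start && val.charAt(end - 1) == ' ') {
            end--;
        }
        return val.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + stripSpaces("   facebook  ") + "]"); // [facebook]
        // A leading tab survives stripSpaces but not String.trim().
        System.out.println(stripSpaces("\tfacebook ").length());      // 9
        System.out.println("\tfacebook ".trim().length());            // 8
    }
}
```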

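At this point we have found the package that holds Hive's built-in functions.

Finding How Hive Registers Functions Automatically

RPAD serves as the example again. In IDEA, Alt+F7 (Find Usages) shows where the class is referenced.

The GenericUDFRpad class is passed to the registerGenericUDF method. Judging by the method name, it is very likely tied to function registration.

Jump into the class that makes the call: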

package org.apache.hadoop.hive.ql.exec;

/**
 * FunctionRegistry.
 */
public final class FunctionRegistry {

  private static final Logger LOG = LoggerFactory.getLogger(FunctionRegistry.class);

  /*
   * PTF variables
   * */

  public static final String LEAD_FUNC_NAME = "lead";
  public static final String LAG_FUNC_NAME = "lag";
  public static final String LAST_VALUE_FUNC_NAME = "last_value";

  public static final String UNARY_PLUS_FUNC_NAME = "positive";
  public static final String UNARY_MINUS_FUNC_NAME = "negative";

  public static final String WINDOWING_TABLE_FUNCTION = "windowingtablefunction";
  private static final String NOOP_TABLE_FUNCTION = "noop";
  private static final String NOOP_MAP_TABLE_FUNCTION = "noopwithmap";
  private static final String NOOP_STREAMING_TABLE_FUNCTION = "noopstreaming";
  private static final String NOOP_STREAMING_MAP_TABLE_FUNCTION = "noopwithmapstreaming";
  private static final String MATCH_PATH_TABLE_FUNCTION = "matchpath";

  public static final Set<String> HIVE_OPERATORS = new HashSet<String>();

  static {
    HIVE_OPERATORS.addAll(Arrays.asList(
        "+", "-", "*", "/", "%", "div", "&", "|", "^", "~",
        "and", "or", "not", "!",
        "=", "==", "<=>", "!=", "<>", "<", "<=", ">", ">=",
        "index"));
  }

  // registry for system functions
  private static final Registry system = new Registry(true);

  static {
    system.registerGenericUDF("concat", GenericUDFConcat.class);
    system.registerUDF("substr", UDFSubstr.class, false);
    system.registerUDF("substring", UDFSubstr.class, false);
    system.registerGenericUDF("substring_index", GenericUDFSubstringIndex.class);
    system.registerUDF("space", UDFSpace.class, false);
    system.registerUDF("repeat", UDFRepeat.class, false);
    system.registerUDF("ascii", UDFAscii.class, false);
    system.registerGenericUDF("lpad", GenericUDFLpad.class);
    system.registerGenericUDF("rpad", GenericUDFRpad.class);
    system.registerGenericUDF("levenshtein", GenericUDFLevenshtein.class);
    system.registerGenericUDF("soundex", GenericUDFSoundex.class);

    system.registerGenericUDF("size", GenericUDFSize.class);

    system.registerGenericUDF("round", GenericUDFRound.class);
    system.registerGenericUDF("bround", GenericUDFBRound.class);
    system.registerGenericUDF("floor", GenericUDFFloor.class);
    system.registerUDF("sqrt", UDFSqrt.class, false);
    system.registerGenericUDF("cbrt", GenericUDFCbrt.class);
    system.registerGenericUDF("ceil", GenericUDFCeil.class);
    system.registerGenericUDF("ceiling", GenericUDFCeil.class);
    system.registerUDF("rand", UDFRand.class, false);
    system.registerGenericUDF("abs", GenericUDFAbs.class);
    system.registerGenericUDF("sq_count_check", GenericUDFSQCountCheck.class);
    system.registerGenericUDF("enforce_constraint", GenericUDFEnforceConstraint.class);
    system.registerGenericUDF("pmod", GenericUDFPosMod.class);

    system.registerUDF("ln", UDFLn.class, false);
    system.registerUDF("log2", UDFLog2.class, false);
    system.registerUDF("sin", UDFSin.class, false);
    system.registerUDF("asin", UDFAsin.class, false);
    system.registerUDF("cos", UDFCos.class, false);
    system.registerUDF("acos", UDFAcos.class, false);
    system.registerUDF("log10", UDFLog10.class, false);
    system.registerUDF("log", UDFLog.class, false);
    system.registerUDF("exp", UDFExp.class, false);
    system.registerGenericUDF("power", GenericUDFPower.class);
    system.registerGenericUDF("pow", GenericUDFPower.class);
    system.registerUDF("sign", UDFSign.class, false);
    system.registerUDF("pi", UDFPI.class, false);
    system.registerUDF("degrees", UDFDegrees.class, false);
    system.registerUDF("radians", UDFRadians.class, false);
    system.registerUDF("atan", UDFAtan.class, false);
    system.registerUDF("tan", UDFTan.class, false);
    system.registerUDF("e", UDFE.class, false);
    system.registerGeneri
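
The static block above boils down to a name-to-class map that is filled in when the class is loaded, so the built-in functions are already known before any query runs. A simplified model of that idea follows. MiniRegistry, SimpleFunction and the two function classes are invented for this sketch, and the lower-casing of names is an assumption of the sketch; Hive's real Registry does much more, for example handling UDF, GenericUDF, UDAF and UDTF classes differently.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model of a function registry populated from a static initializer; not Hive code.
final class MiniRegistry {

    // Hypothetical stand-in for a UDF base type.
    interface SimpleFunction {
        String apply(String input);
    }

    static final class UpperFunction implements SimpleFunction {
        @Override
        public String apply(String input) {
            return input.toUpperCase(Locale.ROOT);
        }
    }

    static final class ReverseFunction implements SimpleFunction {
        @Override
        public String apply(String input) {
            return new StringBuilder(input).reverse().toString();
        }
    }

    // Registry for "system" functions: function name -> implementing class.
    private static final Map<String, Class<? extends SimpleFunction>> SYSTEM = new HashMap<>();

    // Runs once when MiniRegistry is loaded, like the static block in FunctionRegistry.
    static {
        register("upper", UpperFunction.class);
        register("ucase", UpperFunction.class); // two names may share one class
        register("reverse", ReverseFunction.class);
    }

    static void register(String name, Class<? extends SimpleFunction> clazz) {
        SYSTEM.put(name.toLowerCase(Locale.ROOT), clazz);
    }

    // Look a function up by name and create a fresh instance through reflection.
    static SimpleFunction lookup(String name) {
        Class<? extends SimpleFunction> clazz = SYSTEM.get(name.toLowerCase(Locale.ROOT));
        if (clazz == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        try {
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup("UCASE").apply("hive"));   // HIVE
        System.out.println(lookup("reverse").apply("hive")); // evih
    }
}
```

Registering the same class under several names mirrors how the listing maps both "substr" and "substring" to UDFSubstr, and both "ceil" and "ceiling" to GenericUDFCeil.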
