Hive generic UDTF fails with array index out of bound error

Posted: 2020-04-17 21:01:53

Question:

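This is about a Hive generic UDTF.

The program takes a single string column as input, splits the string on spaces, and should emit one output row per token. I built the jar file, added it in the Hive shell, and created a temporary function for the class. Calling the function fails with an array index out of bound error.

Code: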

package com.suba.customHiveUdfs;

import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class MyUdtf extends GenericUDTF {
    ArrayList<String> colList = new ArrayList<>(1);
    ArrayList<ObjectInspector> oiList = new ArrayList<>(1);
    PrimitiveObjectInspector poi = null;

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        // TODO Auto-generated method stub
        if (argOIs.length > 1) {
            throw new UDFArgumentException("invalid argument");
        }
        if (argOIs[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("primitive expected");
        }
        if (((PrimitiveObjectInspector) argOIs[0])
                .getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
            throw new UDFArgumentException("not string type");
        }
        poi = (PrimitiveObjectInspector) argOIs[0];
        colList.add("name");
        oiList.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(colList, oiList);
    }

    @Override
    public void process(Object[] arg0) throws HiveException {
        String name = ((PrimitiveObjectInspector) poi).getPrimitiveJavaObject(arg0[0]).toString();
        String[] tokens = name.split(" ");
        for (String x : tokens) {
            Object[] objects = new Object[] { x };
            forward(objects);
        }
    }

    @Override
    public void close() throws HiveException {
    }
}

The error message is shown below (array index out of bound):

Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at java.util.Arrays$ArrayList.get(Arrays.java:3841)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:417)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:592)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:125)
    at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:45)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:107)
    at com.suba.customHiveUdfs.MyUdtf.process(MyUdtf.java:61)
    at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:108)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
    ... 9 more
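
For readers unfamiliar with this exception shape: the top frame is `Arrays$ArrayList.get`, the list type returned by `Arrays.asList`, and the reported index is 1. That pattern is what one would see if a one-element row were asked for a second field. The following Hive-free sketch only reproduces that exception class; `RowWidthSketch` and `fieldAt` are hypothetical names, and it makes no claim about why Hive requested index 1 in this job.

```java
import java.util.Arrays;
import java.util.List;

public class RowWidthSketch {
    // Hypothetical helper: wraps a row with Arrays.asList and reads one field.
    public static Object fieldAt(Object[] row, int index) {
        List<Object> fields = Arrays.asList(row);
        // Arrays.asList is backed by the array itself, so an index past the
        // end surfaces as ArrayIndexOutOfBoundsException.
        return fields.get(index);
    }

    public static void main(String[] args) {
        Object[] row = new Object[] { "token" };
        System.out.println(fieldAt(row, 0)); // the only field in the row
        try {
            fieldAt(row, 1); // there is no second field
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e);
        }
    }
}
```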

Comments:

Change the code to -> for (String x : tokens) { String string[] = new String[] { x }; forward(string); }

Answer 1:

The issue was resolved once the for loop in the process method was changed as follows.

    for (String x : tokens) {
        String string[] = new String[] { x };

        forward(string);
    }
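
The loop emits one single-element row per token. As a minimal sketch of that shape outside Hive, the example below collects the rows in a list instead of calling `forward`; `SplitRowsSketch` and `splitToRows` are hypothetical names used only for illustration and are not part of the Hive API.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitRowsSketch {
    // Hypothetical stand-in for the UDTF's process/forward pair:
    // returns the rows that would have been forwarded.
    public static List<String[]> splitToRows(String input) {
        List<String[]> rows = new ArrayList<>();
        for (String token : input.split(" ")) {
            // One output column ("name"), so each row is a one-element array.
            rows.add(new String[] { token });
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : splitToRows("hello hive udtf")) {
            System.out.println(row[0]);
        }
    }
}
```

Each row has exactly as many elements as the output schema has columns, which is the invariant the single `name` column in `initialize` implies.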
