Hive generic UDTF fails with array index out of bound error
Posted: 2020-04-17 21:01:53
Question: This is about a Hive generic UDTF.
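The program takes a single string column as input, splits the value on spaces, and should emit one output row per token. I built the jar file, added it in the Hive shell, and created a temporary function for the class. Calling the function fails with an array index out of bounds error.
Code: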
package com.suba.customHiveUdfs;

import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class MyUdtf extends GenericUDTF {
    ArrayList<String> colList = new ArrayList<>(1);
    ArrayList<ObjectInspector> oiList = new ArrayList<>(1);
    PrimitiveObjectInspector poi = null;

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        // TODO Auto-generated method stub
        if (argOIs.length > 1) {
            throw new UDFArgumentException("invalid argument");
        }
        if (argOIs[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("primitive expected");
        }
        if (((PrimitiveObjectInspector) argOIs[0])
                .getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
            throw new UDFArgumentException("not string type");
        }
        poi = (PrimitiveObjectInspector) argOIs[0];
        colList.add("name");
        oiList.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(colList, oiList);
    }

    @Override
    public void process(Object[] arg0) throws HiveException {
        String name = ((PrimitiveObjectInspector) poi).getPrimitiveJavaObject(arg0[0]).toString();
        String[] tokens = name.split(" ");
        for (String x : tokens) {
            Object[] objects = new Object[] { x };
            forward(objects);
        }
    }

    @Override
    public void close() throws HiveException {
    }
}
The error message is shown below:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at java.util.Arrays$ArrayList.get(Arrays.java:3841)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:417)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:592)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:125)
at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:45)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:107)
at com.suba.customHiveUdfs.MyUdtf.process(MyUdtf.java:61)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:108)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:796)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:539)
... 9 more
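The top two frames of the trace hint at where the mismatch arises: `Arrays$ArrayList` is the fixed-size list returned by `Arrays.asList`, and the failing index is 1. One possible reading, offered as an assumption rather than a confirmed diagnosis, is that the serializer asks for a second field from a forwarded row that holds only one element. The sketch below reproduces just that plain-Java behavior without any Hive classes; the names `RowWidthDemo` and `readField` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; it does not use any Hive API.
class RowWidthDemo {

    // Reads field `index` from a row exposed as a list view over the array,
    // the same kind of view that Arrays.asList returns.
    static Object readField(Object[] row, int index) {
        List<Object> view = Arrays.asList(row);
        return view.get(index);
    }

    public static void main(String[] args) {
        Object[] row = new Object[] { "hello" }; // one-element row, as built in process()
        System.out.println(readField(row, 0));   // field 0 exists
        try {
            readField(row, 1);                   // field 1 does not exist in a one-element row
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught " + e.getClass().getSimpleName());
        }
    }
}
```

If that reading applies here, it may be worth checking that the number of columns declared in `initialize` matches the length of every array passed to `forward`.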
Comments:
Change the code to: for (String x : tokens) { String string[] = new String[] { x }; forward(string); }
Answer 1: The problem was resolved once the loop in the process method was changed as follows.
for (String x : tokens) {
    String string[] = new String[] { x };
    forward(string);
}
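To see what this loop hands to `forward`, here is a minimal sketch that runs the same split-and-wrap logic outside Hive. `RowSplitDemo` and `splitToRows` are hypothetical names, and a plain list stands in for `forward`; the sketch only illustrates that each token becomes its own one-column row, not how Hive consumes it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; a List collects the rows that the UDTF would forward.
class RowSplitDemo {

    // Splits the input on single spaces and wraps each token in a one-element array.
    static List<String[]> splitToRows(String input) {
        List<String[]> rows = new ArrayList<>();
        for (String token : input.split(" ")) {
            rows.add(new String[] { token });
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : splitToRows("a b c")) {
            System.out.println(row.length + " column(s): " + row[0]);
        }
    }
}
```

Note that `split(" ")` produces empty tokens when the input contains consecutive spaces; splitting on the regular expression `\\s+` would avoid that if it matters for the data.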