Pig UDF throws an error: Caught error from UDF: GetCounty, Out of bounds access [1]


【Posted】2014-12-29 04:47:14 【Question】:

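I am writing a Pig program that reads a file containing city and zip, then passes the city to a UDF. The UDF loads a file containing city and county into a hash map, looks up the county for the given city in that map, and returns it.

Please let me know what I am doing wrong here; I get the following error when I run the program: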

2014-12-28 16:15:16,506 WARN org.apache.hadoop.mapred.Child: Error running child
org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: GetCounty, Out of bounds access [1]
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:370)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:434)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:340)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at GetCounty.exec(GetCounty.java:33)
at GetCounty.exec(GetCounty.java:1)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
... 15 more
2014-12-28 16:15:16,510 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

The input files contain the following data:

File zipcity:
irving  75038
san francisco   94903
san rafael      94905
las vegas       98043
coppel  75063

File citycounty:
irving  dallas
las vegas       tarrant
san francisco   san francisco
coppel  dallas

public class GetCounty extends EvalFunc<String> {
    String lookupfile;
    HashMap<String, String> lookup = null;

    public String exec(Tuple input) throws IOException {
        if ( input.size() != 1 ) {
            return null;
        }

        if ( lookup == null ) {
            FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());
            DataInputStream in = fs.open(new Path(lookupfile));
            String line;
            while ( (line = in.readLine()) != null) {
                String[] tok = new String[2];
                tok = line.split(":", 2);
                lookup.put(tok[0], tok[1]);
            }
        }

        String city = (String) input.get(0);
        return lookup.get(city);
    }

    public GetCounty(String f) {
        lookupfile = f;
    }
}

I invoke Pig as follows:

grunt> register 'PigMyUDF.jar';
grunt> define GetCounty GetCounty('pig/citycounty');
grunt> a = load 'pig/zipcity' as ( city:chararray, zip:int );
grunt> b = foreach a generate city, zip, GetCounty(city);
grunt> dump b;

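【Comments】:

This problem can be solved easily in plain Pig; just curious, why are you using a UDF? Also, in the UDF code, why do you split the string on ":" as the delimiter? I don't see any ":" in the input.

Thanks Sivasakhthi!! That came from a copy-paste habit, but I could not spot it for hours.

I posted a solution that uses native Pig instead of a UDF; please let me know whether it works for you.

Thanks Sivasakhthi! I was able to fix the problem with your point about the UDF: I changed the delimiter and it worked. One more thing made my code fail: the declaration HashMap lookup = null; caused a null pointer exception. The Programming Pig book has this bug too. I was also able to solve the problem using the solution you provided.

【Answer 1】: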

Can you try this? The input fields are separated by tabs.

zipcity

irving  75038
san francisco   94903
san rafael      94905
las vegas       98043
coppel  75063

citycounty

irving  dallas
las vegas       tarrant
san francisco   san francisco
coppel  dallas

PigScript:

A = LOAD 'zipcity' AS (city:chararray, zip:int);
B = LOAD 'citycounty' AS (city:chararray, county:chararray);
C = JOIN A BY city, B BY city;
D = FOREACH C GENERATE A::city AS city, A::zip AS zip, B::county AS county;
DUMP D;

Output:

(coppel,75063,dallas)
(irving,75038,dallas)
(las vegas,98043,tarrant)
(san francisco,94903,san francisco)
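
For the UDF route discussed in the comments, the two fixes mentioned there (splitting on the tab delimiter the files actually use, and allocating the HashMap before calling put) can be sketched without Pig or Hadoop. This is only a minimal sketch, not the asker's final code: the class CountyLookup and the method parse are hypothetical names introduced here, and the sketch assumes every valid line has the form city<TAB>county.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original UDF: it isolates the
// line-parsing step so it can be run without Pig or Hadoop.
class CountyLookup {

    // Builds a city -> county map from "city<TAB>county" lines.
    static Map<String, String> parse(List<String> lines) {
        // Allocate the map before use; calling put() on a null
        // reference would throw NullPointerException.
        Map<String, String> lookup = new HashMap<>();
        for (String line : lines) {
            // The sample files are tab-separated, so split on "\t".
            String[] tok = line.split("\t", 2);
            if (tok.length < 2) {
                // No delimiter on this line: tok[1] would be out of bounds.
                continue;
            }
            lookup.put(tok[0].trim(), tok[1].trim());
        }
        return lookup;
    }

    public static void main(String[] args) {
        Map<String, String> lookup = parse(Arrays.asList(
                "irving\tdallas",
                "las vegas\ttarrant",
                "line without a tab"));
        System.out.println(lookup.get("irving"));     // a city present in the map
        System.out.println(lookup.get("san rafael")); // a city missing from the map

        // Splitting on ":" finds no delimiter in a tab-separated line, so
        // the result holds a single token and index 1 does not exist.
        System.out.println("irving\tdallas".split(":", 2).length);
    }
}
```

Inside the UDF, the same length check would sit just before lookup.put(tok[0], tok[1]), so a line without the delimiter is skipped rather than failing the whole task.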
