Pig UDF is throwing an error: Caught error from UDF: GetCounty, Out of bounds access [1]
Posted: 2014-12-29 04:47:14
I am writing a Pig program that reads a file containing city and zip, then passes the city to a UDF. The UDF loads a file containing county and city into a hash map, then looks up the county for the city in that map and returns it.
Please let me know what I am doing wrong here. I get the following error when running the program:
2014-12-28 16:15:16,506 WARN org.apache.hadoop.mapred.Child: Error running child
org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: GetCounty, Out of bounds access [1]
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:370)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:434)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:340)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at GetCounty.exec(GetCounty.java:33)
at GetCounty.exec(GetCounty.java:1)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
... 15 more
2014-12-28 16:15:16,510 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
The input files contain the following data:
File zipcity:
irving 75038
san francisco 94903
san rafael 94905
las vegas 98043
coppel 75063
File citycounty:
irving dallas
las vegas tarrant
san francisco san francisco
coppel dallas
public class GetCounty extends EvalFunc<String>
{
    String lookupfile;
    HashMap<String, String> lookup = null;

    public String exec(Tuple input) throws IOException
    {
        if ( input.size() != 1 )
            return null;
        if ( lookup == null )
        {
            FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());
            DataInputStream in = fs.open(new Path(lookupfile));
            String line;
            while ( (line = in.readLine()) != null)
            {
                String[] tok = new String[2];
                tok = line.split(":", 2);
                lookup.put(tok[0], tok[1]);
            }
        }
        String city = (String) input.get(0);
        return lookup.get(city);
    }

    public GetCounty(String f)
    {
        lookupfile = f;
    }
}
I invoke Pig as follows:
grunt> register 'PigMyUDF.jar';
grunt> define GetCounty GetCounty('pig/citycounty');
grunt> a = load 'pig/zipcity' as ( city:chararray, zip:int );
grunt> b = foreach a generate city, zip, GetCounty(city);
grunt> dump b;
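Comments:
This problem can be solved easily with plain Pig; I'm just curious why you are using a UDF. Also, why does the UDF code split the string on ":" as the delimiter? I don't see ":" in the input.
Thanks, Sivasakhthi! It was a copy-paste habit, but I couldn't spot it for hours.
I posted a solution using native Pig instead of a UDF; please let me know whether it works for you.
Thanks, Sivasakhthi! I was able to solve the problem using your point about the UDF: I changed the delimiter and it worked. Due to HashMap [...]

Can you try this? The input fields are separated by tabs.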
zipcity
irving 75038
san francisco 94903
san rafael 94905
las vegas 98043
coppel 75063
citycounty
irving dallas
las vegas tarrant
san francisco san francisco
coppel dallas
PigScript:
A = LOAD 'zipcity' AS (city:chararray, zip:int);
B = LOAD 'citycounty' AS (city:chararray,country:chararray);
C = JOIN A BY city,B BY city;
D = FOREACH C GENERATE A::city AS city,A::zip AS zip,B::country AS country;
DUMP D;
Output:
(coppel,75063,dallas)
(irving,75038,dallas)
(las vegas,98043,tarrant)
(san francisco,94903,san francisco)
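The delimiter mismatch raised in the comments can also be reproduced outside Pig with plain Java. The following is a minimal sketch, not the asker's final code: the class `CountyLookupSketch` and its methods `splitOnColon` and `loadLookup` are hypothetical names, it reads from an in-memory string rather than HDFS, and it assumes the lookup file is tab-separated as stated above. It also creates the map before the first `put`, which the posted UDF does not show.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration; not part of the original UDF.
class CountyLookupSketch {

    // Mirrors the question's parsing: splitting a tab-separated line on ":".
    // The separator never matches, so the array has one element and
    // reading index 1 throws ArrayIndexOutOfBoundsException.
    static String[] splitOnColon(String line) {
        return line.split(":", 2);
    }

    // Builds the city -> county map from tab-separated lines.
    // Lines without a tab are skipped instead of raising an exception.
    static Map<String, String> loadLookup(BufferedReader in) throws IOException {
        Map<String, String> lookup = new HashMap<>(); // created before the first put
        String line;
        while ((line = in.readLine()) != null) {
            String[] tok = line.split("\t", 2);
            if (tok.length < 2) {
                continue; // malformed line: delimiter not found
            }
            lookup.put(tok[0], tok[1]);
        }
        return lookup;
    }

    public static void main(String[] args) throws IOException {
        String data = "irving\tdallas\nlas vegas\ttarrant\n"
                + "san francisco\tsan francisco\ncoppel\tdallas\n";
        System.out.println(splitOnColon("irving\tdallas").length); // prints 1
        Map<String, String> lookup =
                loadLookup(new BufferedReader(new StringReader(data)));
        System.out.println(lookup.get("las vegas"));  // prints tarrant
        System.out.println(lookup.get("san rafael")); // prints null
    }
}
```

Note the difference from the JOIN output above: the inner join drops san rafael because it has no row in citycounty, whereas a map lookup returns null for it.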