Pig Elephant-Bird 找到接口 org.apache.hadoop.mapreduce.JobContext,但是应该有类

Posted

技术标签:

【中文标题】Pig Elephant-Bird 找到接口 org.apache.hadoop.mapreduce.JobContext,但是应该有类【英文标题】:Pig Elephant-Bird Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected 【发布时间】:2013-08-13 21:02:31 【问题描述】:

我正在使用 CDH4 运行 Hadoop 2.0,并使用 Oracle Java 1.6 r31 构建了大象鸟库

我的猪脚本:

register elephant-bird-2.2.3.jar

log = load 'loggy.log.lzo' using com.twitter.elephantbird.pig.store.LzoPigStorage(' ');

limited = limit log 100;

dump limited;

结果:

Pig Stack Trace
---------------
ERROR 2117: Unexpected error when launching map reduce job.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias limited
    at org.apache.pig.PigServer.openIterator(PigServer.java:838)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:475)
    at org.apache.pig.Main.main(Main.java:157)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias limited
    at org.apache.pig.PigServer.storeEx(PigServer.java:937)
    at org.apache.pig.PigServer.store(PigServer.java:900)
    at org.apache.pig.PigServer.openIterator(PigServer.java:813)
    ... 12 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:352)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1266)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1251)
    at org.apache.pig.PigServer.storeEx(PigServer.java:933)
    ... 14 more
Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at com.twitter.elephantbird.mapreduce.input.LzoInputFormat.listStatus(LzoInputFormat.java:55)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at com.twitter.elephantbird.mapreduce.input.LzoInputFormat.getSplits(LzoInputFormat.java:111)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:319)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:239)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:270)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:160)
    at java.lang.Thread.run(Thread.java:662)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:257)

    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:676)
    at java.lang.Thread.dispatchUncaughtException(Thread.java:1874)
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at com.twitter.elephantbird.mapreduce.input.LzoInputFormat.listStatus(LzoInputFormat.java:55)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at com.twitter.elephantbird.mapreduce.input.LzoInputFormat.getSplits(LzoInputFormat.java:111)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:319)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.startReadyJobs(JobControl.java:239)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:270)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:160)
    at java.lang.Thread.run(Thread.java:662)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:257)
================================================================================

【问题讨论】:

对于在寻找ERROR 1066: Unable to open iterator for alias 时发现此帖子的人,这里是generic solution。 【参考方案1】:

这是由于大象鸟库预期的 Hadoop 版本与已安装的 Hadoop 版本不兼容(this bug report 描述了类似的问题)。最新版本的大象鸟包含解决问题的Hadoop API wrappers。

试试latest version of the elephant-bird library - 你需要三个罐子 - 并在 pig 中注册它们:

register 'elephant-bird-core-4.1.jar';
register 'elephant-bird-pig-4.1.jar';
register 'elephant-bird-hadoop-compat-4.1.jar';

这解决了我的问题。

【讨论】:

以上是关于Pig Elephant-Bird 找到接口 org.apache.hadoop.mapreduce.JobContext,但是应该有类的主要内容,如果未能解决你的问题,请参考以下文章

如何解码来自列的 Pig 中的 JSON?

使用 Apache PIG 读取多行 JSON

Elephant Bird Pig TypeRef ClassNotFoundException

PIG - 找到接口 org.apache.hadoop.mapreduce.JobContext,但预期类

象鸟构建失败

Apache Pig - 具有多个匹配条件的 MATCHES