Apache Pig on Windows set-up error

Posted: 2015-07-15 11:11:51

Question description:

I am trying to install and run Apache Pig 0.15.0 on a Windows system, without success. I intend to use it with my Apache Hadoop 2.7.1 installation.

Context

I followed the basic Getting Started tutorial, "Download Pig" section. I downloaded "pig-0.15.0" and added Pig to my path.
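
For reference, a minimal sketch of the kind of environment set-up involved (Windows cmd). The Hadoop and Pig paths match the ones that appear in the logs below; the JAVA_HOME value is only a placeholder:

:: set before running bin\pig.cmd
:: (JAVA_HOME below is a placeholder -- point it at your own JDK install)
set JAVA_HOME=C:\Java\jdk1.7.0
set HADOOP_HOME=C:\hadoop-2.7.1
set PIG_HOME=C:\pig-0.15.0
set PATH=%PATH%;%HADOOP_HOME%\bin;%PIG_HOME%\bin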

I can get into the "grunt" shell, but when I try to run a simple script such as:

logs = LOAD 'PigInput/logs' USING PigStorage(';');  -- load semicolon-delimited records
STORE logs INTO 'logs-output.txt';                  -- write them back out

it gives me the following error:

Error

2015-07-15 12:54:27,157 [main] WARN  org.apache.pig.backend.hadoop20.PigJobControl - falling back to default JobControl (not using hadoop 0.20 ?)
java.lang.NoSuchFieldException: runnerState
        at java.lang.Class.getDeclaredField(Class.java:1953)
        at org.apache.pig.backend.hadoop20.PigJobControl.<clinit>(PigJobControl.java:51)
        at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.newJobControl(HadoopShims.java:109)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:314)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:196)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:304)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
        at org.apache.pig.PigServer.execute(PigServer.java:1364)
        at org.apache.pig.PigServer.access$500(PigServer.java:113)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1689)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1082)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:505)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
        at org.apache.pig.Main.run(Main.java:565)
        at org.apache.pig.Main.main(Main.java:177)
2015-07-15 12:54:27,165 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2015-07-15 12:54:27,186 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2015-07-15 12:54:27,187 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-07-15 12:54:27,188 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-07-15 12:54:27,190 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2015-07-15 12:54:27,585 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/pig-0.15.0/pig-0.15.0-core-h1.jar to DistributedCache through /tmp/temp27293389/tmp1227477167/pig-0.15.0-core-h1.jar
2015-07-15 12:54:27,627 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/pig-0.15.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp27293389/tmp-1342585295/automaton-1.11-8.jar
2015-07-15 12:54:27,664 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/pig-0.15.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp27293389/tmp-510663803/antlr-runtime-3.4.jar
2015-07-15 12:54:27,769 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/hadoop-2.7.1/share/hadoop/common/lib/guava-11.0.2.jar to DistributedCache through /tmp/temp27293389/tmp-1466437686/guava-11.0.2.jar
2015-07-15 12:54:27,817 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/C:/pig-0.15.0/lib/joda-time-2.5.jar to DistributedCache through /tmp/temp27293389/tmp672491704/joda-time-2.5.jar
2015-07-15 12:54:27,905 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-07-15 12:54:27,959 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-07-15 12:54:27,969 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2015-07-15 12:54:27,979 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-07-15 12:54:27,989 [JobControl] ERROR org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl - Error while trying to run jobs.
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:235)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:183)
        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
        at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
        at java.lang.Thread.run(Thread.java:745)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
2015-07-15 12:54:28,005 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-07-15 12:54:28,014 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-07-15 12:54:28,016 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
2015-07-15 12:54:28,017 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-07-15 12:54:28,025 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2015-07-15 12:54:28,027 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
2.7.1   0.15.0  Administrator   2015-07-15 12:54:27     2015-07-15 12:54:28     UNKNOWN

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
N/A     logs    MAP_ONLY        Message: Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:235)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:183)
        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
        at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
        at org.apache.pig.backend.hadoop20.PigJobControl.run(PigJobControl.java:121)
        at java.lang.Thread.run(Thread.java:745)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
        hdfs://localhost:9000/user/Administrator/logs-output.txt,

My attempts

1. I tried downloading "pig-0.15.0-src" and building it:

ant -Dhadoopversion=23

I got the following error (along the way I also added proxy settings to my "build.xml"):

C:\pig-0.15.0-src>ant -Dhadoopversion=23
Buildfile: C:\pig-0.15.0-src\build.xml

ivy-download:
      [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
      [get] To: C:\pig-0.15.0-src\ivy\ivy-2.2.0.jar
      [get] Not modified - so not downloaded

ivy-init-dirs:

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:
[ivy:configure] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
[ivy:configure] :: loading settings :: file = C:\pig-0.15.0-src\ivy\ivysettings.xml

ivy-resolve:
[ivy:resolve]
[ivy:resolve] :: problems summary ::
[ivy:resolve] :::: WARNINGS
[ivy:resolve]           ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]           ::          UNRESOLVED DEPENDENCIES         ::
[ivy:resolve]           ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]           :: org.antlr#antlr;3.4: configuration not found in org.antlr#antlr;3.4: 'master'. It was required from org.apache.pig#pig;0.15.0-SNAPSHOT compile
[ivy:resolve]           ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]
[ivy:resolve]
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS

BUILD FAILED
C:\pig-0.15.0-src\build.xml:1662: impossible to resolve dependencies:
        resolve failed - see output for details
2. I have downloaded this jar from Maven: org.apache.pig pig 0.15.0 h2.jar. I don't know whether it helps, and I also don't know where to put it.

More details

At first, when I ran "pig", it told me it could not find the path to "hadoop-config.cmd". To make it work, I changed the following line in "pig/bin/pig.cmd":

set hadoop-config-script=C:\hadoop-2.7.1\libexec\hadoop-config.cmd

Others, similar problems

I have seen similar questions: here and others. Most of them suggest running:

ant clean jar-withouthadoop -Dhadoopversion=23

...which at best only leads me to other errors.

Help

I need help getting my Apache Pig to run commands and MapReduce jobs. What should I do?

Update 1

As @Fred recommended in the comments, I have tried Pig 0.12.0 and successfully got it to run jobs without any fuss (I only set the path, no building etc.), apart from having to find this version on the Internet: cloudera pig 0.12.0.

Still, I would like to find a solution that uses the latest version of Apache Pig.

[Question comments]:

While Pig may work on Windows in some fashion, I would recommend using a VM. Is that possible, or do you need to stay on Windows?

The requirement is Windows. Everything should run on Windows :)

In that case I would try Pig 0.12, which is explicitly stated to run on Windows: issues.apache.org/jira/browse/PIG-2793 and hortonworks.com/blog/announcing-apache-pig-0-12

I strongly recommend, as @Fred said in his comments, doing this in a VM with a Linux environment. It will save you a lot of trouble (Windows is a disaster), and if you are trying to learn the Hadoop ecosystem it lets you see how it is used in the "real world".

[Answer 1]:

There are two Pig jars from the same source. If you are using Hadoop > 2.3, include

<dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <classifier>h2</classifier>
    <version>0.15.0</version>
</dependency>

instead of

<dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <version>0.15.0</version>
</dependency>

as the dependency and try again.

The 0.15.0 h2 jar is built with ant -Dhadoopversion=23 jar.
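
If you run Pig from the binary download rather than as a Maven dependency, a rough equivalent is to build the Hadoop 2 core jar yourself and make it the one bin\pig.cmd picks up instead of the bundled h1 jar seen in the question's log. This is only a sketch; the exact name and location of the jar the build produces should be checked against the build output:

:: build the Hadoop-2 flavour of the Pig core jar from the source release
cd /d C:\pig-0.15.0-src
ant clean jar -Dhadoopversion=23

:: swap it into the binary install so pig.cmd no longer picks up the h1 core jar
:: (the built jar may carry a -SNAPSHOT suffix -- adjust the name accordingly)
del C:\pig-0.15.0\pig-0.15.0-core-h1.jar
copy pig-0.15.0-core-h2.jar C:\pig-0.15.0\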

[Comments]:
