Hadoop streaming with a shell script: reducer fails with error "No such file or directory"

I am using a 10-node HDP cluster and am trying to run a simple WordCount job with shell scripts from Bash. Below is the command line I am using.

    yarn jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming-2.7.3.2.6.5.0-292.jar \
    -mapper 'wc -l' \
    -reducer './reducer_wordcount.sh' \
    -file /home/pathirippilly/map_reduce_jobs/shell_scripts/reducer_wordcount.sh \
    -numReduceTasks 1 \
    -input /user/pathirippilly/cards/smalldeck.txt \
    -output /user/pathirippilly/mapreduce_jobs/output_shell

  1. reducer_wordcount.sh is the reducer shell script; it is in my local directory /home/pathirippilly/map_reduce_jobs/shell_scripts
  2. smalldeck.txt is the input file, in the HDFS directory /user/pathirippilly/cards
  3. /user/pathirippilly/mapreduce_jobs/output_shell is the output directory
  4. The Hadoop version I am using is Hadoop 2.7.3.2.6.5.0-292
  5. I am running the MapReduce job above in YARN mode

reducer_wordcount.sh contains:

    #! /user/bin/env bash
    awk '{line_count += $1} END  { print line_count }'

When I run it on my cluster, I get the following error for reducer_wordcount.sh:

    Error: java.lang.RuntimeException: Error in configuring object
            at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
            at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
            at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
            at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:410)
            at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
            at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
            at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
    Caused by: java.lang.reflect.InvocationTargetException
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
            ... 9 more
    Caused by: java.lang.RuntimeException: configuration exception
            at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
            at org.apache.hadoop.streaming.PipeReducer.configure(PipeReducer.java:67)
            ... 14 more
    Caused by: java.io.IOException: Cannot run program "/hdp01/hadoop/yarn/local/usercache/pathirippilly/appcache/application_1533622723243_17238/container_e38_1533622723243_17238_01_000004/./reducer_wordcount.sh": error=2, No such file or directory
            at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
            at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
            ... 15 more
    Caused by: java.io.IOException: error=2, No such file or directory
            at java.lang.UNIXProcess.forkAndExec(Native Method)
            at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
            at java.lang.ProcessImpl.start(ProcessImpl.java:134)
            at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)

If I pass the same reducer logic inline on the command line, as below, it works:

    yarn jar /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming.jar \
    -mapper 'wc -l' \
    -reducer "awk '{line_count += $1} END  { print line_count }'" \
    -numReduceTasks 1 \
    -input /user/pathirippilly/cards/smalldeck.txt \
    -output /user/pathirippilly/mapreduce_jobs/output_shell

I would appreciate any help here; I am quite new to Hadoop streaming. The full error stack is below:

    18/09/09 10:10:02 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
    packageJobJar: [reducer_wordcount.sh] [/usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming-2.7.3.2.6.5.0-292.jar] /var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir/streamjob8506373101127930734.jar tmpDir=null
    18/09/09 10:10:03 INFO client.RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8050
    18/09/09 10:10:03 INFO client.AHSProxy: Connecting to Application History server at rm01.itversity.com/172.16.1.106:10200
    18/09/09 10:10:03 INFO client.RMProxy: Connecting to ResourceManager at rm01.itversity.com/172.16.1.106:8050
    18/09/09 10:10:03 INFO client.AHSProxy: Connecting to Application History server at rm01.itversity.com/172.16.1.106:10200
    18/09/09 10:10:05 INFO mapred.FileInputFormat: Total input paths to process : 1
    18/09/09 10:10:06 INFO mapreduce.JobSubmitter: number of splits:2
    18/09/09 10:10:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533622723243_17238
    18/09/09 10:10:08 INFO impl.YarnClientImpl: Submitted application application_1533622723243_17238
    18/09/09 10:10:08 INFO mapreduce.Job: The url to track the job: http://rm01.itversity.com:19288/proxy/application_1533622723243_17238/
    18/09/09 10:10:08 INFO mapreduce.Job: Running job: job_1533622723243_17238
    18/09/09 10:10:14 INFO mapreduce.Job: Job job_1533622723243_17238 running in uber mode : false
    18/09/09 10:10:14 INFO mapreduce.Job:  map 0% reduce 0%
    18/09/09 10:10:19 INFO mapreduce.Job:  map 100% reduce 0%
    18/09/09 10:10:23 INFO mapreduce.Job: Task Id : attempt_1533622723243_17238_r_000000_0, Status : FAILED
    Error: java.lang.RuntimeException: Error in configuring object
            at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
            at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
            at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
            at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:410)
            at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
            at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
            at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
    Caused by: java.lang.reflect.InvocationTargetException
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:498)
            at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
            ... 9 more
    Caused by: java.lang.RuntimeException: configuration exception
            at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
            at org.apache.hadoop.streaming.PipeReducer.configure(PipeReducer.java:67)
            ... 14 more
    Caused by: java.io.IOException: Cannot run program "/hdp01/hadoop/yarn/local/usercache/pathirippilly/appcache/application_1533622723243_17238/container_e38_1533622723243_17238_01_000004/./reducer_wordcount.sh": error=2, No such file or directory
            at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
            at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
            ... 15 more
    Caused by: java.io.IOException: error=2, No such file or directory
            at java.lang.UNIXProcess.forkAndExec(Native Method)
            at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
            at java.lang.ProcessImpl.start(ProcessImpl.java:134)
            at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
            ... 16 more

Answer

See "Making files available for tasks" and "Packaging files for job submission" in the Hadoop Streaming documentation.

Basically, you only need the script's file name, not a path:

    -reducer 'reducer_wordcount.sh' -file /local/path/to/reducer_wordcount.sh

Make sure the file is executable:

    chmod +x /local/path/to/reducer_wordcount.sh
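
The job log in the question warns that `-file` is deprecated in favour of the generic option `-files`. As a rough sketch of what the submission could look like with `-files`, the snippet below only assembles and prints the arguments (a dry run), reusing the paths from the question. The variable names are made up for illustration, and nothing is submitted to the cluster.

```shell
#!/usr/bin/env bash
# Dry run: assemble the streaming command and print it for review.
# Paths are the ones given in the question; variable names are illustrative.
streaming_jar=/usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-streaming-2.7.3.2.6.5.0-292.jar
reducer_path=/home/pathirippilly/map_reduce_jobs/shell_scripts/reducer_wordcount.sh
reducer_name=$(basename "$reducer_path")

set -- yarn jar "$streaming_jar"
# Generic options such as -files must come before the streaming options.
set -- "$@" -files "$reducer_path"
set -- "$@" -mapper 'wc -l'
# Only the file name: the shipped script appears in the task's working directory.
set -- "$@" -reducer "$reducer_name"
set -- "$@" -numReduceTasks 1
set -- "$@" -input /user/pathirippilly/cards/smalldeck.txt
set -- "$@" -output /user/pathirippilly/mapreduce_jobs/output_shell

arg_count=$#
# One argument per line, so quoting problems are easy to spot.
printf '%s\n' "$@"
```

Once the printed arguments look right, the same list can be run by replacing the final `printf` with `"$@"`.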

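You can optionally rename the file using a `#` fragment, as shown in the links, but your local script name is the same as the reducer file name, so that is not necessary.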

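You also need to fix the shebang to `#!/usr/bin/env bash`. The script currently points at `/user/bin/env`, which does not exist; when the interpreter named in a shebang is missing, the attempt to execute the script fails with "No such file or directory" even though the script itself is present.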
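
The effect of the shebang can be checked locally, without the cluster. The sketch below writes two copies of the reducer to a temporary directory, one with the shebang from the question and one with the corrected shebang, and feeds both the same two counts. It assumes `/user/bin/env` does not exist on the machine and that the temporary directory allows execution; the file and variable names are illustrative.

```shell
#!/usr/bin/env bash
# Compare the reducer with the question's shebang against the corrected one.
workdir=$(mktemp -d)

# Shebang as in the question: /user/bin/env (assumed not to exist).
cat > "$workdir/bad_reducer.sh" <<'EOF'
#! /user/bin/env bash
awk '{line_count += $1} END  { print line_count }'
EOF

# Corrected shebang: /usr/bin/env.
cat > "$workdir/good_reducer.sh" <<'EOF'
#!/usr/bin/env bash
awk '{line_count += $1} END  { print line_count }'
EOF

chmod +x "$workdir/bad_reducer.sh" "$workdir/good_reducer.sh"

# The bad script cannot be started, so record its non-zero exit status.
bad_status=0
printf '3\n4\n' | "$workdir/bad_reducer.sh" 2>/dev/null || bad_status=$?

# The good script sums the two counts it reads on stdin.
good_output=$(printf '3\n4\n' | "$workdir/good_reducer.sh")

echo "bad shebang exit status: $bad_status"
echo "good shebang output: $good_output"
rm -rf "$workdir"
```

Dropping the `2>/dev/null` shows the shell's own message for the failing script, which names the missing interpreter.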

(By the way, your mapper and reducer are doing the same thing: counting lines, not necessarily "words".)
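
To see what the job actually computes, here is a local approximation of it, assuming two input splits as reported in the job log. Each split goes through the `wc -l` mapper, and the per-split counts are summed by the same `awk` program the reducer uses. The sample data and file names are invented, and real streaming also sorts the mapper output and splits it into key and value, which this sketch ignores.

```shell
#!/usr/bin/env bash
# Local approximation: one `wc -l` mapper per split, one awk reducer.
workdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$workdir/split1.txt"   # 3 lines
printf 'd\ne\n'    > "$workdir/split2.txt"   # 2 lines

# Mapper stage: `wc -l` on stdin prints only the line count of its split.
# Reducer stage: awk adds up the per-split counts.
total=$(
  for split in "$workdir"/split*.txt; do
    wc -l < "$split"
  done | awk '{line_count += $1} END  { print line_count }'
)

echo "total lines: $total"
rm -rf "$workdir"
```

The result is the total number of lines in the input, which is why the job is closer to a line count than a word count.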
