Connection refused on a MapReduce job in a Hadoop cluster environment

Posted 2018-05-22 21:33:39

I have set up a 4-node Hadoop cluster with one master node and three data nodes. Everything seemed to be running fine until I tried to execute a MapReduce job.

jps (master node):

[root@master logs]# jps
26967 SecondaryNameNode
25720 JobHistoryServer
26778 NameNode
27115 ResourceManager
27839 Jps

jps (data node):

[root@localhost ~]# jps
21872 DataNode
22257 Jps
21974 NodeManager

The YARN log file on the master node shows the following exception:

2018-05-22 21:59:10,376 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1527018750538_0001 failed 2 times due to Error launching appattempt_1527018750538_0001_000002. Got exception: java.net.ConnectException: Call From NameNode/193.198.139.50 to localhost:41227 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor47.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    at org.apache.hadoop.ipc.Client.call(Client.java:1413)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy83.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
    at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy84.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:250)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:615)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
    at org.apache.hadoop.ipc.Client.call(Client.java:1452)
    ... 15 more
. Failing the application.

As far as I can tell, the problem lies with localhost:41227: I never specified anything like that in any configuration file, and the port number is different every time I try to run a new job, but obviously I'm not sure. Any suggestions or help would be appreciated. Thanks.

core-site.xml

<configuration>
<!-- core-site.xml -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://NameNode:9000/</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>

mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>NameNode:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>NameNode:19888</value>
</property>
</configuration>

hdfs-site.xml

<configuration>
<!-- hdfs-site.xml -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_work/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_work/hdfs/datanode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:/usr/local/hadoop_work/hdfs/namesecondary</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>
</configuration>

yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>NameNode</value>
</property>
<property>
<name>yarn.resourcemanager.bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.nodemanager.bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:/usr/local/hadoop_work/yarn/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>file:/usr/local/hadoop_work/yarn/log</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://NameNode:9000/var/log/hadoop-yarn/apps</value>
</property>
</configuration>

Answer 1:

This is a hostname problem with the DataNodes. Give the DataNodes a meaningful hostname other than localhost and restart the processes.

Call From NameNode/193.198.139.50 to localhost:41227

means it is trying to reach a random port on the DataNode (localhost) from the NameNode. Each node is listening on its own loopback address (127.0.0.1/localhost). It should be reaching the DataNode, but with your configuration it ends up trying to connect to its own machine.
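
To confirm this, you can check from the master how the NodeManagers registered themselves (a quick diagnostic sketch, not specific to your setup):

# List the NodeManagers known to the ResourceManager; if every Node-Id
# shows up as localhost:<port>, the workers registered under their loopback
# hostname and the ResourceManager cannot reach them to launch containers.
yarn node -list -all

# On each data node, check what the machine reports as its own hostname:
hostname -f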

Could you also post your slaves file?
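
For reference, a working setup usually looks something like this (the worker hostnames and data-node IPs below are made up for illustration; only the NameNode IP is taken from your log):

# /etc/hosts on every node
193.198.139.50  NameNode
193.198.139.51  datanode1
193.198.139.52  datanode2
193.198.139.53  datanode3

# $HADOOP_HOME/etc/hadoop/slaves on the master: one worker hostname per line
datanode1
datanode2
datanode3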

Comments:

I just managed to solve this by changing the slaves' hostnames to the same values stated in their Hadoop configuration. This is the correct answer. Thank you very much.
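
In case it helps anyone else, the change boiled down to something like the following on each data node (a sketch, assuming a systemd-based distro; datanode1 is just an example name, and the start/stop scripts are run on the master):

# Give the machine its real hostname instead of localhost
hostnamectl set-hostname datanode1

# After updating /etc/hosts on all nodes, restart the daemons so they
# re-register under the new hostname
stop-yarn.sh && stop-dfs.sh
start-dfs.sh && start-yarn.sh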
