A Summary of Problems Encountered in Day-to-Day Hadoop Cluster Maintenance

Posted by 魏大宾

 Connection reset by peer

java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        at java.io.DataOutputStream.flush(DataOutputStream.java:123)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1396)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1335)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1256)
        at java.lang.Thread.run(Thread.java:745)

java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)

The DataNode resets the connection. The client is stuck in an RPC to the NameNode; at present an RPC can wait for a long time if the server is busy.

Tuning the following parameters can help:

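dfs.namenode.handler.count (increase): the number of NameNode server threads that handle RPC requests.

dfs.namenode.replication.interval (decrease): how often, in seconds, the NameNode computes replication work for the DataNodes.

dfs.client.failover.connection.retries (increase recommended): expert setting. The number of times the IPC client retries after a failure. Raise it when the network is unstable.

dfs.client.failover.connection.retries.on.timeouts (increase recommended on an unstable network): expert setting. The number of times the IPC client retries after a failure, counting timeout failures only. Raise it when the network is unstable.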
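To make the tuning above concrete, here is a minimal Python sketch that renders the four properties as an hdfs-site.xml fragment. Every value in it is an illustrative assumption, not a recommendation from the original post: the 200-DataNode cluster size, the retry counts of 3, and the replication interval of 2 are placeholders. The helper names `suggested_handler_count` and `render_properties` are hypothetical, and the `20 * ln(cluster size)` sizing for the handler count is a commonly quoted rule of thumb rather than anything the post states.

```python
import math
from xml.sax.saxutils import escape


def suggested_handler_count(num_datanodes: int) -> int:
    """Rule-of-thumb starting point (an assumption): 20 * ln(cluster size),
    never lower than 10. Expects num_datanodes >= 1."""
    return max(10, int(math.log(num_datanodes) * 20))


def render_properties(props: dict) -> str:
    """Render name/value pairs as an hdfs-site.xml <configuration> block."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(str(value))}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Placeholder values only; measure on your own cluster before changing anything.
overrides = {
    "dfs.namenode.handler.count": suggested_handler_count(200),
    "dfs.namenode.replication.interval": 2,
    "dfs.client.failover.connection.retries": 3,
    "dfs.client.failover.connection.retries.on.timeouts": 3,
}

print(render_properties(overrides))
```

The NameNode properties only take effect after the NameNode picks up the new hdfs-site.xml, while the two `dfs.client.*` properties belong in the configuration that the client side reads.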

Reference: https://issues.apache.org/jira/browse/HADOOP-3657
