hadoop集群日常维护中遇到的一些问题汇总
Posted 魏大宾
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了hadoop集群日常维护中遇到的一些问题汇总相关的知识,希望对你有一定的参考价值。
Connection reset by peer
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstreamUnprotected(BlockReceiver.java:1396)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.sendAckUpstream(BlockReceiver.java:1335)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1256)
at java.lang.Thread.run(Thread.java:745)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
datanode重置链接 The client is stuck in an RPC to NameNode. Currently RPCs can be wait for a long time if the server is busy.
可以通过修改下面几个参数来优化
dfs.namenode.handler.count(加大) NN的服务线程数。用于处理RPC请求
dfs.namenode.replication.interval(减小) NN周期性计算DN的副本情况的频率,秒
dfs.client.failover.connection.retries(建议加大) 专家设置。IPC客户端失败重试次数。在网络不稳定时建议加大此值
dfs.client.failover.connection.retries.on.timeouts(网络不稳定建议加大)专家设置。IPC客户端失败重试次数,此失败仅指超时失败。在网络不稳定时建议加大此值
参考资料:https://issues.apache.org/jira/browse/HADOOP-3657
以上是关于hadoop集群日常维护中遇到的一些问题汇总的主要内容,如果未能解决你的问题,请参考以下文章