Zookeeper 服务器在一段时间后关闭
Posted
技术标签:
【中文标题】Zookeeper 服务器在一段时间后关闭【英文标题】:zookeeper server shutdown after some time 【发布时间】:2018-06-12 21:15:25 【问题描述】:我们有 HDP 集群版本 2.6.4 和 3 个 Zookeeper 服务器版本 3.4.x
第一个 zookeeper 服务器无法正常工作并在一段时间后弯腰
从 ambari GUI 我们可以看到动物园已断开连接
从zookeeper日志中我们可以看到:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NioserverCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,856 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
当我们对 zookeeper 进行测试时,我们得到了:
echo stat | nc 14.42.169 2181
Latency min/avg/max: 0/10/2727
Received: 600879
Sent: 103803
Connections: 30
Outstanding: 546
Zxid: 0x3e000048c3
Mode: follower
Node count: 43296
请注意,send 比我们从 Received 中得到的要少得多!
我们可以看到很多 CLOSE-WAIT 连接
# ss -anop | grep 2181 | grep CLOSE | awk 'print $1" "$2' | more
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
为了尝试解决此问题,我们执行了以下操作但没有成功
将 Java 堆大小增加到 8G(仅在 zookeeper 上)
在 kafka 上增加 zookeeper.session.timeout.ms
但所有这些都对我们没有帮助
请告知造成此问题的原因可能是什么,
【问题讨论】:
提供给集群的内存是什么? 每台机器有32G,但这不是我通过free -g检查的问题 @KingDavid - 您的 Zookeeper 是在虚拟服务器还是物理服务器上运行?如果是虚拟的,那么存储是共享存储还是直接附加存储? 【参考方案1】:它看起来像一个已修复的错误https://issues.apache.org/jira/browse/ZOOKEEPER-2044
尝试更新您的动物园管理员
【讨论】:
以上是关于Zookeeper 服务器在一段时间后关闭的主要内容,如果未能解决你的问题,请参考以下文章
从 R Studio 运行时,R Shiny 应用程序会在一段时间后自行关闭,但它仍在收听......这是正常的吗?
干货分享微服务spring-cloud(8.服务治理和配置中心Spring-cloud-zooke)