ZooKeeper server shuts down after some time

Posted: 2018-06-12 21:15:25

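We have an HDP cluster, version 2.6.4, with 3 ZooKeeper servers, version 3.4.x.

The first ZooKeeper server does not work properly and goes down after some time.

In the Ambari GUI we can see that ZooKeeper is disconnected.

In the ZooKeeper log we can see: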

java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,856 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
2018-06-12 18:35:01,857 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
        at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
        at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
        at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1082)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:391)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)

When we query ZooKeeper with the stat command, we get:

echo stat | nc 14.42.169 2181

Latency min/avg/max: 0/10/2727
Received: 600879
Sent: 103803
Connections: 30
Outstanding: 546
Zxid: 0x3e000048c3
Mode: follower
Node count: 43296

Note that Sent is far lower than Received!
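
To compare these counters across repeated checks, the stat output can be reduced to one line. This is only a minimal sketch: `zk_stat_summary` is a hypothetical helper name, not part of ZooKeeper, and it assumes the `Received:`, `Sent:` and `Outstanding:` lines appear in the format shown above.

```shell
# Hypothetical helper: read `stat` output on stdin and print the
# Received, Sent and Outstanding counters on a single line.
zk_stat_summary() {
  awk -F': ' '
    $1 == "Received"    { received = $2 }
    $1 == "Sent"        { sent = $2 }
    $1 == "Outstanding" { outstanding = $2 }
    END { printf "received=%d sent=%d outstanding=%d\n", received, sent, outstanding }
  '
}
```

It would be used by piping the same command into it, for example `echo stat | nc <zookeeper-host> 2181 | zk_stat_summary`, where `<zookeeper-host>` is a placeholder for the server address.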

We can also see many CLOSE-WAIT connections:

#  ss -anop | grep 2181 | grep CLOSE | awk '{print $1" "$2}' | more
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
tcp CLOSE-WAIT
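
To see which clients are leaving these sockets behind, the CLOSE-WAIT entries can be grouped by peer address. This is a sketch under assumptions: `close_wait_by_peer` is a hypothetical helper name, and it expects the column layout of `ss -ant` (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port), which differs from the `ss -anop` output above because no Netid column is printed.

```shell
# Hypothetical helper: read `ss -ant` output on stdin and count CLOSE-WAIT
# sockets on the given local port (default 2181), grouped by peer address.
close_wait_by_peer() {
  awk -v port="${1:-2181}" '
    $1 == "CLOSE-WAIT" && $4 ~ (":" port "$") {
      peer = $5
      sub(/:[0-9]+$/, "", peer)   # strip the peer port, keep the address
      count[peer]++
    }
    END { for (p in count) print count[p], p }
  ' | sort -rn
}
```

It would be run as `ss -ant | close_wait_by_peer 2181`; the peers with the most CLOSE-WAIT sockets are listed first.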

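To try to resolve this, we did the following, without success:

    Increased the Java heap size to 8G (on ZooKeeper only)

    Increased zookeeper.session.timeout.ms on Kafka

None of these changes helped.

Please advise what could be causing this problem.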

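Comments:

What memory is provided to the cluster?

Each machine has 32G, but that is not the problem; I checked with free -g.

@KingDavid - Is your ZooKeeper running on virtual or physical servers? If virtual, is the storage shared or direct-attached?

Answer 1: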

This looks like a bug that has already been fixed: https://issues.apache.org/jira/browse/ZOOKEEPER-2044

Try upgrading your ZooKeeper.
