Handling a ZooKeeper cluster crash

Posted Federico


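Today I ran into the following problem on a private-deployment (on-premises) project:

1. The customer reported that user logins were returning 303.

2. Logging in to the server showed that a large volume of logs had used up almost all of the disk space. Every service process was still running, but none of them could listen on its port, so the services were unavailable.

3. Cleaned up the log files.

4. Once the log files were cleaned up, I restarted the services. Restarting the ZooKeeper service produced the following error: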

2017-07-12 10:52:39,171 [myid:] - INFO [main:[email protected]] - Reading configuration from: /data/apps/config/zookeeper/zoo.cfg
2017-07-12 10:52:39,176 [myid:] - INFO [main:[email protected]] - Defaulting to majority quorums
2017-07-12 10:52:39,180 [myid:2] - INFO [main:[email protected]] - autopurge.snapRetainCount set to 5
2017-07-12 10:52:39,180 [myid:2] - INFO [main:[email protected]] - autopurge.purgeInterval set to 24
2017-07-12 10:52:39,183 [myid:2] - INFO [PurgeTask:[email protected]] - Purge task started.
2017-07-12 10:52:39,194 [myid:2] - INFO [main:[email protected]] - Starting quorum peer
2017-07-12 10:52:39,196 [myid:2] - INFO [PurgeTask:[email protected]] - Purge task completed.
2017-07-12 10:52:39,206 [myid:2] - INFO [main:[email protected]] - binding to port 0.0.0.0/0.0.0.0:2181
2017-07-12 10:52:39,218 [myid:2] - INFO [main:[email protected]] - tickTime set to 2000
2017-07-12 10:52:39,218 [myid:2] - INFO [main:[email protected]] - minSessionTimeout set to -1
2017-07-12 10:52:39,218 [myid:2] - INFO [main:[email protected]] - maxSessionTimeout set to -1
2017-07-12 10:52:39,218 [myid:2] - INFO [main:[email protected]] - initLimit set to 10
2017-07-12 10:52:39,230 [myid:2] - INFO [main:[email protected]] - Reading snapshot /data/apps/data/zookeeper/version-2/snapshot.60000888d
2017-07-12 10:52:39,341 [myid:2] - ERROR [main:[email protected]] - Last transaction was partial.
2017-07-12 10:52:39,342 [myid:2] - ERROR [main:QuorumPeer[email protected]] - Unable to load database on disk
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:576)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:595)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:561)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:643)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:547)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:522)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:354)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:450)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:440)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:153)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2017-07-12 10:52:39,345 [myid:2] - ERROR [main:[email protected]] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server

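After some research, this is what I found about why ZooKeeper crashed:

ZooKeeper presents a consistent view of state to all client processes that use it. When a client gets a response from ZooKeeper, it can be confident that the response is consistent with every other response that it or any other client receives. Sometimes the ZooKeeper client library loses its connection to the ZooKeeper service and can no longer provide information that carries this consistency guarantee; when the library finds itself in that state, it reports the condition to the caller.

In this incident the more direct cause is visible in the log above. The disk filled up while ZooKeeper was running, and the stack trace shows a java.io.EOFException thrown from FileHeader.deserialize while FileTxnLog was opening a transaction log file. That is consistent with a transaction log left empty or truncated by the full disk, which the server then cannot load at startup ("Unable to load database on disk").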
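To check whether a truncated transaction log is really what the server is failing on, one option is to look for log files that are too short to contain a complete file header. The following is a minimal sketch, not a ZooKeeper tool: the function name find_truncated_txn_logs is made up for illustration, and the 16-byte threshold assumes the transaction log header consists of a 4-byte magic number, a 4-byte version and an 8-byte dbid.

```shell
#!/bin/sh
# Sketch: list ZooKeeper transaction log files that are too short to hold
# a complete file header.
# Assumption: the header is 16 bytes (4-byte magic, 4-byte version,
# 8-byte dbid), so a shorter log.* file cannot be deserialized.
# find_truncated_txn_logs is a hypothetical helper name.
find_truncated_txn_logs() {
    dir=$1
    # -size -16c matches files smaller than 16 bytes, including empty ones.
    find "$dir" -maxdepth 1 -type f -name 'log.*' -size -16c | sort
}

# Usage (directory taken from the log above):
# find_truncated_txn_logs /data/apps/data/zookeeper/version-2
```

If this prints a file, that file is a likely candidate for what the server could not read. An empty result does not rule out damage further inside a log file.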

 

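Solution:

1. Look at the ZooKeeper configuration file and find the data directory (dataDir). Adjust the paths below to your installation; in the log above the configuration file is /data/apps/config/zookeeper/zoo.cfg and the data lives under /data/apps/data/zookeeper.

cat /etc/zookeeper/conf/zoo.cfg

2. Delete or rename the version-2 directory inside the data directory. Renaming is safer, because it keeps a copy to fall back on.

cd /var/lib/zookeeper

mv ./version-2 ./version-2.bak

3. Restart ZooKeeper, then check that the process is running and that its port is being listened on. With version-2 moved away, the node starts with no local data and should resync from the leader, so this approach relies on the other nodes in the cluster still forming a healthy quorum.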
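Steps 1 and 2 can be sketched as a small script. This is an illustration under assumptions rather than a definitive procedure: the helper names zk_data_dir and zk_move_aside are hypothetical, the backup gets a timestamp suffix (a choice made here, not in the original steps), and ZooKeeper on this node is assumed to be stopped before the directory is moved.

```shell
#!/bin/sh
# Hypothetical helpers sketching steps 1 and 2; the names are illustrative.

# Print the dataDir value configured in a zoo.cfg file.
zk_data_dir() {
    # Keep only uncommented dataDir= lines, strip the key and any
    # surrounding whitespace, and use the last occurrence.
    sed -n 's/^[[:space:]]*dataDir[[:space:]]*=[[:space:]]*//p' "$1" \
        | tail -n 1 | sed 's/[[:space:]]*$//'
}

# Rename version-2 to a timestamped backup instead of deleting it.
# Prints the backup path on success.
zk_move_aside() {
    data_dir=$1
    backup="$data_dir/version-2.bak.$(date +%Y%m%d%H%M%S)"
    if [ ! -d "$data_dir/version-2" ]; then
        echo "no version-2 directory in $data_dir" >&2
        return 1
    fi
    mv "$data_dir/version-2" "$backup" && echo "$backup"
}

# Usage (stop ZooKeeper on this node first):
# zk_move_aside "$(zk_data_dir /etc/zookeeper/conf/zoo.cfg)"
```

Only version-2 is moved, so the myid file that sits directly in the data directory stays in place. After the restart, a command such as `ss -lnt | grep 2181` shows whether the client port from the log above is being listened on again.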
