Cassandra Cluster Management: Abnormal Node Restart

Posted 2022-04-06


Log in to one of the cluster nodes and reboot the server (172.20.101.166) directly. Cassandra is configured to start at boot on this node.

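Note:

This document is one part of a series. The documents in the series are:
Test preparation and decommissioning a healthy node: https://blog.51cto.com/michaelkang/2419518
Abnormal node restart: https://blog.51cto.com/michaelkang/2419524
Adding a new node: https://blog.51cto.com/michaelkang/2419521
Removing a failed node: https://blog.51cto.com/michaelkang/2419525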

Scenario:

How the cluster reacts when a node is restarted abnormally.

cassandra.log shows almost no output

tailf /var/log/cassandra/cassandra.log

system.log

The log clearly reports 172.20.101.166 as DOWN.

On node 172.20.101.165:

# tailf /var/log/cassandra/system.log
INFO  [GossipStage:1] 2019-07-11 18:19:23,372 Gossiper.java:1026 - InetAddress /172.20.101.166 is now DOWN
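
When a log covers several restarts, it can help to pull out only the gossip state changes. The following is a minimal sketch, assuming the log lines follow the layout shown above (level, thread, date, time, source, message). The function name gossip_state_changes is hypothetical and not part of Cassandra.

```shell
# Hypothetical helper: read system.log lines on stdin and print
# "<date> <time> <address> <state>" for each gossip state change.
# Assumes the line layout shown above:
#   LEVEL [Thread] DATE TIME Source - InetAddress /ADDR is now STATE
gossip_state_changes() {
  awk '
    /InetAddress .* is now (DOWN|UP)/ {
      for (i = 1; i <= NF; i++) {
        # The address is the field right after "InetAddress";
        # date and time sit four and three fields before it.
        if ($i == "InetAddress" && i > 4) {
          addr = $(i + 1)
          sub(/^\//, "", addr)   # drop the leading slash
          print $(i - 4), $(i - 3), addr, $NF
        }
      }
    }
  '
}
```

It could be used as `gossip_state_changes < /var/log/cassandra/system.log` on any surviving node.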

Check for the unreachable node

# nodetool describecluster
Cluster Information:
        Name: pttest
        Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                cfce5a85-19c8-327a-ab19-e1faae2358f7: [172.20.101.164, 172.20.101.165, 172.20.101.167, 172.20.101.160, 172.20.101.157]

                UNREACHABLE: [172.20.101.166]
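
For scripted checks, the UNREACHABLE entry can be extracted from this output. This is only a sketch, assuming the output format shown above (a bracketed, comma-separated address list after "UNREACHABLE:"). The function name unreachable_nodes is hypothetical.

```shell
# Hypothetical helper: read `nodetool describecluster` output on stdin
# and print each address listed as UNREACHABLE, one per line.
unreachable_nodes() {
  awk '
    /UNREACHABLE:/ {
      line = $0
      sub(/^.*UNREACHABLE:[ \t]*\[/, "", line)   # strip up to the opening bracket
      sub(/\].*$/, "", line)                     # strip the closing bracket
      n = split(line, addrs, /,[ \t]*/)
      for (i = 1; i <= n; i++) {
        if (addrs[i] != "") print addrs[i]
      }
    }
  '
}
```

For example, `nodetool describecluster | unreachable_nodes` prints nothing when every node is reachable.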

debug.log

Many entries report connection failures to 172.20.101.166.

On node 172.20.101.164:

tailf /var/log/cassandra/debug.log

DEBUG [GossipStage:1] 2019-07-11 18:19:23,374 OutboundTcpConnection.java:205 - Enqueuing socket close for /172.20.101.166
DEBUG [MessagingService-Outgoing-/172.20.101.166-Small] 2019-07-11 18:19:23,374 OutboundTcpConnection.java:411 - Socket to /172.20.101.166 closed
DEBUG [GossipStage:1] 2019-07-11 18:19:23,374 OutboundTcpConnection.java:205 - Enqueuing socket close for /172.20.101.166
DEBUG [MessagingService-Outgoing-/172.20.101.166-Gossip] 2019-07-11 18:19:23,374 OutboundTcpConnection.java:411 - Socket to /172.20.101.166 closed
DEBUG [GossipStage:1] 2019-07-11 18:19:23,374 FailureDetector.java:313 - Forcing conviction of /172.20.101.166
DEBUG [MessagingService-Outgoing-/172.20.101.166-Gossip] 2019-07-11 18:19:24,740 OutboundTcpConnection.java:425 - Attempting to connect to /172.20.101.166
INFO  [HANDSHAKE-/172.20.101.166] 2019-07-11 18:19:24,741 OutboundTcpConnection.java:561 - Handshaking version with /172.20.101.166
DEBUG [MessagingService-Outgoing-/172.20.101.166-Gossip] 2019-07-11 18:19:24,742 OutboundTcpConnection.java:533 - Done connecting to /172.20.101.166

Verify queries

After the system boots, the Cassandra service starts automatically and the node rejoins the cluster normally.

cqlsh> SELECT * from kevin_test.t_users;

 user_id | emails                          | first_name | last_name
---------+---------------------------------+------------+-----------
       6 | '[email protected]', '[email protected]' |     kevin6 |      kang
       7 | '[email protected]', '[email protected]' |     kevin7 |      kang
       9 | '[email protected]', '[email protected]' |     kevin9 |      kang
       4 | '[email protected]', '[email protected]' |     kevin4 |      kang
       3 | '[email protected]', '[email protected]' |     kevin3 |      kang
       5 | '[email protected]', '[email protected]' |     kevin5 |      kang
       0 | '[email protected]', '[email protected]' |     kevin0 |      kang
       8 | '[email protected]', '[email protected]' |     kevin8 |      kang
       2 | '[email protected]', '[email protected]' |     kevin2 |      kang
       1 | '[email protected]', '[email protected]' |     kevin1 |      kang

Test result:

After the node was restarted repeatedly, queries against the table still returned normal results.
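
The repeated check could be automated along these lines. This is a rough sketch and not the procedure used in the test above: the function name repeat_check, the run count, and the delay are all assumptions chosen for illustration.

```shell
# Hypothetical helper: run a check command several times and report
# how many runs failed.
# Usage: repeat_check <runs> <delay_seconds> <command> [args...]
repeat_check() {
  runs=$1
  delay=$2
  shift 2
  failures=0
  i=1
  while [ "$i" -le "$runs" ]; do
    # A non-zero exit status from the check counts as a failure.
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    # Pause between runs, but not after the last one.
    if [ "$i" -lt "$runs" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "runs=$runs failures=$failures"
  [ "$failures" -eq 0 ]
}
```

One possible invocation, assuming a surviving node such as 172.20.101.165 accepts client connections and any required credentials are supplied, is `repeat_check 10 30 cqlsh -e "SELECT * FROM kevin_test.t_users;" 172.20.101.165`. The function returns a non-zero status if any run fails.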
