Error: LEAK DETECTED in Cassandra 2.1.7

【Posted】: 2017-05-15 15:14:10 【Question】:

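I have been running a 4-node Cassandra cluster with a replication factor of 2. Each node holds about 2.7 TB of Cassandra data.

Three days ago one of the Cassandra nodes crashed. When I tried to start the Cassandra service and looked at system.log, I found LEAK DETECTED errors for multiple column families (CFs):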

ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,779 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@565b5b35) to class org.apache.cassandra.io.sstable.SSTableReader$InstanceTidier@408212172:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15171 was not released before the reference was garbage collected
LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3e2430d) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@554817289:[Memory@[0..4), Memory@[0..18)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,787 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@2ff9f824) to class org.apache.cassandra.io.util.MmappedSegmentedFile$Cleanup@1142037527:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15172-Index.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,788 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@35c52c94) to class org.apache.cassandra.io.sstable.SSTableReader$InstanceTidier@603046944:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15172 was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,788 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@14834365) to class org.apache.cassandra.io.util.MmappedSegmentedFile$Cleanup@901621352:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15171-Index.db was not released before the reference was garbage collected

I read several links and blog posts about "LEAK DETECTED". Some say it is caused by long GC pauses, so I added the following lines to cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramBeforeFullGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramAfterFullGC"

After checking system.log again, I found the following lines:

INFO  [CompactionExecutor:4] 2017-05-12 19:16:16,892 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29601 ms
INFO  [CompactionExecutor:7] 2017-05-12 23:16:16,563 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29604 ms
INFO  [CompactionExecutor:10] 2017-05-13 03:16:16,838 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29875 ms
INFO  [CompactionExecutor:13] 2017-05-13 07:16:16,849 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29891 ms
INFO  [CompactionExecutor:16] 2017-05-13 11:16:16,737 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29779 ms
INFO  [CompactionExecutor:19] 2017-05-13 15:16:16,848 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29889 ms
INFO  [CompactionExecutor:22] 2017-05-13 19:16:17,009 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29729 ms
INFO  [CompactionExecutor:25] 2017-05-13 23:16:16,476 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29514 ms
INFO  [CompactionExecutor:28] 2017-05-14 03:16:16,648 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29685 ms
INFO  [CompactionExecutor:31] 2017-05-14 07:16:16,724 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29760 ms
INFO  [CompactionExecutor:34] 2017-05-14 11:16:16,709 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29715 ms
INFO  [CompactionExecutor:37] 2017-05-14 15:16:16,515 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29545 ms
INFO  [CompactionExecutor:40] 2017-05-14 19:16:16,745 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29776 ms
INFO  [CompactionExecutor:43] 2017-05-14 23:16:16,504 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29532 ms
INFO  [CompactionExecutor:46] 2017-05-15 03:16:16,470 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29496 ms
INFO  [CompactionExecutor:49] 2017-05-15 07:16:16,519 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29545 ms
INFO  [CompactionExecutor:52] 2017-05-15 11:16:16,385 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29411 ms

Three days later, the Cassandra service still has not started. Please help me resolve this.

System information:

Cassandra Version = 2.1.7
OS  =  Ubuntu 12.04
CPU Core = 4
RAM = 28GB

【Question comments】:

【Answer 1】:

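2.1.7 is quite old, and this is probably a known issue that was fixed in 2.1.9 (https://issues.apache.org/jira/browse/CASSANDRA-9998). The message itself is harmless, but 2.1.7 has a number of bugs you probably do not want to run into. You should consider upgrading to the latest 2.1 release (2.1.17).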
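To see which resource types the Reference-Reaper is reporting, the LEAK DETECTED lines can be tallied by class. The following is only a rough sketch: the function name `count_leaks_by_class` and the regular expression are illustrative assumptions, not part of Cassandra, and the pattern is based solely on the message format shown in the question's log excerpt.

```python
import re
from collections import Counter

# Assumed message shape, taken from the log excerpt in the question:
#   LEAK DETECTED: a reference (<ref>) to class <ClassName>@<id>:<detail> was not released ...
LEAK_RE = re.compile(r"LEAK DETECTED: a reference \(\S+\) to class ([\w.$]+)@\d+")


def count_leaks_by_class(lines):
    """Return a Counter mapping each leaked resource class to its report count."""
    counts = Counter()
    for line in lines:
        match = LEAK_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Four lines copied from the question's system.log excerpt.
SAMPLE_LOG = """\
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,779 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@565b5b35) to class org.apache.cassandra.io.sstable.SSTableReader$InstanceTidier@408212172:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15171 was not released before the reference was garbage collected
LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3e2430d) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@554817289:[Memory@[0..4), Memory@[0..18)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,787 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@2ff9f824) to class org.apache.cassandra.io.util.MmappedSegmentedFile$Cleanup@1142037527:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15172-Index.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,788 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@35c52c94) to class org.apache.cassandra.io.sstable.SSTableReader$InstanceTidier@603046944:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15172 was not released before the reference was garbage collected
""".splitlines()

if __name__ == "__main__":
    for cls, n in count_leaks_by_class(SAMPLE_LOG).most_common():
        print(n, cls)
```

In practice the function would be fed the lines of system.log rather than the inline sample. A tally like this only summarizes what was reported and does not by itself explain why the node fails to start.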

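The CompactionExecutor / AutoSavingCache messages do not indicate a problem. They show that your key cache (about 915k items) is being saved to disk periodically, which usually means the server is running normally.

In short, none of this is what is causing your Cassandra server to stop serving requests. If the server is not working properly, something else is going on.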
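The regularity of those saves can be checked directly from the timestamps. Below is a small sketch (the helper names `parse_cache_saves` and `save_intervals_hours` are made up for illustration) that parses "Saved KeyCache" lines in the format shown in the question and computes the gap between consecutive saves. The sample lines are four hours apart, which appears consistent with the `key_cache_save_period` setting in cassandra.yaml, whose default is 14400 seconds.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the question's log excerpt.
SAVE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} AutoSavingCache\.java:\d+ - "
    r"Saved KeyCache \((\d+) items\) in (\d+) ms"
)


def parse_cache_saves(lines):
    """Return a list of (timestamp, items, duration_ms) for each cache-save line."""
    saves = []
    for line in lines:
        m = SAVE_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            saves.append((ts, int(m.group(2)), int(m.group(3))))
    return saves


def save_intervals_hours(saves):
    """Return the gaps, in hours, between consecutive cache saves."""
    times = [ts for ts, _, _ in saves]
    return [round((b - a).total_seconds() / 3600, 2) for a, b in zip(times, times[1:])]


# Three lines copied from the question's system.log excerpt.
SAMPLE_SAVES = [
    "INFO  [CompactionExecutor:4] 2017-05-12 19:16:16,892 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29601 ms",
    "INFO  [CompactionExecutor:7] 2017-05-12 23:16:16,563 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29604 ms",
    "INFO  [CompactionExecutor:10] 2017-05-13 03:16:16,838 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29875 ms",
]

if __name__ == "__main__":
    saves = parse_cache_saves(SAMPLE_SAVES)
    print(save_intervals_hours(saves))
```

Steady intervals and an unchanging item count suggest a process that is up but mostly idle, which fits the answer's point that these lines are not the failure itself.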

【Comments】:

I have already tried Cassandra 2.1.9 and hit the same problem: the Cassandra node gets stuck at the same point during startup, and ports 9042, 9160, and 7000 do not come up.
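One way to confirm which of those ports are actually listening is a plain TCP connect test. This is a minimal sketch using only the Python standard library. The helper names and the `127.0.0.1` host are assumptions for illustration, and the port labels reflect the usual defaults (9042 for the CQL native transport, 9160 for Thrift RPC, 7000 for inter-node storage), which a given cassandra.yaml may override.

```python
import socket

# Usual default ports; a node's cassandra.yaml may configure different ones.
CASSANDRA_PORTS = {
    9042: "CQL native transport",
    9160: "Thrift RPC",
    7000: "inter-node storage",
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=CASSANDRA_PORTS):
    """Return a dict mapping each port to whether it accepted a connection."""
    return {port: is_port_open(host, port) for port in ports}


if __name__ == "__main__":
    # Assumption: run on the affected node itself, hence the loopback address.
    for port, is_open in check_ports("127.0.0.1").items():
        label = CASSANDRA_PORTS[port]
        print(f"{port} ({label}): {'open' if is_open else 'closed'}")
```

If Cassandra is bound to a specific `listen_address` or `rpc_address` rather than loopback, that address would need to be passed instead. A closed port here only confirms the symptom; the last lines of system.log before the hang are more likely to show where startup stalls.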
