Cassandra - JVM OOM direct buffer errors
We have a DataStax Enterprise cluster with the following configuration:
java version "1.8.0_181"
DataStax Enterprise Version: 6.0.0
Number of Nodes: 3
Node Listing:
Name: localhost - xx.xx.xx.01
Cassandra Version: 4.0.0.2284
DataStax Enterprise Version: 6.0.0
Available Memory: 15586 MB
Number of CPU Cores: 4
Operating System: linux
Space Used: 5 GB / 125 GB
Name: localhost - xx.xx.xx.02
Cassandra Version: 4.0.0.2284
DataStax Enterprise Version: 6.0.0
Available Memory: 15586 MB
Number of CPU Cores: 4
Operating System: linux
Space Used: 5 GB / 125 GB
Name: localhost - xx.xx.xx.03
Cassandra Version: 4.0.0.2284
DataStax Enterprise Version: 6.0.0
Available Memory: 15586 MB
Number of CPU Cores: 4
Operating System: linux
Space Used: 6 GB / 125 GB
Keyspace size - 1.34 GB
Yesterday we got a lot of OOM errors on one of the 3 nodes, and after we restarted that first node, similar errors appeared on the other nodes. Error details:
ERROR [CompactionExecutor:4477] 2018-08-29 13:23:00,320 JVMStabilityInspector.java:117 - OutOfMemory error letting the JVM handle the error:
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:694) ~[na:1.8.0_181]
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.8.0_181]
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[na:1.8.0_181]
at org.apache.cassandra.io.compress.BufferType$2.allocate(BufferType.java:39) ~[dse-db-all-4.0.0.2284.jar:6.0.0]
at org.apache.cassandra.io.compress.CompressedSequentialWriter.<init>(CompressedSequentialWriter.java:89) ~[dse-db-all-4.0.0.2284.jar:6.0.0]
at org.apache.cassandra.io.sstable.format.trieindex.TrieIndexSSTableWriter.<init>(TrieIndexSSTableWriter.java:100) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
at org.apache.cassandra.io.sstable.format.trieindex.TrieIndexFormat$WriterFactory.open(TrieIndexFormat.java:110) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
at org.apache.cassandra.io.sstable.format.SSTableWriter.create(SSTableWriter.java:108) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.switchCompactionLocation(DefaultCompactionWriter.java:71) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.maybeSwitchWriter(CompactionAwareWriter.java:182) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:144) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:210) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:92) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:101) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:310) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_181]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[dse-db-all-4.0.0.2284.jar:4.0.0.2284]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_181]
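This error seems to be related to the java.nio direct buffer cache, which has no upper bound and keeps growing until an OOM occurs (https://support.datastax.com/hc/en-us/articles/360000863663-JVM-OOM-direct-buffer-errors-affected-by-unlimited-java-nio-cache). We see memory utilization increasing steadily on all Cassandra nodes, and the behavior persists even after the Cassandra nodes are restarted.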
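To confirm that direct memory is what keeps growing, the JVM's `direct` buffer pool can be watched through the standard `java.lang.management.BufferPoolMXBean` interface (over remote JMX it is the MBean `java.nio:type=BufferPool,name=direct`). The sketch below is a standalone illustration, not Cassandra code, and the class and method names are hypothetical. It prints the pool's counters and shows that memory reserved with `ByteBuffer.allocateDirect` stays counted for as long as the buffers are reachable, which is why a cache that never releases its buffers shows up as steady growth. Separately, JDK 8u102 and later accept the system property `jdk.nio.maxCachedBufferSize`, which bounds the size of buffers kept in the per-thread temporary buffer cache; it is not set in the JVM configuration of this question.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
public class DirectBufferUsage {

    // Returns the platform "direct" buffer pool (HotSpot also exposes "mapped").
    static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        throw new IllegalStateException("no 'direct' buffer pool exposed by this JVM");
    }

    // Allocates `count` direct buffers of `size` bytes, keeps them reachable via
    // `keepAlive`, and returns how many bytes the direct pool's usage grew by.
    static long growthAfterAllocating(int count, int size, List<ByteBuffer> keepAlive) {
        BufferPoolMXBean pool = directPool();
        long before = pool.getMemoryUsed();
        for (int i = 0; i < count; i++) {
            keepAlive.add(ByteBuffer.allocateDirect(size));
        }
        return pool.getMemoryUsed() - before;
    }

    public static void main(String[] args) {
        BufferPoolMXBean pool = directPool();
        System.out.printf("direct: count=%d used=%d capacity=%d%n",
                pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());

        List<ByteBuffer> keepAlive = new ArrayList<>();
        long grown = growthAfterAllocating(16, 64 * 1024, keepAlive);
        System.out.println("direct pool grew by " + grown + " bytes while "
                + keepAlive.size() + " buffers are reachable");
    }
}
```

Polling `getMemoryUsed()` periodically on a node (for example over JMX) is enough to see whether the growth the question describes is in the direct pool rather than on the heap.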
JVM configuration:
-XX:+AlwaysPreTouch
-Dcassandra.disable_auth_caches_remote_configuration=false
-Dcassandra.expiration_date_overflow_policy="REJECT"
-Dcassandra.force_default_indexing_page_size=false
-Dcassandra.join_ring=true
-Dcassandra.load_ring_state=true
-Dcassandra.write_survey=false
#-XX:ConcGCThreads=
-ea
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:GCLogFileSize=10M
-XX:+HeapDumpOnOutOfMemoryError
#-Xmsauto
#-XX:InitiatingHeapOccupancyPercent=
-Dio.netty.eventLoop.maxPendingTasks=65536
-Djava.net.preferIPv4Stack=true
-XX:MaxGCPauseMillis=500
#-Xmxauto
-XX:NumberOfGCLogFiles=10
-Dsun.nio.PageAlignDirectMemory=true
#-XX:ParallelGCThreads=
-Xss256k
-XX:+PerfDisableSharedMem
-XX:+PreserveFramePointer
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC
-Dcassandra.printHeapHistogramOnOutOfMemoryError=false
-XX:+PrintPromotionFailure
-XX:+PrintTenuringDistribution
-XX:+ResizeTLAB
-XX:-RestrictContended
-XX:StringTableSize=1000003
-XX:ThreadPriorityPolicy=42
-XX:+UnlockDiagnosticVMOptions
-XX:+UseGCLogFileRotation
-XX:+UseThreadPriorities
-XX:+UseTLAB
-XX:+UseG1GC
JVM_ON_OUT_OF_MEMORY_ERROR_OPT="-XX:OnOutOfMemoryError=kill -9 %p"
-Dcom.sun.management.jmxremote.authenticate=false
-Dcassandra.jmx.local.port=7199
Answer
When the OOM happens, can you provide more information about the heap: old gen, Eden, S1 (survivor space), and so on?
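- Check your JVM arguments, for example -XX:+DisableExplicitGC: when this flag is enabled, System.gc() does nothing. On JDK 8, Bits.reserveMemory calls System.gc() to reclaim unreachable direct buffers before it gives up and throws "Direct buffer memory", so disabling explicit GC removes that fallback.
- Does your JVM perform full GCs (FGC)? Check whether your code allocates DirectByteBuffer objects.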
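The checks suggested in this answer can also be done from inside a JVM with the standard management API. The following standalone sketch (class and method names are hypothetical; it is not part of Cassandra or DSE) prints whether -XX:+DisableExplicitGC is in effect, the usage of each heap pool (Eden, Survivor and Old Gen under G1), and the collection count of each garbage collector. `com.sun.management.HotSpotDiagnosticMXBean` is HotSpot-specific, so this assumes an Oracle or OpenJDK HotSpot JVM such as the 1.8.0_181 runtime in the question. For a live node, the same MBeans can be read remotely over JMX (port 7199 in the configuration above) with a tool such as jconsole.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic class, for illustration only.
public class GcDiagnostics {

    // True if the running JVM has -XX:+DisableExplicitGC, i.e. System.gc() is ignored.
    static boolean explicitGcDisabled() {
        HotSpotDiagnosticMXBean hs = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Boolean.parseBoolean(hs.getVMOption("DisableExplicitGC").getValue());
    }

    // Names of the heap memory pools, e.g. "G1 Eden Space", "G1 Survivor Space", "G1 Old Gen".
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("DisableExplicitGC = " + explicitGcDisabled());

        // Current usage of each heap pool (max is -1 when undefined).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage u = pool.getUsage();
                System.out.printf("%-20s used=%,d committed=%,d max=%,d%n",
                        pool.getName(), u.getUsed(), u.getCommitted(), u.getMax());
            }
        }

        // Collection counts per collector. Assumption: with G1 on JDK 8, the
        // "G1 Old Generation" count is incremented only by full GCs.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-20s collections=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Running it once normally and once with -XX:+DisableExplicitGC shows the flag check change from false to true.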