Getting java.lang.OutOfMemoryError: Java heap space while running twitter connector using flume
Posted: 2018-05-15 11:54:36

Question:

I am starting the agent with this command:
bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
The error message is:
Exception in thread "Twitter Stream consumer-1[Receiving stream]" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596)
at java.lang.StringBuffer.append(StringBuffer.java:367)
at java.io.BufferedReader.readLine(BufferedReader.java:370)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at twitter4j.StatusStreamBase.handleNextElement(StatusStreamBase.java:85)
at twitter4j.StatusStreamImpl.next(StatusStreamImpl.java:57)
at twitter4j.TwitterStreamImpl$TwitterStreamConsumer.run(TwitterStreamImpl.java:478)
18/05/15 16:53:36 ERROR hdfs.HDFSEventSink: process failed
java.lang.OutOfMemoryError: GC overhead limit exceeded
My twitter.conf properties file is given below:
# Naming the components on the current agent.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
# Describing/Configuring the source
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = xxx
TwitterAgent.sources.Twitter.consumerSecret = xxx
TwitterAgent.sources.Twitter.accessToken = xxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxx
TwitterAgent.sources.Twitter.keywords = kafka
# Describing/Configuring the sink
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://xxx:7200/topics/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 100000
# Describing/Configuring the channel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 100000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
# Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
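A rough way to see how much heap this channel can pin is to multiply the channel capacity (100000 events) by the average event body size. The sketch below is a back-of-the-envelope estimate, not Flume code; the class name ChannelFootprint and the 5 KB average body size are hypothetical placeholders (measure your own events), and it ignores headers and JVM object overhead, so real usage will be higher.

```java
/**
 * Back-of-the-envelope estimate of the heap a completely full Flume memory
 * channel could pin. Counts event bodies only; headers and JVM object
 * overhead are ignored, so real usage is higher.
 */
final class ChannelFootprint {

    /** Worst-case body bytes held when the channel is full. */
    static long fullChannelBytes(long capacityEvents, long avgEventBytes) {
        return capacityEvents * avgEventBytes;
    }

    /** Converts bytes to whole MiB (truncating). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long capacity = 100_000L;        // TwitterAgent.channels.MemChannel.capacity
        long avgEventBytes = 5 * 1024L;  // hypothetical average body size; measure yours
        long heapMiB = 2000L;            // -Xmx2000m from flume-env.sh

        long fullMiB = toMiB(fullChannelBytes(capacity, avgEventBytes));
        System.out.printf("Full channel ~ %d MiB of bodies vs. %d MiB max heap%n",
                fullMiB, heapMiB);
    }
}
```

With the placeholder values it prints `Full channel ~ 488 MiB of bodies vs. 2000 MiB max heap`; at an assumed 20 KB per event the same channel would need about 1953 MiB, roughly the whole heap.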
My flume-env.sh file:
# JAVA_HOME must point to the JDK directory, not the java binary (flume-ng runs $JAVA_HOME/bin/java)
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"
# Let Flume write raw event data and configuration information to its log files for debugging
# purposes. Enabling these flags is not recommended in production,
# as it may result in logging sensitive user information or encryption secrets.
# export JAVA_OPTS="$JAVA_OPTS -Dorg.apache.flume.log.rawdata=true -Dorg.apache.flume.log.printconfig=true "
# Note that the Flume conf directory is always included in the classpath.
#FLUME_CLASSPATH=""
# export HIVE_HOME=/usr/lib/hive
# export HCAT_HOME=/usr/lib/hive-hcatalog
I updated export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote" in the flume-env.sh file, but I still get the Java heap space error. What else should I do to resolve this?
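Because the heap error persists after the change, it is worth confirming that the running agent JVM actually received -Xmx2000m (for example, that flume-ng really sourced the flume-env.sh in the --conf directory). The minimal sketch below assumes Java 9+ is available to run the check (the agent itself can stay on Java 8), that it runs as the same user as the agent, and that the agent's main class is org.apache.flume.node.Application, which is what flume-ng starts for `agent`. The class FlumeHeapCheck and method lastXmx are hypothetical names. It reports the last -Xmx on the command line, since HotSpot honours the last occurrence.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Finds running JVMs whose command line mentions the Flume agent main class
 * and prints the last -Xmx flag found on that command line.
 * Requires Java 9+ (ProcessHandle); the OS may hide other users' command lines.
 */
final class FlumeHeapCheck {

    private static final Pattern XMX = Pattern.compile("-Xmx(\\S+)");

    /** Returns the value of the last -Xmx option in a command line, if any. */
    static Optional<String> lastXmx(String commandLine) {
        Matcher m = XMX.matcher(commandLine);
        String last = null;
        while (m.find()) {
            last = m.group(1); // keep overwriting so the final match wins
        }
        return Optional.ofNullable(last);
    }

    public static void main(String[] args) {
        ProcessHandle.allProcesses().forEach(ph ->
            ph.info().commandLine().ifPresent(cmd -> {
                if (cmd.contains("org.apache.flume.node.Application")) {
                    System.out.println("pid " + ph.pid() + " -> -Xmx "
                        + lastXmx(cmd).orElse("(not set; JVM default applies)"));
                }
            }));
    }
}
```

Run it with `java FlumeHeapCheck.java` (Java 11+ source launch) while the agent is up. If the reported value is not 2000m, that suggests the JAVA_OPTS line never reached the agent JVM.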
Answer 1: I solved this heap size problem by adding this line to the twitter.conf file:
TwitterAgent.channels.MemChannel.byteCapacity = 6912212
Then I changed the heap settings in the flume-env.sh file:
export JAVA_OPTS="-Xms512m -Xmx1024m -Dcom.sun.management.jmxremote"
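The byteCapacity line matters because of how the memory channel budgets bytes. Per the Flume User Guide, byteCapacity defaults to 80% of the JVM's maximum heap, and byteCapacityBufferPercentage (default 20) is held back to account for headers, since only event bodies are counted. The sketch below applies those two percentages to the nominal -Xmx value (Runtime.maxMemory() inside the JVM may report slightly less). The class and method names are hypothetical, and this is an approximation of the documented behaviour, not Flume's internal code.

```java
/**
 * Approximate Flume memory-channel byte budget, following the Flume User
 * Guide: byteCapacity defaults to 80% of max heap, and
 * byteCapacityBufferPercentage (default 20) is reserved for headers.
 * Integer arithmetic throughout.
 */
final class ByteCapacityBudget {

    static final long MIB = 1024L * 1024L;

    /** Default byteCapacity: 80% of the maximum heap. */
    static long defaultByteCapacity(long maxHeapBytes) {
        return maxHeapBytes * 80 / 100;
    }

    /** Bytes left for event bodies after the header buffer is subtracted. */
    static long bodyBudget(long byteCapacity, int bufferPercentage) {
        return byteCapacity * (100 - bufferPercentage) / 100;
    }

    public static void main(String[] args) {
        long xmx2000 = 2000 * MIB;      // original -Xmx2000m
        long explicitCap = 6_912_212L;  // byteCapacity from Answer 1

        long defaultCap = defaultByteCapacity(xmx2000);
        System.out.println("default cap @ -Xmx2000m: " + defaultCap / MIB + " MiB");
        System.out.println("body budget, default cap: " + bodyBudget(defaultCap, 20) / MIB + " MiB");
        System.out.println("body budget, byteCapacity=6912212: "
                + bodyBudget(explicitCap, 20) + " bytes");
    }
}
```

It prints a default cap of 1600 MiB and a body budget of 1280 MiB at -Xmx2000m, versus 5529769 bytes (about 5.3 MiB) with byteCapacity = 6912212. With the explicit cap, the channel should reject further puts once that budget is used up, rather than letting queued events fill the heap.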
Answer 2: Comment out this line in your flume-env.sh file:
export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"
Then start the agent.
Comments:
After commenting out that line I still get the same problem:
Exception in thread "Twitter4J Async Dispatcher[0]" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "Twitter Stream consumer-1[Receiving stream]" java.lang.OutOfMemoryError: Java heap space