Error moving log files from the local file system to HDFS via Apache Flume

Posted: 2022-01-11 18:06:23

Problem description:

I have log files in my local file system that need to be transferred to HDFS via Apache Flume. I have the following configuration file, saved as net.conf in my home directory:

NetcatAgent.sources = Netcat
NetcatAgent.channels = MemChannel
NetcatAgent.sinks= LoggerSink

# configuring source
NetcatAgent.sources.Netcat.type = netcat
    # type of connection is netcat
NetcatAgent.sources.Netcat.bind = localhost
    # bind to localhost
NetcatAgent.sources.Netcat.port=9999
    # localhost port number


# configuring sink
NetcatAgent.sinks.LoggerSink.type = logger
    #logger sends output to console

# Configuring Channel
NetcatAgent.channels.MemChannel.type = memory
    # in-memory channel that buffers events between source and sink
NetcatAgent.channels.MemChannel.capacity = 10000
    # maximum number of events the channel can hold
NetcatAgent.channels.MemChannel.transactionCapacity = 1000
    # maximum number of events handled per transaction

# bind source and sink to channel
NetcatAgent.sources.Netcat.channels = MemChannel
NetcatAgent.sinks.LoggerSink.channel = MemChannel



#to run the file on console 
#flume-ng agent -n NetcatAgent -f net.conf

#on other terminal establish connection using
#telnet localhost 9999
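
For reference, since the goal described above is moving local log files into HDFS rather than echoing netcat input to the console, a minimal sketch of an agent that does that with a spooling-directory source and an HDFS sink is shown below. The local directory /home/samar/logs and the HDFS path hdfs://localhost:9000/flume/logs are assumptions; adjust them to your setup.

# sketch only: spooling-directory source -> memory channel -> HDFS sink
HdfsAgent.sources = SpoolSrc
HdfsAgent.channels = MemChannel
HdfsAgent.sinks = HdfsSink

# watch a local directory for completed log files (assumed path)
HdfsAgent.sources.SpoolSrc.type = spooldir
HdfsAgent.sources.SpoolSrc.spoolDir = /home/samar/logs

HdfsAgent.channels.MemChannel.type = memory
HdfsAgent.channels.MemChannel.capacity = 10000
HdfsAgent.channels.MemChannel.transactionCapacity = 1000

# write plain-text events into HDFS (assumed path)
HdfsAgent.sinks.HdfsSink.type = hdfs
HdfsAgent.sinks.HdfsSink.hdfs.path = hdfs://localhost:9000/flume/logs
HdfsAgent.sinks.HdfsSink.hdfs.fileType = DataStream
HdfsAgent.sinks.HdfsSink.hdfs.writeFormat = Text

# bind source and sink to the channel
HdfsAgent.sources.SpoolSrc.channels = MemChannel
HdfsAgent.sinks.HdfsSink.channel = MemChannel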

After running the command flume-ng agent -n NetcatAgent -f net.conf from the home directory itself,

I get the following output:

Warning: No configuration directory set! Use --conf <dir> to override.
Info: Including Hadoop libraries found via (/home/samar/hadoop-3.3.1/bin/hadoop) for HDFS access
Info: Including Hive libraries found via () for Hive access
+ exec /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xmx20m -cp '/home/samar/flume/lib/*:/home/samar/hadoop-3.3.1/etc/hadoop:/home/samar/hadoop-3.3.1/share/hadoop/common/lib/*:/home/samar/hadoop-3.3.1/share/hadoop/common/*:/home/samar/hadoop-3.3.1/share/hadoop/hdfs:/home/samar/hadoop-3.3.1/share/hadoop/hdfs/lib/*:/home/samar/hadoop-3.3.1/share/hadoop/hdfs/*:/home/samar/hadoop-3.3.1/share/hadoop/mapreduce/*:/home/samar/hadoop-3.3.1/share/hadoop/yarn:/home/samar/hadoop-3.3.1/share/hadoop/yarn/lib/*:/home/samar/hadoop-3.3.1/share/hadoop/yarn/*:/lib/*' -Djava.library.path=:/home/samar/hadoop-3.3.1/lib/native org.apache.flume.node.Application -n NetcatAgent -f net.conf
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/samar/flume/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/samar/hadoop-3.3.1/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
    at java.base/jdk.internal.loader.Resource.getBytes(Resource.java:117)
    at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:797)
    at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    at com.google.common.collect.Sets.newHashSetWithExpectedSize(Sets.java:194)
    at com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:114)
    at com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:49)
    at com.google.common.collect.AbstractMultimap.createCollection(AbstractMultimap.java:156)
    at com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:214)
    at com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:201)
    at com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:117)
    at com.google.common.collect.HashMultimap.put(HashMultimap.java:49)
    at com.google.common.eventbus.AnnotatedHandlerFinder.findAllHandlers(AnnotatedHandlerFinder.java:57)
    at com.google.common.eventbus.EventBus.register(EventBus.java:211)
    at org.apache.flume.node.Application.main(Application.java:355)

I have already edited the flume-env.sh file, but the problem still persists for this task.

Comments:

Answer 1:

This exception means that the Flume agent does not have enough memory (specifically, heap space) to perform the task.

Increase the Java memory for the Flume agent in the flume-env.sh file, or specify the memory when launching the agent with flume-ng agent -n NetcatAgent -f net.conf -Xmx2048m (note: this sets the Flume heap size to 2 GB = 2048 MB).

You can pass -D and -X Java options from the command line.
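
For example (a sketch; the --conf path assumes Flume is installed under /home/samar/flume, as the classpath in the output above suggests), the heap size and a console logger can be passed at launch time as shown below. Supplying --conf may also matter here: the warning at the top of the output shows that no configuration directory was set, and flume-env.sh is normally sourced from that directory.

# adjust --conf to point at your Flume conf directory
flume-ng agent --conf /home/samar/flume/conf -n NetcatAgent -f net.conf \
    -Xmx2048m -Dflume.root.logger=INFO,console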

In the Flume installation directory, go into the conf directory. There should be a flume-env.sh file or a flume-env.sh.template file; if only the .template file exists, copy it with:

cp flume-env.sh.template flume-env.sh

Once that is done, open the flume-env.sh file and add the following line:

export JAVA_OPTS="-Xms1G -Xmx2G"

Save the file and run the Flume agent again; the agent will pick up the JAVA_OPTS variable automatically and apply the heap settings.

Note: -Xms1G allocates a minimum heap of 1 GB and -Xmx2G allocates a maximum heap of 2 GB. Adjust these values to your needs.
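
One way to confirm the new heap size took effect (a sketch, assuming the agent is already running) is to inspect the JVM flags of the Flume process; in the output above, the exec line shows -Xmx20m, i.e. the 20 MB default:

# list the running Flume agent and its JVM flags; look for the -Xmx value
ps -ef | grep '[o]rg.apache.flume.node.Application'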

Comments:

How do I edit the flume-env.sh file?
@SamarPratapSingh Updated the answer with how to edit flume-env.sh.
I have added a screenshot of my file, please verify.
Well, it looks like you are reading a large dataset into memory. What is the size of the data you are trying to read from netcat?
The flume-env.sh looks fine.
