Streaming data into HBase using Apache Flume

【Title】: streaming data into hbase using apache flume 【Posted】: 2014-05-12 21:28:35 【Question】:

I am trying to load data into HBase using Apache Flume. When I use Flume to stream data into Hadoop, it works fine. But when I start the Flume agent to load data into HBase, I get a NoClassDefFoundError.

14/05/12 23:14:10 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:agent4.conf
14/05/12 23:14:10 INFO conf.FlumeConfiguration: Processing:sink1
14/05/12 23:14:10 INFO conf.FlumeConfiguration: Processing:sink1
14/05/12 23:14:10 INFO conf.FlumeConfiguration: Processing:sink1
14/05/12 23:14:10 INFO conf.FlumeConfiguration: Processing:sink1
14/05/12 23:14:10 INFO conf.FlumeConfiguration: Added sinks: sink1 Agent: agent4
14/05/12 23:14:10 INFO conf.FlumeConfiguration: Processing:sink1
14/05/12 23:14:10 INFO conf.FlumeConfiguration: Processing:sink1
14/05/12 23:14:10 INFO conf.FlumeConfiguration: Processing:sink1
14/05/12 23:14:10 INFO conf.FlumeConfiguration: Processing:sink1
14/05/12 23:14:10 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent4]
14/05/12 23:14:10 INFO node.AbstractConfigurationProvider: Creating channels
14/05/12 23:14:10 INFO channel.DefaultChannelFactory: Creating instance of channel channel1 type FILE
14/05/12 23:14:10 INFO node.AbstractConfigurationProvider: Created channel channel1
14/05/12 23:14:10 INFO source.DefaultSourceFactory: Creating instance of source source1, type exec
14/05/12 23:14:10 INFO sink.DefaultSinkFactory: Creating instance of sink: sink1, type: org.apache.flume.sink.hbase.HBaseSink
14/05/12 23:14:10 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
    at org.apache.flume.sink.hbase.HBaseSink.<init>(HBaseSink.java:102)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at java.lang.Class.newInstance(Class.java:374)
    at org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:43)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:415)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 17 more
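
The `ClassNotFoundException` above names the class the agent could not load, so a first check is whether any jar in a given lib directory actually contains it. A rough sketch, assuming bash and grep are available; `find_class_jar` is a hypothetical helper, not a Flume or HBase tool. It relies on the fact that entry names inside a jar are stored uncompressed, so it is a quick heuristic rather than a real archive listing:

```shell
# Print every jar in a directory whose raw bytes contain the given class entry name.
# Heuristic only: it matches the uncompressed entry name inside the jar file.
find_class_jar() {
  local class_path="$1" dir="$2" jar
  for jar in "$dir"/*.jar; do
    [ -f "$jar" ] || continue          # skip the literal pattern when no jar matches
    if grep -qaF "$class_path" "$jar"; then
      printf '%s\n' "$jar"
    fi
  done
}
```

For the paths in this question it could be called as `find_class_jar 'org/apache/hadoop/hbase/HBaseConfiguration.class' /home/alpha/hbase-0.98.1/lib`. If it prints a jar, the class exists on disk and the problem is more likely how the classpath is passed to the agent.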

Here is my Flume configuration:

flume-env.sh

JAVA_HOME=/usr
FLUME_CLASSPATH=/home/alpha/apache-flume-1.4.0-bin/lib
HBASE_CLASSPATH=/home/alpha/hbase-0.98.1/lib
HBASE_HOME=/home/alpha/hbase-0.98.1
FLUME_HOME=/home/alpha/apache-flume-1.4.0-bin

agent4.conf

# Name the components on this agent
agent4.sources = source1
agent4.sinks = sink1
agent4.channels = channel1

# Describe/configure source1
agent4.sources.source1.type = exec
agent4.sources.source1.command = tail -f /tmp/testGenerate.csv

# Describe sink1
agent4.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
agent4.sinks.sink1.table = AdreamLumiHB
agent4.sinks.sink1.columnFamily = lumiCF
agent4.sinks.sink1.batchSize = 5000
agent4.sinks.sink1.serializer.regex = ^(\d+),([^,]+),([^,]+),([^,]+),([^,]+),([^,]+),.*
agent4.sinks.sink1.serializer.regexIgnoreCase = true
agent4.sinks.sink1.serializer.colNames = id,nom,valeur,batiment,etage,piece

# Use a channel which buffers events to a file
agent4.channels.channel1.type = FILE
agent4.channels.channel1.transactionCapacity = 1000000
agent4.channels.channel1.checkpointInterval = 30000
agent4.channels.channel1.maxFileSize = 2146435071
agent4.channels.channel1.capacity = 10000000

# Bind the source and sink to the channel
agent4.sources.source1.channels = channel1
agent4.sinks.sink1.channel = channel1

【Question comments】:

【Answer 1】:

What happens when you append the HBase classpath to Flume's classpath?

FLUME_CLASSPATH=/home/alpha/apache-flume-1.4.0-bin/lib/*:/home/alpha/hbase-0.98.1/lib/*
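
The trailing `/*` matters: on a Java classpath a bare directory entry only exposes loose `.class` files, while the `dir/*` form picks up every jar in that directory. A minimal sketch of building such a value for `flume-env.sh`, assuming bash; `build_classpath` is a hypothetical helper and not part of Flume:

```shell
# Join lib directories into one classpath string, using the dir/* wildcard form
# so that the JVM loads every jar in each directory.
build_classpath() {
  local cp="" dir
  for dir in "$@"; do
    dir="${dir%/}"               # drop one trailing slash, if present
    cp="${cp:+$cp:}$dir/*"       # quoted, so the shell does not expand the *
  done
  printf '%s\n' "$cp"
}

FLUME_CLASSPATH="$(build_classpath /home/alpha/apache-flume-1.4.0-bin/lib /home/alpha/hbase-0.98.1/lib)"
```

The asterisk has to reach the JVM unexpanded, which is why the value stays quoted throughout.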

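【Comments】:

Just checking: are the HBase libraries installed on all worker nodes, and in the same location shown in the properties file?

Yes. I tried merging the HBase lib with the Flume lib, but I get another error: ERROR hbase.HBaseSink: Could not load table, AdreamLumiHB from HBase java.io.IOException: java.lang.reflect.InvocationTargetException .... When I use a Flume agent with AsyncHbaseSink, I get this error: ERROR async.HBaseClient: The znode for the -ROOT- region doesn't exist! even though HBase started correctly.

Those errors look more like a configuration problem. Please make sure you are pointing the client at the correct hbase-site.xml and hbase-env.sh.

I think my client is pointing at the correct hbase-site.xml and hbase-env.sh. When I use a Java program, I can do put and get, and it runs fine. If you want, I can post my HBase configuration.

Can you do the put in the same Java program where you see the exception? I suspect you need to add the HBase configuration directory to your classpath.

【Answer 2】: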

I suggest copying all the jars from the /lib folder of the HBase home into Flume's /lib folder. That solved the problem for me.

【Comments】:
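
A cautious way to do that copy is to leave alone any jar Flume already ships, so existing files are never overwritten. A minimal sketch, assuming bash; `copy_missing_jars` is a hypothetical helper. It compares file names only, so two versions of the same library with different file names would both end up in the target directory:

```shell
# Copy jars from src to dst, skipping any file name that already exists in dst.
copy_missing_jars() {
  local src="$1" dst="$2" jar name
  for jar in "$src"/*.jar; do
    [ -f "$jar" ] || continue        # skip the literal pattern when src has no jars
    name="${jar##*/}"                # strip the directory part
    if [ -e "$dst/$name" ]; then
      printf 'skip %s\n' "$name"
    else
      cp "$jar" "$dst/" && printf 'copy %s\n' "$name"
    fi
  done
}
```

With the paths from the question this would be `copy_missing_jars /home/alpha/hbase-0.98.1/lib /home/alpha/apache-flume-1.4.0-bin/lib`; the printed `skip` lines show which names were already present and may deserve a version check.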
