Flume Series, Part 2: Hands-On Examples

Posted 2021-04-30 啊是留歌呀


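Preface
The previous article, http://blog.csdn.net/liuge36/article/details/78589505, covered what Flume is and what it can be used for. How to actually use it is the subject of this article. Without further ado, let's learn from examples.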

Example 1: The first simple example from the official site: collect data from a given port and print it to the console

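From the official documentation we know that we need to create a new .conf file.
1. Here we create test1.conf in Flume's conf directory.

2. Copy "A simple example" from the official site into it and make a few small changes.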

[hadoop@hadoop000 conf]$ vim test1.conf 
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop000
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

# Leaving the official example unmodified should also work fine
# :wq to save and exit

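3. Start the Flume agent (the flume-ng agent command shown in Example 2 works here as well, with --conf-file pointing at test1.conf). Once the agent is up, you can start testing: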

[hadoop@hadoop000 data]$ telnet hadoop000 44444
Trying 192.168.1.57...
Connected to hadoop000.
Escape character is '^]'.
你好         
OK

You will see output appear in the console of the agent you just started.
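
If telnet is not installed, the same check can be sketched in a few lines of Python. This is only an illustration: send_line is a hypothetical helper name, and the host and port in the example comment are the ones from test1.conf. It relies on the netcat source replying OK for each line it accepts, as the telnet session above shows.

```python
import socket


def send_line(host, port, text, timeout=5.0):
    """Send one newline-terminated line and return the reply line, stripped."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # The netcat source turns each newline-terminated line into one event.
        conn.sendall((text + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()


# Example call against the agent from test1.conf (host and port as configured above):
# print(send_line("hadoop000", 44444, "hello"))
```

With the agent running, the call should return the OK acknowledgement, and the event should appear in the agent console through the logger sink.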

That completes the first simple Flume example. Pretty easy, right?

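As you can see, the key to using Flume is writing the configuration file:
1) Configure the Source
2) Configure the Channel
3) Configure the Sink
4) Wire the three components together

In short, using Flume means writing Flume configuration files.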
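
The four steps can also be checked mechanically. The sketch below is a simplified illustration, not a Flume tool: parse_properties and check_wiring are hypothetical helpers, the parser only handles plain key = value lines and # comments, and SAMPLE_CONF repeats test1.conf from above.

```python
SAMPLE_CONF = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop000
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""


def parse_properties(text):
    """Parse 'key = value' lines into a dict, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent="a1"):
    """Return a list of problems in one agent's source/channel/sink wiring."""
    problems = []
    declared = {kind: props.get(f"{agent}.{kind}", "").split()
                for kind in ("sources", "channels", "sinks")}
    # Steps 1-3: every declared component needs a type.
    for kind, names in declared.items():
        if not names:
            problems.append(f"no {kind} declared for agent {agent}")
        for name in names:
            if f"{agent}.{kind}.{name}.type" not in props:
                problems.append(f"{kind} {name}: missing type")
    # Step 4: a source lists one or more channels, a sink names exactly one.
    for name in declared["sources"]:
        bound = props.get(f"{agent}.sources.{name}.channels", "").split()
        if not bound or any(c not in declared["channels"] for c in bound):
            problems.append(f"sources {name}: not bound to a declared channel")
    for name in declared["sinks"]:
        if props.get(f"{agent}.sinks.{name}.channel") not in declared["channels"]:
            problems.append(f"sinks {name}: not bound to a declared channel")
    return problems


print(check_wiring(parse_properties(SAMPLE_CONF)))
```

For the configuration above the printed list is empty. Deleting the a1.sinks.k1.channel line would make the check report that sink k1 is not bound to a channel.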

Example 2: Monitor a file and print newly appended data to the console in real time

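What is the approach?
As mentioned earlier, working with Flume means writing configuration files,
which raises the question of component selection:
agent selection, that is, which source, which channel, and which sink to use.

Here we choose an exec source, a memory channel, and a logger sink.

How do we write it?
Follow steps 1 to 4 as described above.

The official documentation shows how each chosen component should be written:
1) Configure the Source
exec source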

Property Name    Default    Description
channels         –
type             –          The component type name, needs to be exec
command          –          The command to execute
shell            –          A shell invocation used to run the command. e.g. /bin/sh -c. Required only for commands relying on shell features like wildcards, back ticks, pipes etc.

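From the documentation we know that the exec source needs type = exec and a command of our own. Setting shell is also recommended, and the remaining properties can be left alone. Simple, isn't it? Our configuration looks like this: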

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/data/data.log
a1.sources.r1.shell = /bin/sh -c
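
To make the role of the command and shell properties concrete, here is a rough Python sketch of what an exec-style source does conceptually: run a command through a shell and treat each line of its standard output as one event. It is an illustration only, not Flume's implementation; stream_events is a hypothetical name, and a POSIX system with /bin/sh is assumed.

```python
import subprocess


def stream_events(command, shell=("/bin/sh", "-c")):
    """Run command through the given shell and yield each stdout line as an event."""
    proc = subprocess.Popen(
        [*shell, command],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        # Stop the child if the consumer stops early (tail -F never exits by itself).
        proc.stdout.close()
        if proc.poll() is None:
            proc.terminate()
        proc.wait()


# The pipe below only works because the command string is handed to a shell.
for event in stream_events("printf 'a\\nb\\n' | tr a-z A-Z"):
    print(event)
```

Passing the tail -F command from the configuration instead would keep yielding lines as they are appended to the file.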

2) Configure the Channel
memory channel
The official documentation says:

Property Name    Default    Description
type             –          The component type name, needs to be memory

Write our own channel accordingly:

a1.channels.c1.type = memory

3) Configure the Sink
logger sink
The official documentation says:

Property Name    Default    Description
channel          –
type             –          The component type name, needs to be logger

Write our own sink accordingly:

a1.sinks.k1.type = logger

4) Wire the three components together

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Following this fixed four-step routine, you can write any agent.

1. Create a new file called test2.conf
and paste our configuration into it:

[hadoop@hadoop000 conf]$ vim test2.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/data/data.log
a1.sources.r1.shell = /bin/sh -c

a1.sinks.k1.type = logger

a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1 

# :wq to save and exit

2. Start our agent

flume-ng agent \
--name a1  \
--conf $FLUME_HOME/conf  \
--conf-file $FLUME_HOME/conf/test2.conf \
-Dflume.root.logger=INFO,console

3. Start testing with data
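
The commands for this step are missing from the post as it survives. One way to generate test data, offered as an assumption rather than the author's original method, is to append lines to the file that the exec source tails. append_lines is a hypothetical helper, and the path in the example comment is the one from test2.conf.

```python
import time


def append_lines(path, lines, delay=0.0):
    """Append each line to path, flushing so that tail -F sees it right away."""
    count = 0
    with open(path, "a", encoding="utf-8") as log:
        for line in lines:
            log.write(line + "\n")
            log.flush()
            count += 1
            if delay:
                time.sleep(delay)
    return count


# Example call for the file tailed in test2.conf:
# append_lines("/home/hadoop/data/data.log", ["hello", "flume"], delay=1.0)
```

Each appended line should then show up as an event in the agent console through the logger sink.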

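By this point, I trust you have learned how to write Flume configurations. One reminder: the official site is a great learning resource, so make good use of it. I will stop at these two small examples for now; more on using Flume will follow in later updates. Keep it up!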
