Flume Basic Operations


1. Write the content produced by Telnet to the console

bin/flume-ng agent -c conf -n a1 -f conf/a1.conf -Dflume.root.logger=DEBUG,console 


The contents of a1.conf are as follows:

##### define agent name ####
a1.sources = src1
a1.channels = channel1
a1.sinks = sink1
 
####  define source  ####
a1.sources.src1.type = netcat
a1.sources.src1.bind = haoguan-HP-Compaq-Pro-6380-MT
a1.sources.src1.port = 44444
 
####  define channel  ####
a1.channels.channel1.type = memory
a1.channels.channel1.capacity = 1000
a1.channels.channel1.transactionCapacity = 100
 
####  define sink  ####
a1.sinks.sink1.type = logger
a1.sinks.sink1.maxBytesToLog = 1024
 
#### bind the source and sink to the channel
a1.sources.src1.channels = channel1
a1.sinks.sink1.channel = channel1  
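
With the agent running, open a second terminal and send a few test lines over the netcat port. A minimal check, assuming telnet is installed and the hostname resolves from your shell:

telnet haoguan-HP-Compaq-Pro-6380-MT 44444
hello flume

Each line you type should then show up as an event in the agent's console, printed by the logger sink.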

 

2. Write the logs produced by Hive to HDFS

bin/flume-ng agent -c conf -n a2 -f conf/flume-hive.conf -Dflume.root.logger=DEBUG,console 

 
The contents of flume-hive.conf are as follows:

##### define agent name ####
a2.sources = src2
a2.channels = channel2
a2.sinks = sink2
 
####  define source  ####
a2.sources.src2.type = exec
a2.sources.src2.command = tail -f /opt/modules/cdh/hive-0.13.1-cdh5.3.6/log/hive.log
a2.sources.src2.shell = /bin/bash -c
 
####  define channel  ####
a2.channels.channel2.type = memory
a2.channels.channel2.capacity = 1000
a2.channels.channel2.transactionCapacity = 100
 
####  define sink  ####
a2.sinks.sink2.type = hdfs
# %Y%m%d in the path uses the event timestamp to create dated partition directories
a2.sinks.sink2.hdfs.path = hdfs://haoguan-HP-Compaq-Pro-6380-MT:9000/flume_hive_log/%Y%m%d
a2.sinks.sink2.hdfs.filePrefix = events-
a2.sinks.sink2.hdfs.fileType = DataStream
a2.sinks.sink2.hdfs.writeFormat = Text
a2.sinks.sink2.hdfs.batchSize = 10
# flush/roll every 30 seconds, whether or not rollSize has been reached
a2.sinks.sink2.hdfs.rollInterval = 30
# roll once the file reaches this size in bytes, whether or not rollInterval has elapsed
a2.sinks.sink2.hdfs.rollSize = 10240
# rollCount must be set to 0, otherwise it interferes with the rollInterval and rollSize settings
a2.sinks.sink2.hdfs.rollCount = 0
a2.sinks.sink2.hdfs.idleTimeout = 0
# useLocalTimeStamp must be true when the path uses timestamp-based partition directories
a2.sinks.sink2.hdfs.useLocalTimeStamp = true
 
#### bind the source and sink to the channel
a2.sources.src2.channels = channel2
a2.sinks.sink2.channel = channel2  
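
Once Hive appends to hive.log, the events should land in HDFS under the dated directory. A quick way to check, assuming the hdfs client is on the PATH:

hdfs dfs -ls /flume_hive_log/$(date +%Y%m%d)
hdfs dfs -cat /flume_hive_log/$(date +%Y%m%d)/events-*

Files that are still being written carry a temporary suffix until they are rolled (every 30 seconds or 10240 bytes with the settings above).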

 
If the cluster runs in an HA (NameNode high-availability) setup, copy the HA core-site.xml and hdfs-site.xml into /opt/modules/cdh/flume-1.5.0-cdh5.3.6/conf, and change
a2.sinks.sink2.hdfs.path = hdfs://haoguan-HP-Compaq-Pro-6380-MT:9000/flume_hive_log
to
a2.sinks.sink2.hdfs.path = hdfs://ns1/flume_hive_log
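
For example, assuming the Hadoop client configuration lives under /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop (adjust to your own install path), copying the two files might look like:

cp /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop/core-site.xml \
   /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop/hdfs-site.xml \
   /opt/modules/cdh/flume-1.5.0-cdh5.3.6/conf/

With those files in place, Flume can resolve the ns1 nameservice instead of pointing at a single NameNode address.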
 
 
3. Ingest files into HDFS with a spooldir source

bin/flume-ng agent -c conf -n a3 -f conf/flume-app.conf -Dflume.root.logger=DEBUG,console 

 
The contents of flume-app.conf are as follows:

##### define agent name ####
a3.sources = src3
a3.channels = channel3
a3.sinks = sink3
 
####  define source  ####
a3.sources.src3.type = spooldir
# directory watched for new files to ingest
a3.sources.src3.spoolDir = /opt/modules/cdh/flume-1.5.0-cdh5.3.6/spoollogs
# skip files in the spooled directory whose names match this pattern
a3.sources.src3.ignorePattern = ^.*\.log$
# suffix appended to a file once it has been completely ingested
a3.sources.src3.fileSuffix = _COMP
 
####  define channel  ####
a3.channels.channel3.type = file
a3.channels.channel3.checkpointDir = /opt/modules/cdh/flume-1.5.0-cdh5.3.6/filechannel/checkpoint
a3.channels.channel3.dataDirs = /opt/modules/cdh/flume-1.5.0-cdh5.3.6/filechannel/data
a3.channels.channel3.capacity = 1000
a3.channels.channel3.transactionCapacity = 100
 
####  define sink  ####
a3.sinks.sink3.type = hdfs
a3.sinks.sink3.hdfs.path = hdfs://haoguan-HP-Compaq-Pro-6380-MT:9000/flume_app_log
a3.sinks.sink3.hdfs.filePrefix = events-
a3.sinks.sink3.hdfs.fileType = DataStream
a3.sinks.sink3.hdfs.writeFormat = Text
a3.sinks.sink3.hdfs.batchSize = 10
a3.sinks.sink3.hdfs.rollInterval = 30
a3.sinks.sink3.hdfs.rollSize = 10240
a3.sinks.sink3.hdfs.rollCount = 0
a3.sinks.sink3.hdfs.idleTimeout=0
#a3.sinks.sink3.hdfs.useLocalTimeStamp = true
 
#### bind the source and sink to the channel
a3.sources.src3.channels = channel3
a3.sinks.sink3.channel = channel3  
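
To try it out, drop a file into the spooled directory; names matching ^.*\.log$ are ignored by this configuration, so use another extension (the file name below is just an example):

cp /tmp/app_events.txt /opt/modules/cdh/flume-1.5.0-cdh5.3.6/spoollogs/
ls /opt/modules/cdh/flume-1.5.0-cdh5.3.6/spoollogs/

Once the file has been fully ingested it should be renamed to app_events.txt_COMP, and its contents should appear under /flume_app_log in HDFS.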

 


