Managing the log streaming collection, processing, and storage pipeline with a script

Posted by Mr.zhou_Zxy

Using a script to manage the log collection pipeline

The goal is to ingest the data into HDFS via Hudi and create a Hive table over it so the saved data can be queried.

[hdfs,yarn,hive,openresty,frp,zookeeper,kafka,log2hudi]

1.Pipeline overview:

First, frp tunnels the incoming data through to OpenResty.
From OpenResty, a Lua script produces the data into Kafka.
From Kafka, a Spark Structured Streaming job uses Hudi to persist the data to HDFS.
Finally, a Hive table is built on top of the data for downstream analysis.
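
Before wiring everything into one management script, it helps to smoke-test the collection path by hand. The sketch below is only illustrative: it assumes the OpenResty collect endpoint listens on local port 8802 (the port the frp agent forwards in the section 3 output), that the Lua script produces into a hypothetical topic named collect-app, and that a Kafka broker runs on localhost:9092; the endpoint path, topic name, and broker address are all placeholders to adapt.

# Send one test event to the OpenResty collector (endpoint path is hypothetical)
curl -s -X POST "http://localhost:8802/collect" -d '{"event":"smoke_test"}'

# Read it back from Kafka to confirm the Lua producer delivered it
sh /data/apps/kafka_2.11-2.4.1/bin/kafka-console-consumer.sh \
    --bootstrap-server localhost:9092 \
    --topic collect-app \
    --from-beginning --max-messages 1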

2.Real_time_data_warehouse.sh

#!/bin/bash
# filename:Real_time_data_warehouse.sh
# author:zxy
# date:2021-07-19

# This script starts and stops the HDFS/YARN daemons, the Hive service, the frp
# tunnel, the OpenResty collector, the ZooKeeper/Kafka services, and the log2hudi job.
# 1.start-dfs.sh
#	start-yarn.sh
#	stop-yarn.sh
#	stop-dfs.sh
# 2.sh /opt/apps/scripts/hiveScripts/start-supervisord-hive.sh start
#   sh /opt/apps/scripts/hiveScripts/start-supervisord-hive.sh stop
# 3.sh /opt/apps/scripts/frpScripts/frp-agent.sh start
#   sh /opt/apps/scripts/frpScripts/frp-agent.sh stop
# 4.openresty -p /opt/apps/collect-app/
#   openresty -p /opt/apps/collect-app/ -s stop
# 5.sh /opt/apps/scripts/kafkaScripts/start-kafka.sh zookeeper
#   sh /opt/apps/scripts/kafkaScripts/start-kafka.sh kafka
#   sh /opt/apps/scripts/kafkaScripts/start-kafka.sh stopz
#   sh /opt/apps/scripts/kafkaScripts/start-kafka.sh stopk
# 6.sh /opt/apps/scripts/log2hudi.sh start
#   sh /opt/apps/scripts/log2hudi.sh stop

# JDK home
JAVA_HOME=/data/apps/jdk1.8.0_261
# Hadoop home
HADOOP_HOME=/data/apps/hadoop-2.8.1
# Hive home
HIVE_HOME=/data/apps/hive-1.2.1
# Kafka home
KAFKA_HOME=/data/apps/kafka_2.11-2.4.1
# frp tunnel home
FRP_HOME=/data/apps/frp
# OpenResty collector home
COLLECT_HOME=/data/apps/collect-app
# Helper-script directory
SCRIPTS=/opt/apps/scripts
# Positional arguments: service name and action
CMD=$1
DONE=$2


## Help function
usage() {
    echo "usage:"
    echo "Real_time_data_warehouse.sh hdfs start/stop"
    echo "Real_time_data_warehouse.sh yarn start/stop"
    echo "Real_time_data_warehouse.sh hive start/stop"
    echo "Real_time_data_warehouse.sh frp start/stop"
    echo "Real_time_data_warehouse.sh openresty start/stop"
    echo "Real_time_data_warehouse.sh zookeeper start/stop"
    echo "Real_time_data_warehouse.sh kafka start/stop"
    echo "Real_time_data_warehouse.sh log2hudi start/stop"
    echo "description:"
    echo "      hdfs:start/stop the HDFS service"
    echo "      yarn:start/stop the YARN service"
    echo "      hive:start/stop the Hive service"
    echo "      frp:start/stop the frp service"
    echo "      openresty:start/stop the OpenResty service"
    echo "      zookeeper:start/stop the ZooKeeper service"
    echo "      kafka:start/stop the Kafka service"
    echo "      log2hudi:start/stop the log2hudi service"
    exit 0
}

# 1. Manage the HDFS service
if [ "${CMD}" == "hdfs" ] && [ "${DONE}" == "start" ];then
    # start HDFS
    start-dfs.sh
    echo " hdfs started successfully "
elif [ "${CMD}" == "hdfs" ] && [ "${DONE}" == "stop" ];then
    # stop HDFS
    stop-dfs.sh
    echo " hdfs stopped successfully "
# 2. Manage the YARN service
elif [ "${CMD}" == "yarn" ] && [ "${DONE}" == "start" ];then
    # start YARN
    start-yarn.sh
    echo " yarn started successfully "
elif [ "${CMD}" == "yarn" ] && [ "${DONE}" == "stop" ];then
    # stop YARN
    stop-yarn.sh
    echo " yarn stopped successfully "
# 3. Manage the Hive service
elif [ "${CMD}" == "hive" ] && [ "${DONE}" == "start" ];then
    # start Hive via supervisord
    sh ${SCRIPTS}/hiveScripts/start-supervisord-hive.sh start
    echo " hive started successfully "
elif [ "${CMD}" == "hive" ] && [ "${DONE}" == "stop" ];then
    # stop Hive
    sh ${SCRIPTS}/hiveScripts/start-supervisord-hive.sh stop
    echo " hive stopped successfully "
# 4. Manage the frp service
elif [ "${CMD}" == "frp" ] && [ "${DONE}" == "start" ];then
    # start the frp tunnel
    sh ${SCRIPTS}/frpScripts/frp-agent.sh start
    echo " frp started successfully "
elif [ "${CMD}" == "frp" ] && [ "${DONE}" == "stop" ];then
    # stop the frp tunnel
    sh ${SCRIPTS}/frpScripts/frp-agent.sh stop
    echo " frp stopped successfully "
# 5. Manage the OpenResty service
elif [ "${CMD}" == "openresty" ] && [ "${DONE}" == "start" ];then
    # start OpenResty
    openresty -p ${COLLECT_HOME}
    echo " openresty started successfully "
elif [ "${CMD}" == "openresty" ] && [ "${DONE}" == "stop" ];then
    # stop OpenResty
    openresty -p ${COLLECT_HOME} -s stop
    echo " openresty stopped successfully "
# 6. Manage the ZooKeeper and Kafka services
elif [ "${CMD}" == "zookeeper" ] && [ "${DONE}" == "start" ];then
    # start ZooKeeper
    sh ${SCRIPTS}/kafkaScripts/start-kafka.sh zookeeper
    echo " zookeeper started successfully "
elif [ "${CMD}" == "zookeeper" ] && [ "${DONE}" == "stop" ];then
    # stop ZooKeeper
    sh ${SCRIPTS}/kafkaScripts/start-kafka.sh stopz
    echo " zookeeper stopped successfully "
elif [ "${CMD}" == "kafka" ] && [ "${DONE}" == "start" ];then
    # start Kafka
    sh ${SCRIPTS}/kafkaScripts/start-kafka.sh kafka
    echo " kafka started successfully "
elif [ "${CMD}" == "kafka" ] && [ "${DONE}" == "stop" ];then
    # stop Kafka
    sh ${SCRIPTS}/kafkaScripts/start-kafka.sh stopk
    echo " kafka stopped successfully "
# 7. Manage the log2hudi job
elif [ "${CMD}" == "log2hudi" ] && [ "${DONE}" == "start" ];then
    # start the log2hudi streaming job
    sh ${SCRIPTS}/log2hudi.sh start
    echo " log2hudi started successfully "
elif [ "${CMD}" == "log2hudi" ] && [ "${DONE}" == "stop" ];then
    # stop the log2hudi streaming job
    sh ${SCRIPTS}/log2hudi.sh stop
    echo " log2hudi stopped successfully "
else
    usage
fi
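
Once the script is saved, a quick way to confirm what is actually running is to check the local JVM daemons and the YARN application list; these are the same checks the test runs below rely on.

# List local JVM daemons: NameNode/DataNode, ResourceManager/NodeManager, Kafka, etc.
jps

# The log2hudi Spark job should appear with state RUNNING once started
yarn application -list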

3.Start-up test run

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh hdfs start
Starting namenodes on [hadoop]
hadoop: starting namenode, logging to /data/apps/hadoop-2.8.1/logs/hadoop-root-namenode-hadoop.out
localhost: starting datanode, logging to /data/apps/hadoop-2.8.1/logs/hadoop-root-datanode-hadoop.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /data/apps/hadoop-2.8.1/logs/hadoop-root-secondarynamenode-hadoop.out
 hdfs started successfully

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh yarn start
starting yarn daemons
starting resourcemanager, logging to /data/apps/hadoop-2.8.1/logs/yarn-root-resourcemanager-hadoop.out
localhost: starting nodemanager, logging to /data/apps/hadoop-2.8.1/logs/yarn-root-nodemanager-hadoop.out
 yarn started successfully

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh hive start
● supervisord.service - Process Monitoring and Control Daemon
   Loaded: loaded (/usr/lib/systemd/system/supervisord.service; disabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-07-19 20:24:02 CST; 9ms ago
  Process: 88087 ExecStart=/usr/bin/supervisord -c /etc/supervisord.conf (code=exited, status=0/SUCCESS)
 Main PID: 88090 (supervisord)
   CGroup: /system.slice/supervisord.service
           └─88090 /usr/bin/python /usr/bin/supervisord -c /etc/supervisord.conf

Jul 19 20:24:02 hadoop systemd[1]: Starting Process Monitoring and Control Daemon...
Jul 19 20:24:02 hadoop systemd[1]: Started Process Monitoring and Control Daemon.
 hive started successfully

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh openresty start
 openresty started successfully

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh frp start
root      80039  0.0  0.0 113184  1460 pts/0    S+   20:11   0:00 sh Real_time_data_warehouse.sh frp start
root      80040  0.0  0.0 113184  1432 pts/0    S+   20:11   0:00 sh /opt/apps/scripts/frpScripts/frp-agent.sh start
root      80041  0.0  0.0  11644     4 pts/0    D+   20:11   0:00 [frpc]
root      80043  0.0  0.0 112728   952 pts/0    S+   20:11   0:00 grep frp
 frp started successfully

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh zookeeper start
 zookeeper started successfully

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh kafka start
 kafka started successfully

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh log2hudi start
starting -------------->
 log2hudi started successfully

[root@hadoop MyScripts]# yarn application --list
21/07/19 20:13:57 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.130.111:8032
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
                Application-Id      Application-Name        Application-Type          User           Queue                   State             Final-State       Progress                         Tracking-URL
application_1626695833179_0001              log2hudi                   SPARK          root       root.root                 RUNNING               UNDEFINED            10%                  http://hadoop:32795
  • YARN processes (screenshot omitted)
  • Logs successfully saved to HDFS files (screenshot omitted)
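
The same verification works from the shell: listing the Hudi base path shows the commit metadata and data files written by the streaming job. The path below is a placeholder; use whatever base path the log2hudi job is configured with.

# Recursively list the Hudi table directory (path is hypothetical)
hdfs dfs -ls -R /data/hudi/logs | head -n 20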

4.Shutdown test run

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh log2hudi stop
21/07/19 20:17:50 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.130.111:8032
21/07/19 20:17:52 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.130.111:8032
Killing application application_1626695833179_0001
21/07/19 20:17:53 INFO impl.YarnClientImpl: Killed application application_1626695833179_0001
stop ok
 log2hudi stopped successfully
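
The output above suggests that log2hudi.sh stop resolves the running application by name and kills it through YARN. A minimal sketch of that pattern, assuming the application name log2hudi shown in the yarn application --list output:

# Look up the application id by name (column 1 is the id, column 2 the name)
APP_ID=$(yarn application -list 2>/dev/null | awk '$2 == "log2hudi" {print $1}')
# Kill it only if a matching application is running
[ -n "${APP_ID}" ] && yarn application -kill "${APP_ID}"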

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh kafka stop
 kafka stopped successfully

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh zookeeper stop
No zookeeper server to stop
 zookeeper stopped successfully

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh openresty stop
 openresty stopped successfully

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh frp stop
root      80041  0.3  0.1 713160 12480 pts/0    Sl   20:11   0:01 frpc http --sd zxy -l 8802 -s frp.qfbigdata.com:7001 -u zxy
root      84665  0.0  0.0 113184  1460 pts/0    S+   20:18   0:00 sh Real_time_data_warehouse.sh frp stop
root      84666  0.0  0.0 113184  1432 pts/0    S+   20:18   0:00 sh /opt/apps/scripts/frpScripts/frp-agent.sh start
root      84667  0.0  0.0 714312  8124 pts/0    Sl+  20:18   0:00 frpc http --sd zxy -l 8802 -s frp.qfbigdata.com:7001 -u zxy
root      84669  0.0  0.0 112728   952 pts/0    S+   20:18   0:00 grep frp
 frp stopped successfully

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh hive stop
● supervisord.service - Process Monitoring and Control Daemon
   Loaded: loaded (/usr/lib/systemd/system/supervisord.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Jul 19 17:33:23 hadoop systemd[1]: Stopping Process Monitoring and Control Daemon...
Jul 19 17:33:34 hadoop systemd[1]: Stopped Process Monitoring and Control Daemon.
Jul 19 17:33:43 hadoop systemd[1]: Starting Process Monitoring and Control Daemon...
Jul 19 17:33:43 hadoop systemd[1]: Started Process Monitoring and Control Daemon.
Jul 19 19:46:17 hadoop systemd[1]: Stopping Process Monitoring and Control Daemon...
Jul 19 19:46:27 hadoop systemd[1]: Stopped Process Monitoring and Control Daemon.
Jul 19 19:57:25 hadoop systemd[1]: Starting Process Monitoring and Control Daemon...
Jul 19 19:57:25 hadoop systemd[1]: Started Process Monitoring and Control Daemon.
Jul 19 20:19:03 hadoop systemd[1]: Stopping Process Monitoring and Control Daemon...
Jul 19 20:19:14 hadoop systemd[1]: Stopped Process Monitoring and Control Daemon.
 hive stopped successfully

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh yarn stop
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
localhost: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
 yarn stopped successfully

[root@hadoop MyScripts]# sh Real_time_data_warehouse.sh hdfs stop
Stopping namenodes on [hadoop]
hadoop: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
 hdfs stopped successfully

[root@hadoop MyScripts]# jps
85651 Jps

5.Help documentation

# Start the pipeline
sh Real_time_data_warehouse.sh hdfs start

sh Real_time_data_warehouse.sh yarn start

sh Real_time_data_warehouse.sh hive start

sh Real_time_data_warehouse.sh openresty start

sh Real_time_data_warehouse.sh frp start

sh Real_time_data_warehouse.sh zookeeper start

sh Real_time_data_warehouse.sh kafka start

sh Real_time_data_warehouse.sh log2hudi start

# Stop the pipeline
sh Real_time_data_warehouse.sh log2hudi stop

sh Real_time_data_warehouse.sh kafka stop

sh Real_time_data_warehouse.sh zookeeper stop

sh Real_time_data_warehouse.sh openresty stop

sh Real_time_data_warehouse.sh frp stop

sh Real_time_data_warehouse.sh hive stop

sh Real_time_data_warehouse.sh yarn stop

sh Real_time_data_warehouse.sh hdfs stop
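
To avoid typing the eight commands one by one, a thin wrapper can replay them in the start order above and in the reverse order for shutdown. A minimal sketch (the wrapper filename is hypothetical):

#!/bin/bash
# pipeline_all.sh: replay the start/stop sequences from the help documentation
case "$1" in
    start)
        for svc in hdfs yarn hive openresty frp zookeeper kafka log2hudi; do
            sh Real_time_data_warehouse.sh "${svc}" start
        done
        ;;
    stop)
        for svc in log2hudi kafka zookeeper openresty frp hive yarn hdfs; do
            sh Real_time_data_warehouse.sh "${svc}" stop
        done
        ;;
    *)
        echo "usage: pipeline_all.sh start|stop"
        ;;
esac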

6.Related scripts

The helper scripts invoked by Real_time_data_warehouse.sh:

/opt/apps/scripts/hiveScripts/start-supervisord-hive.sh
/opt/apps/scripts/frpScripts/frp-agent.sh
/opt/apps/scripts/kafkaScripts/start-kafka.sh
/opt/apps/scripts/log2hudi.sh
