Monitoring High CPU Usage on Linux and Analyzing the Logs
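Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers monitoring high CPU usage on Linux and analyzing the related logs, and we hope you find it useful.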
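When a process on a Linux system uses too much CPU, scripts can be used to investigate it:
1. Monitor in real time, and start the analysis as soon as a process with high CPU usage appears;
2. Analyze that process to find the threads responsible;
3. Analyze the log files of the application those threads belong to; WebSphere middleware, for example, writes very detailed logs;
4. Examine the error, warning and similar entries in those logs. Log files are sometimes very large and details are easy to overlook, so sed and awk combined with regular expressions are an efficient way to locate errors without missing anything.
The solution consists of a local script and a remote script, which together monitor the host, locate the log files, and analyze them.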
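The two scripts below collect CPU, thread and javacore data, but do not script the log scan described in item 4. A minimal awk sketch of that idea follows. It assumes problem lines contain the words "error" or "warn" (in any case); that pattern is an assumption, so adjust it to your own log format. The function name `scan_log` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list error/warning lines of a (possibly very large) log with their
# line numbers, then print a one-line summary. awk reads the file once.
# Assumption: problem lines contain "error" or "warn" in any letter case.

# scan_log FILE  -> "LINENO:LEVEL:original line" ..., then "summary: ..."
scan_log() {
    local file=$1
    awk '
        {
            line = tolower($0)
            if (line ~ /error/)     { level = "ERROR" }
            else if (line ~ /warn/) { level = "WARN" }
            else                    { next }
            count[level]++
            printf "%d:%s:%s\n", NR, level, $0
        }
        END {
            printf "summary: ERROR=%d WARN=%d\n", count["ERROR"], count["WARN"]
        }
    ' "$file"
}
```

For example, `scan_log /path/to/SystemOut.log | less` pages through only the interesting lines; the line numbers make it easy to open the full context with `sed -n 'START,ENDp'`.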
Local script: highCpuAnalysis_l.sh
Preparation: define the variables
#!/bin/bash
###############################################################################
#The source code is created in 10.19.90.165 and 192.168.86.198
# This script is used to analyze data for performance / high CPU issues on Linux.
# Usage: ./highCpuAnalysis_l.sh USER IP
# Author: HuangTao
# Email:[email protected]
#
###############################################################################
##########################
# Define Variables #
##########################
export USER=$1
export IP=$2
##Usage:
if [ $# -lt 2 ]
then
echo " Unable to find USER and IP."
echo " Please rerun the script as follows: ./highCpuAnalysis_l.sh USER IP"
echo "eg: ./highCpuAnalysis_l.sh root 192.168.86.198 "
exit 1
fi
##get the remote server's WAS application server name
##(last argument of the process with the highest %CPU; --sort=-pcpu sorts numerically and keeps the header on line 1)
export wasappname=$(ssh "$USER@$IP" "ps -eo pcpu,pmem,pid,user,args --sort=-pcpu" | sed -n '2p' | awk '{print $NF}')
##get the remote server's hostname
export remotehostname=$(ssh "$USER@$IP" hostname)
##get the current directory
export dir=$(pwd)
Step 1: Copy the local analysis script to the remote host:
###############################################################################
##Copy the script:highCpuAnalysis_r.sh to target host
echo "*********************************************************************"
echo "Step 1: "
echo "copy the highCpuAnalysis_r.sh to the remote host, and "
scp highCpuAnalysis_r.sh "$USER@$IP:/tmp/"
ssh "$USER@$IP" chmod 755 /tmp/highCpuAnalysis_r.sh
echo "it is running on $remotehostname($IP). "
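Step 6: Run the remote script (it performs Steps 2 to 5 on the remote host), then copy the analysis files it generates to the local host and delete them from the remote host: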
###############################################################################
##run the script, make the script run on target remote host:
ssh "$USER@$IP" /tmp/highCpuAnalysis_r.sh
echo "*************************************************************************"
echo "Step 6:"
echo "Copy the report and javacore to the local analysis host:"
###############################################################################
##Copy the report and javacore to the local host then delete them:
scp "$USER@$IP:/tmp/HighCpuReport*" .
scp "$USER@$IP:/tmp/javacore*.gz" .
##tar takes one archive per -f, so extract each javacore archive separately
for f in javacore*.gz; do tar -zxvf "$f"; done
##Remove all related files on the remote server (the wildcards expand on the remote side)
ssh "$USER@$IP" 'rm -f /tmp/HighCpu*Report* /tmp/javacore* /tmp/highCpuAnalysis_r.sh /tmp/topdashH.*'
echo " "
Step 7: Display the analysis results:
echo "*********************************************************************"
echo "Step 7:"
echo "Show All information:"
echo "Remote hostname: $remotehostname($IP)."
echo "Remote Appserver name:$wasappname."
echo "Report and javacore:"
rm -f javacore*.gz
ls -rlt HighCpu*Report* |tail -1
ls -rtl javacore* |tail -3
echo "*******************************END**********************************"
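Step 7 shows the newest report with `ls -rlt ... | tail -1`. A minimal alternative sketch that avoids parsing `ls` output uses the shell's `-nt` ("newer than") test; the function name `newest_file` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the most recently modified file among the arguments.
# Pass the glob unquoted so the shell expands it, e.g. newest_file HighCpu*Report*
# Returns 1 (and prints nothing) when no file matches.
newest_file() {
    local newest="" f
    for f in "$@"; do
        [ -e "$f" ] || continue      # an unmatched glob stays literal; skip it
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

Usage in Step 7 would be `ls -l "$(newest_file HighCpu*Report*)"`, which also copes with file names containing spaces.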
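Remote script: highCpuAnalysis_r.sh
Preparation for the remote script: define the remote host's variables (mainly the sampling intervals and the number of samples)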
#!/bin/bash
###############################################################################
#The source code is created in 10.19.90.165 and 192.168.86.198.
# This script is used to analyze data for performance / high CPU issues on Linux.
# Usage: ./highCpuAnalysis_r.sh
# Author: HuangTao
# Email:[email protected]
###############################################################################
##########################
# Define Variables #
##########################
# Interval between two top dash H snapshots (seconds).
TOP_DASH_H_VAL=30
# How many top dash H snapshots to take.
TOP_DASH_H_VAL_T=3
# Interval between two javacores (seconds).
JAVACORE_VAL=60
# How many javacores to take.
JAVACORE_VAL_T=3
##get the PID of the process consuming the most CPU
##(ps sorts numerically by %CPU; line 1 is the header, so line 2 is the top process)
topline=$(ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | sed -n '2p')
export pid=$(echo "$topline" | awk '{print $3}')
##convert the pid number to hexadecimal (from base 10 to base 16)
export pid16=$(printf '%x' "$pid")
##get the owner of the process, used below to check whether it is a WAS process
export was=$(echo "$topline" | awk '{print $4}')
##get the WAS application name
export wasappname=$(echo "$topline" | awk '{print $NF}')
##get hostname
export hostname=$(hostname)
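The remote script converts the PID to hexadecimal because `top -H` and `ps` print thread IDs in decimal, while IBM javacores typically record native thread IDs in hex. A minimal sketch of the conversion with the `printf` builtin (so `bc` is not needed); the function name `to_hex` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert a decimal PID/TID (as shown by top -H or ps) to the
# 0x-prefixed hexadecimal form used for native thread IDs in javacores.
to_hex() {
    case $1 in
        ''|*[!0-9]*) echo "to_hex: not a decimal number: $1" >&2; return 1 ;;
    esac
    printf '0x%x\n' "$1"
}
```

For example, thread 7031 from the top report becomes `0x1b77`; search javacores case-insensitively, since they may print the digits in upper case.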
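Step 2: Get the PID of the process with the highest CPU usage and check whether it belongs to the specific program (here, WebSphere):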
##########################
# Get High CPU PID #
##########################
## put the report in /tmp/HighCpuReport.$pid.$hostname.out
echo "Script execution time:" $(date) > /tmp/HighCpuReport.$pid.$hostname.out
echo " "
##check whether it is a WAS process (owned by a WAS user)
if [ "$was" = wasuser ] || [ "$was" = wasadmin ]
then
echo "*********************************************************************"
echo "Step 2:"
echo "The Highest CPU pid is : $pid, the process is a WAS process. "
else
echo "The Highest CPU pid : $pid is NOT a WAS process." | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo " "
exit 1
fi
sleep 1;
echo "*********************************************************************"
Under a specific user name (for example wasadmin or httpd), find the 10 processes consuming the most CPU:
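A minimal sketch of this per-user listing, assuming the input has the column layout of `ps -eo pcpu,pmem,pid,user,args` (%CPU first, USER fourth). Reading from stdin keeps it testable; the function name `top_user_procs` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep the header plus the N processes of USER with the highest %CPU.
# Input: output of `ps -eo pcpu,pmem,pid,user,args` on stdin (assumed layout).

# top_user_procs USER [N]
top_user_procs() {
    local user=$1 n=${2:-10}
    awk -v u="$user" 'NR == 1 { print; next } $4 == u' |
        {
            IFS= read -r header          # pass the header line through unchanged
            printf '%s\n' "$header"
            sort -k1,1nr | head -n "$n"  # numeric sort on %CPU, highest first
        }
}
```

Usage: `ps -eo pcpu,pmem,pid,user,args | top_user_procs wasadmin 10`. On procps-based systems, `ps -u wasadmin -o pcpu,pmem,pid,user,args --sort=-pcpu | head -n 11` gives similar output directly.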
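Step 3: For the process consuming the most CPU, record how its threads (or child processes) use CPU over a specific period, and write the data to a temporary file for later analysis: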
#########################
# #
# Start collection of: #
# * top dash H #
# #
#########################
# Start the collection of top dash H data.
echo "Step 3:"
echo "Starting collection of top dash H data ..."
echo "Need up to $((TOP_DASH_H_VAL * TOP_DASH_H_VAL_T)) seconds to complete this step:"
top -bH -d $TOP_DASH_H_VAL -n $TOP_DASH_H_VAL_T -p $pid > /tmp/topdashH.$pid.$hostname.out
#eg: top -bH -d 30 -n 3 -p 7031
#eg: grep -v Swap toplog.out |grep -v Task |grep -v "Cpu(s)"|grep -v "Mem:" |grep -v top| sort -k 1 -r | head -10 | sed -n '2p' |awk '{print $3}'
#echo "Analyzing the snapshot /tmp/topdashH.$pid.$hostname.out can find the high CPU thread" ;
echo "Collected the top dash H data."
sleep 2;
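The file written in Step 3 holds several top snapshots, so the same thread appears once per snapshot. A minimal sketch that averages each thread's %CPU across all snapshots, assuming top's default thread columns (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND, so %CPU is field 9). The function name `avg_thread_cpu` is hypothetical, and only the first word of the COMMAND column is kept.

```shell
#!/usr/bin/env bash
# Sketch: average %CPU per thread over all snapshots in a `top -bH` batch file.
# Summary lines (top -, Tasks/Threads, %Cpu, Mem, Swap) and the column header
# are skipped because their first field is not a number.

# avg_thread_cpu FILE [N] -> "TID AVG_CPU SAMPLES COMMAND", highest average first
avg_thread_cpu() {
    local file=$1 n=${2:-10}
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 12 {
            sum[$1] += $9        # %CPU column
            cnt[$1]++
            cmd[$1] = $12        # first word of COMMAND only
        }
        END {
            for (tid in sum)
                printf "%s %.1f %d %s\n", tid, sum[tid] / cnt[tid], cnt[tid], cmd[tid]
        }
    ' "$file" | sort -k2,2nr | head -n "$n"
}
```

An average is less sensitive to a single spike than the per-snapshot maximum, which helps separate a thread that is continuously busy from one that was briefly active.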
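Step 4: Analyze the file from the previous step to find the 10 threads that consumed the most CPU during the period. If the high CPU usage is caused by WAS, also generate javacores within the specified period for analysis.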
###########################
# Find out the Thread of most CPU
# and TIME consumer Top 10 .
###########################
##/tmp/topdashH.$pid.$hostname.out is deleted by the local script after the data has been copied back
############################
# Start collection of: #
# * javacores #
#########################
# Javacores are output to the working directory of the JVM; in most cases this is the <profile_root>
echo "*********************************************************************"
echo "Step 4:"
echo "Starting collection of Javacores ..."
echo "Need $((JAVACORE_VAL * JAVACORE_VAL_T)) seconds to complete this step:"
##clear the javacore about this PID first:
rm -f /opt/IBM/WebSphere/AppServer/profiles/$wasappname/javacore*$pid*
##then generate JAVACORE_VAL_T javacores, JAVACORE_VAL seconds apart
##(kill -3 sends SIGQUIT, which makes the IBM JVM write a javacore)
for i in $(seq 1 $JAVACORE_VAL_T)
do
kill -3 $pid
echo "Collected javacore $i of $JAVACORE_VAL_T for PID $pid."
sleep $JAVACORE_VAL
done
##mv the javacore to the /tmp DIR and then zip:
rm -f /tmp/javacore*
mv -f /opt/IBM/WebSphere/AppServer/profiles/$wasappname/javacore*$pid* /tmp/
cd /tmp
tar -zcvf javacore.$(date +%Y%m%d"."%H%M%S).$pid.gz javacore*$pid*
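`kill -3` only asks the JVM to write a javacore; the file appears asynchronously. A more defensive version of Step 4 could wait until the expected number of javacores exists before moving them. A minimal polling sketch, with the hypothetical function name `wait_for_files`; it only checks that the files exist, not that the JVM has finished writing them.

```shell
#!/usr/bin/env bash
# Sketch: wait until at least COUNT files matching PATTERN exist in DIR,
# polling once per second. Returns 0 when enough files exist, 1 on timeout.

# wait_for_files DIR PATTERN COUNT TIMEOUT_SECONDS
wait_for_files() {
    local dir=$1 pattern=$2 count=$3 timeout=$4 found waited=0
    while :; do
        found=$(find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l)
        [ "$found" -ge "$count" ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}
```

In the remote script this could be called as `wait_for_files /opt/IBM/WebSphere/AppServer/profiles/$wasappname "javacore*$pid*" $JAVACORE_VAL_T 60` just before the `mv`.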
Step 5: Display the analysis results and save them to a temporary file:
echo "*********************************************************************"
echo "Step 5:"
echo "Print out the analysis information:"
echo " " | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*********The most CPU consumer top 10 PROCESS :**********************" | tee -a /tmp/HighCpuReport.$pid.$hostname.out
ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | head -11 | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*" | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo " "
echo "****The most CPU consumer top 10 *Threads* from process $pid:*********" | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo " PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND " | tee -a /tmp/HighCpuReport.$pid.$hostname.out
##keep only thread rows (numeric first field), sort by %CPU (field 9), keep each thread's highest sample
grep -E '^ *[0-9]+ ' /tmp/topdashH.$pid.$hostname.out | sort -k9,9nr | awk '!seen[$1]++' | head -10 | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*" | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo " " | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "****The most TIME consumer top 10 *Threads* from process $pid:********" | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo " PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND " | tee -a /tmp/HighCpuReport.$pid.$hostname.out
##sort by TIME+ (field 11, min:sec.hundredths) with version sort so minutes and seconds compare numerically
grep -E '^ *[0-9]+ ' /tmp/topdashH.$pid.$hostname.out | sort -k11,11Vr | awk '!seen[$1]++' | head -10 | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*" | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo " "
echo "Please check the javacore and HighCpuReport.$pid.$hostname.out under the current directory."
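The report lists the busiest thread IDs in decimal, and the javacores collected in Step 4 show what those threads were executing. A minimal sketch that links the two, assuming the javacore contains lines of the form `native thread ID:0x1B77,` (as IBM J9 javacores typically do; verify against your own files, and adjust the pattern if your javacores zero-pad the ID). The function name `thread_in_javacore` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the javacore section for one thread from the top -H report.
# Assumption: the javacore records "native thread ID:0x<hex>," for each thread,
# with the thread name two lines above that entry.

# thread_in_javacore TID JAVACORE [LINES_AFTER]
thread_in_javacore() {
    local tid=$1 file=$2 lines=${3:-20} hex
    hex=$(printf '0x%x' "$tid")
    # -i: javacores may print the hex digits in upper case (0x1B77)
    # trailing comma: avoid matching a longer ID that starts with the same digits
    grep -i -B 2 -A "$lines" "native thread ID:$hex," "$file"
}
```

For example, `thread_in_javacore 7031 javacore.20240101.120000.7031.0001.txt` shows the thread name and the first stack frames of thread 7031; repeating it for each javacore shows whether the thread stayed in the same code.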