Monitoring High CPU Usage on Linux and Analyzing the Logs
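Preface: This article was compiled by the editors of 小常识网 (cha138.com). It introduces monitoring of high CPU usage on Linux and the related log analysis; we hope it is a useful reference.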
When a process on a Linux system uses too much CPU, you can track the problem down with scripts:
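1. Monitor in real time, and start the analysis as soon as a process with high CPU usage appears;
2. Analyze that process to identify the threads responsible;
3. Analyze the log files of the program those threads belong to; WebSphere middleware, for example, produces a very comprehensive set of log files;
4. Examine the errors, warnings and similar entries in those log files in detail. Log files are sometimes very large and details are easy to overlook, so using sed and awk together with regular expressions lets you pinpoint the errors efficiently without missing anything.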
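Steps 3 and 4 can be sketched with a single awk pass over the log. This is a minimal sketch, not part of the author's scripts. The function name `scan_log` and the keywords it matches ("error" and "warn", case-insensitive) are assumptions, so tune the regular expressions to your own log format (for example WebSphere's SystemOut.log).

```bash
#!/usr/bin/env bash
# Sketch only: summarize the error/warning lines of a large log in one awk pass.
# Assumption (hypothetical): lines containing "error" or "warn" in any case are
# the interesting ones; adjust both regular expressions for your log format.
scan_log() {
    local log=$1
    awk '
        {
            line = tolower($0)
            if (line ~ /error/) {
                errors++
                print "ERROR  " NR ": " $0      # keep the line number for later lookup
            } else if (line ~ /warn/) {
                warnings++
                print "WARN   " NR ": " $0
            }
        }
        END { printf "errors=%d warnings=%d\n", errors, warnings }
    ' "$log"
}
```

Usage: `scan_log SystemOut.log | less`. Each hit carries its line number, so sed can then print the surrounding context. For example, `sed -n '1200,1240p' SystemOut.log` prints lines 1200 to 1240.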
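The solution consists of a local script and a remote script. Together they monitor the server accurately, locate the relevant log files and analyze them.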
highCpuAnalysis_l.sh (local script, run on the analysis host):
#!/bin/bash
###############################################################################
# The source code was created on 10.19.90.165 and 192.168.86.198.
# This script is used to analyze data for performance / high CPU issues on Linux.
# Usage: ./highCpuAnalysis_l.sh USER IP
# Author: HuangTao
# Email: ...@126.com
###############################################################################

##########################
#    Define Variables    #
##########################
## Usage:
if [ $# -lt 2 ]
then
    echo " Unable to find USER and IP."
    echo " Please rerun the script as follows: ./highCpuAnalysis_l.sh USER IP"
    echo " eg: ./highCpuAnalysis_l.sh root 192.168.86.198"
    exit 1
fi
export USER=$1
export IP=$2

## Get the remote server's WAS application server name
## (last argument of the command line of the highest-CPU process)
export wasappname=$(ssh "$USER@$IP" ps -eo pcpu,pmem,pid,user,args --no-headers --sort=-pcpu | head -n 1 | awk '{print $NF}')
## Get the remote server's hostname
export remotehostname=$(ssh "$USER@$IP" hostname)

###############################################################################
## Copy the script highCpuAnalysis_r.sh to the target host
echo "*********************************************************************"
echo "Step 1:"
echo "Copy highCpuAnalysis_r.sh to the remote host and run it on $remotehostname($IP)."
scp highCpuAnalysis_r.sh "$USER@$IP:/tmp/"
ssh "$USER@$IP" chmod 755 /tmp/highCpuAnalysis_r.sh

###############################################################################
## Run the script on the target remote host (it prints steps 2 to 5):
ssh "$USER@$IP" /tmp/highCpuAnalysis_r.sh
echo "*********************************************************************"
echo "Step 6:"
echo "Copy the report and javacore to the local analysis host:"

###############################################################################
## Copy the report and javacore to the local host, then delete them remotely:
scp "$USER@$IP:/tmp/HighCpuReport*" .
scp "$USER@$IP:/tmp/javacore*.gz" .
tar -zxvf javacore*.gz
## Remove all related files on the remote server (quoted so the globs expand remotely)
ssh "$USER@$IP" 'rm -f /tmp/HighCpu*Report* /tmp/javacore* /tmp/highCpuAnalysis_r.sh /tmp/topdashH.*'
echo " "
echo "*********************************************************************"
echo "Step 7:"
echo "Show all information:"
echo "Remote hostname: $remotehostname($IP)."
echo "Remote Appserver name: $wasappname."
echo "Report and javacore:"
rm -f javacore*.gz
ls -rlt HighCpu*Report* | tail -1
ls -rlt javacore* | tail -3
echo "*******************************END**********************************"
highCpuAnalysis_r.sh (remote script, copied to and run on the WAS host):
#!/bin/bash
###############################################################################
# The source code was created on 10.19.90.165 and 192.168.86.198.
# This script is used to analyze data for performance / high CPU issues on Linux.
# Usage: ./highCpuAnalysis_r.sh (copied to /tmp and started by highCpuAnalysis_l.sh)
# Author: HuangTao
# Email: ...@126.com
###############################################################################

##########################
#    Define Variables    #
##########################
# Delay between two top dash H snapshots (seconds).
TOP_DASH_H_VAL=30
# How many top dash H snapshots to take.
TOP_DASH_H_VAL_T=3
# Delay between two javacores (seconds).
JAVACORE_VAL=60
# How many javacores to take.
JAVACORE_VAL_T=3

## Highest-CPU process as "%CPU %MEM PID USER ARGS" (procps ps, no header line)
top_proc=$(ps -eo pcpu,pmem,pid,user,args --no-headers --sort=-pcpu | head -n 1)
## Get the high CPU pid
export pid=$(echo "$top_proc" | awk '{print $3}')
## Convert the pid to hexadecimal (base 10 to base 16)
export pid16=$(printf '%x' "$pid")
## Owner of the process, used to check whether it is a WAS process
export was=$(echo "$top_proc" | awk '{print $4}')
## Get the WAS application server name (last argument of the command line)
export wasappname=$(echo "$top_proc" | awk '{print $NF}')
## Get hostname
export hostname=$(hostname)
REPORT=/tmp/HighCpuReport.$pid.$hostname.out
TOPLOG=/tmp/topdashH.$pid.$hostname.out
DIVIDER="*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*"
HEADER="  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND"

###############################################################################
##########################
#    Get High CPU PID    #
##########################
## Put the report in /tmp/HighCpuReport.$pid.$hostname.out
echo "Script execute time: $(date)" > "$REPORT"
echo " "
if [ "$was" = wasuser ] || [ "$was" = wasadmin ]
then
    echo "*********************************************************************"
    echo "Step 2:"
    echo "The highest CPU pid is: $pid, the process is a WAS process."
else
    echo "The highest CPU pid: $pid is NOT a WAS process." | tee -a "$REPORT"
    echo " "
    exit 1
fi
sleep 1
echo "*********************************************************************"

###############################################################################
# Start the collection of top dash H data (per-thread CPU of process $pid).
echo "Step 3:"
echo "Starting collection of top dash H data ..."
echo "Need $((TOP_DASH_H_VAL * TOP_DASH_H_VAL_T)) seconds to complete this step:"
top -bH -d "$TOP_DASH_H_VAL" -n "$TOP_DASH_H_VAL_T" -p "$pid" > "$TOPLOG"
#eg: top -bH -d 30 -n 3 -p 7031
echo "Collected the top dash H data."
sleep 2
## $TOPLOG is deleted by highCpuAnalysis_l.sh after the run.

###############################################################################
# Start the collection of javacores.
# Javacores are output to the working directory of the JVM; in most cases this is the <profile_root>.
echo "*********************************************************************"
echo "Step 4:"
echo "Starting collection of javacores ..."
echo "Need $((JAVACORE_VAL * JAVACORE_VAL_T)) seconds to complete this step:"
JAVACORE_DIR=/opt/IBM/WebSphere/AppServer/profiles/$wasappname
## Clear the old javacores of this PID first:
rm -f "$JAVACORE_DIR"/javacore*"$pid"*
## Then generate the javacores (kill -3 sends SIGQUIT, which makes the IBM JVM write a javacore):
for ((i = 1; i <= JAVACORE_VAL_T; i++)); do
    kill -3 "$pid"
    echo "Collected javacore $i of $JAVACORE_VAL_T for PID $pid."
    sleep "$JAVACORE_VAL"
done
## Move the javacores to /tmp and then compress them:
rm -f /tmp/javacore*
mv -f "$JAVACORE_DIR"/javacore*"$pid"* /tmp/
cd /tmp || exit 1
tar -zcvf "javacore.$(date +%Y%m%d.%H%M%S).$pid.gz" javacore*"$pid"*

###############################################################################
echo "*********************************************************************"
echo "Step 5:"
echo "Print out the analysis information:"
echo " " | tee -a "$REPORT"
echo "*********The most CPU consuming top 10 PROCESSES:*********************" | tee -a "$REPORT"
ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | head -n 11 | tee -a "$REPORT"
echo "$DIVIDER" | tee -a "$REPORT"
echo " "
echo "****The most CPU consuming top 10 *Threads* from process $pid:********" | tee -a "$REPORT"
echo "$HEADER" | tee -a "$REPORT"
## Keep thread rows only, sort by %CPU (field 9), keep each thread's highest sample
awk '$1 ~ /^[0-9]+$/' "$TOPLOG" | sort -k9,9nr | awk '!seen[$1]++' | head -n 10 | tee -a "$REPORT"
echo "$DIVIDER" | tee -a "$REPORT"
echo " " | tee -a "$REPORT"
echo "****The most TIME consuming top 10 *Threads* from process $pid:*******" | tee -a "$REPORT"
echo "$HEADER" | tee -a "$REPORT"
## Sort by TIME+ (field 11, e.g. 10:31.00); version sort compares its numeric parts in order
awk '$1 ~ /^[0-9]+$/' "$TOPLOG" | sort -k11,11Vr | awk '!seen[$1]++' | head -n 10 | tee -a "$REPORT"
echo "$DIVIDER" | tee -a "$REPORT"
echo " "
echo "Please check the javacore and HighCpuReport.$pid.$hostname.out under the current directory."
As for why two scripts are used and how well they work, I hope to have the chance to discuss that with you in person.