Monitoring High CPU Usage and Analyzing Logs on Linux


When a process on a Linux system consumes too much CPU, scripts can be used to track down the cause:

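1. Monitor in real time, so that the analysis starts as soon as a process shows high CPU usage.

2. Analyze that process to identify the threads responsible.

3. Analyze the log files of the program those threads belong to. WebSphere middleware, for example, keeps a very thorough set of log files.

4. Examine the error and warning entries in the log files closely. Log files can be very large and details are easy to miss, so use sed and awk together with regular expressions to locate the errors without skipping anything.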
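Step 4 can be sketched roughly as follows. This is a minimal illustration, not part of the original scripts: the function names `scan_log` and `print_range` are hypothetical, and the patterns `error` and `warn` are assumptions that should be adapted to the actual log format.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original scripts).

# Print every line that mentions an error or a warning, prefixed with its
# line number and a tag, then a one-line summary of the counts.
# The regular expressions are assumptions about the log format.
scan_log() {
    awk '
        {
            line = tolower($0)                     # case-insensitive match
            if (line ~ /error/)     { errors++;   printf "%d:ERROR:%s\n", NR, $0 }
            else if (line ~ /warn/) { warnings++; printf "%d:WARN:%s\n",  NR, $0 }
        }
        END { printf "errors=%d warnings=%d\n", errors, warnings }
    ' "$1"
}

# Print lines START..END of a file, for example to read the context
# around a line number reported by scan_log.
print_range() {
    sed -n "${2},${3}p" "$1"
}
```

For example, `scan_log SystemOut.log` lists the matching lines, and `print_range SystemOut.log 120 140` shows the surrounding context once a line number of interest is known. The file name here is only an example.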

 

The approach here uses a pair of scripts, one local and one remote, to monitor the host, locate the relevant log files, and analyze them.
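Step 1 calls for real-time monitoring, while the scripts in this post collect data once each time they are run. A polling loop along the following lines could act as the trigger. It is only a sketch: the names `first_hot_pid` and `watch_cpu` and the 80% threshold are assumptions, not part of the original scripts, and it relies on the `--sort` option of the procps `ps`.

```shell
#!/usr/bin/env bash
# Hypothetical threshold in percent; not taken from the original scripts.
CPU_THRESHOLD=${CPU_THRESHOLD:-80}

# Read "pcpu pid" pairs on stdin and print the PID of the first process
# at or above the given limit. Prints nothing if no process qualifies.
first_hot_pid() {
    awk -v limit="$1" '$1 + 0 >= limit + 0 { print $2; exit }'
}

# Poll ps every INTERVAL seconds. When a process reaches the threshold,
# run the given command with the PID appended as its last argument.
watch_cpu() {
    local interval=$1; shift
    local pid
    while true; do
        # Empty column headers (pcpu=, pid=) suppress the header line.
        pid=$(ps -e -o pcpu= -o pid= --sort=-pcpu | first_hot_pid "$CPU_THRESHOLD")
        if [ -n "$pid" ]; then
            "$@" "$pid"
            return 0
        fi
        sleep "$interval"
    done
}
```

A call such as `watch_cpu 5 echo "hot pid:"` checks every five seconds and prints the PID of the first process that crosses the threshold. Any collection command could be substituted for `echo`.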

highCpuAnalysis_l.sh:

###############################################################################
#The source code is created in 10.19.90.165 and 192.168.86.198
# This script is used to analyze data for performance and high CPU issues on Linux
# Usage:    ./highCpuAnalysis_l.sh USER IP
# Author: HuangTao
# Email:[email protected]126.com
# 
###############################################################################
##########################
#  Define Variables      #
##########################
export USER=$1;
export IP=$2;

##Usage:
if [ $# -lt 2 ]
then
echo " USER and IP are required."
echo " Please rerun the script as follows: ./highCpuAnalysis_l.sh USER IP"
echo "eg: ./highCpuAnalysis_l.sh root 192.168.86.198 "
exit 1
fi

##get the remote servers WAS application server name
export wasappname=$(ssh "$USER@$IP" ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | sed -n 2p | awk '{print $NF}')

##get the remote servers hostname
export remotehostname=$(ssh "$USER@$IP" hostname)

##get the current directory
export dir=$(pwd)

###############################################################################
##Copy the script:highCpuAnalysis_r.sh to target host
echo "*********************************************************************" 
echo "Step 1: "
echo "copy the highCpuAnalysis_r.sh to the remote host, and "
scp  highCpuAnalysis_r.sh  "$USER@$IP":/tmp/
ssh "$USER@$IP" chmod 755 /tmp/highCpuAnalysis_r.sh
echo "is RUNNING on $remotehostname($IP). "


###############################################################################
##run the script, make the script run on target remote host:
ssh "$USER@$IP"  /tmp/highCpuAnalysis_r.sh
echo "*************************************************************************"
echo "Step 6:"
echo "Copy the report and javacore to the local analysis host:"
###############################################################################
##Copy the report and javacore to the local host then delete them:
export dir=$(pwd)
scp "$USER@$IP:/tmp/HighCpuReport*" .
scp "$USER@$IP:/tmp/javacore*.gz"  .
tar -zxvf javacore*.gz

##Remove all related files on the remote server
##(the patterns are quoted so that they are expanded on the remote host, not locally)
ssh "$USER@$IP" 'rm -f /tmp/HighCpu*Report*'
ssh "$USER@$IP" 'rm -f /tmp/javacore*'
ssh "$USER@$IP" 'rm -f /tmp/highCpuAnalysis_r.sh'
ssh "$USER@$IP" 'rm -f /tmp/topdashH.*'
echo "   "
echo "*********************************************************************" 
echo "step 7:"
echo "Show All information:"
echo "Remote hostname: $remotehostname($IP)."
echo "Remote Appserver name:$wasappname."
echo "Report and javacore:"

rm -f javacore*.gz
ls -rlt HighCpu*Report* |tail -1
ls -rtl javacore* |tail -3

echo  "*******************************END**********************************"
 

 

highCpuAnalysis_r.sh

###############################################################################
#The source code is created in 10.19.90.165 and 192.168.86.198.
# This script is used to analyze data for performance and high CPU issues on Linux
# Usage:    ./highCpuAnalysis_r.sh
# Author: HuangTao
# Email:[email protected]126.com
# 
###############################################################################
##########################
#  Define Variables      #
########################## 
# Interval between top dash H samples (seconds).
TOP_DASH_H_VAL=30
# Number of top dash H samples to take.
TOP_DASH_H_VAL_T=3

# Interval between javacores (seconds).
JAVACORE_VAL=60
# Number of javacores to take.
JAVACORE_VAL_T=3


##get the PID of the process using the most CPU
export pid=$(ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | sed -n 2p | awk '{print $3}')

##convert the PID from decimal to hexadecimal (base 10 to base 16)
export pid16=$(echo "obase=16; $pid" | bc)

##get the owner of that process, used below to check whether it is a WAS process
export was=$(ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | sed -n 2p | awk '{print $4}')

##get the WAS application name (the last field of the command line)
export wasappname=$(ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | sed -n 2p | awk '{print $NF}')

##get hostname
export hostname=$(hostname)

###############################################################################
##########################
# Get High CPU PID       #
########################## 
## put the report in /tmp/HighCpuReport.$pid.$hostname.out
echo "Script execution time:" $(date)  > /tmp/HighCpuReport.$pid.$hostname.out
echo "   " 
if [ "$was" = wasuser ]  || [ "$was" = wasadmin ]
then 
echo "*********************************************************************"                                                                       
echo "Step 2:"
echo "The highest CPU pid is:  $pid, the process is a WAS process. "
else
echo "The highest CPU pid:  $pid is NOT a WAS process."  | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "   " 
exit 1
fi
sleep 1;
echo "*********************************************************************" 
###############################################################################
#########################
#                       #
# Start collection of:  #
#  * top dash H         #
#                       #
#########################
# Start the collection of top dash H data.
echo  "Step 3:" 
echo  "Starting collection of top dash H data ..." 
echo  "Need $((TOP_DASH_H_VAL*TOP_DASH_H_VAL_T)) seconds to complete this step:"
      top -bH -d $TOP_DASH_H_VAL -n $TOP_DASH_H_VAL_T -p $pid > /tmp/topdashH.$pid.$hostname.out 
    #eg:   top -bH -d 30 -n 3 -p 7031
    #eg:  grep -v Swap toplog.out |grep -v Task |grep -v "Cpu(s)"|grep -v "Mem:" |grep -v top| sort -k 1 -r | head -10 | sed -n 2p |awk '{print $3}'
#echo "Analyzing the snapshot in /tmp/topdashH.$pid.$hostname.out can reveal the high CPU thread" ;
echo  "Collected the top dash H data."
sleep 2;
###############################################################################
###########################
#  Find the top 10 threads by CPU
#  and by TIME consumption.
###########################


##delete the /tmp/topdashH.$pid.$hostname.out   when completed the data Collection
 
################################################################################
# Start collection of:  #
#  * javacores          #
#########################
# Javacores are output to the working directory of the JVM; in most cases this is the <profile_root>
echo "*********************************************************************"    
echo  "Step 4:" 
echo  "Starting collection of Javacores ..." 
echo  "Need $((JAVACORE_VAL*JAVACORE_VAL_T)) seconds to complete this step:"
##clear the javacore about this PID first:
rm -f /opt/IBM/WebSphere/AppServer/profiles/$wasappname/javacore*$pid* 
##then generate the javacore
        kill -3 $pid ;
        echo "Collected the first javacore for PID $pid ."   
        sleep $JAVACORE_VAL
        
        kill -3 $pid ;
        echo "Collected the second javacore for PID $pid ." 
        sleep $JAVACORE_VAL

        kill -3 $pid ;
        echo "Collected the third javacore for PID $pid ."     
        sleep $JAVACORE_VAL    
        
##mv the javacore to the /tmp DIR and then zip:
rm -f /tmp/javacore*
mv -f /opt/IBM/WebSphere/AppServer/profiles/$wasappname/javacore*$pid*     /tmp/
cd /tmp
tar -zcvf javacore.$(date +%Y%m%d"."%H%M%S).$pid.gz javacore*$pid*
################################################################################

echo "*********************************************************************"  
echo  "Step 5:" 
echo  "Print out the analysis information:"
echo "   "                                                                      | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*********Top 10 PROCESSES by CPU usage:******************************"   | tee -a /tmp/HighCpuReport.$pid.$hostname.out
ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | head -11                           | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "   "
 
echo "****Top 10 *Threads* by CPU usage from process $pid:******************"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND "    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
grep -v Cpu /tmp/topdashH.$pid.$hostname.out | sort -k9  -n -r  -k1 -u | head -10              | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "   "                                                                      | tee -a /tmp/HighCpuReport.$pid.$hostname.out

echo "****Top 10 *Threads* by TIME from process $pid:***********************"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND "    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
grep -v Cpu /tmp/topdashH.$pid.$hostname.out | sort -k11  -n  -r -k1 -u | head -10             | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "   "

echo "Please check the javacore and HighCpuReport.$pid.$hostname.out under the current directory."
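
The thread IDs that `top -H` reports are decimal, while IBM javacores typically record native thread IDs in hexadecimal, so matching a hot thread to its stack usually needs a conversion. The sketch below shows one way to do that. The function names `tid_to_hex` and `top_threads` are hypothetical, and the column positions (thread ID in column 1, %CPU in column 9) are assumed from the header line the script prints. Other `top` versions may lay the columns out differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original scripts).

# Convert a decimal thread ID to 0x-prefixed hexadecimal.
tid_to_hex() {
    printf '0x%X\n' "$1"
}

# Read a file of `top -bH` batch output and print one line per thread:
#   <hex thread ID> <decimal thread ID> <highest %CPU seen in any sample>
# sorted with the busiest thread first.
# Assumes column 1 is the thread ID and column 9 is %CPU.
top_threads() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 12 {
             cpu = $9 + 0
             if (!($1 in max) || cpu > max[$1]) max[$1] = cpu
         }
         END { for (t in max) printf "0x%X %d %.1f\n", t, t, max[t] }' "$1" |
        sort -k3,3nr -k2,2n
}
```

Running `top_threads /tmp/topdashH.$pid.$hostname.out | head -10` before that file is cleaned up gives the hexadecimal IDs to search for in the javacore files, for example with `grep -i`.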
 

As for why two scripts are used and how well they work, I hope to have the chance to discuss that with you in person.

 
