Linux: analyzing performance bottlenecks simulated with stress

Posted 2021-09-09 sysu_lluozh


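The earlier post "[linux] System stress simulation tool stress" covered how to use stress. This post uses stress to simulate several scenarios and then works out where the bottleneck is in each one.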

1. CPU-intensive processes (processes that use the CPU)

1.1 Simulate load on 2 CPUs

Start two CPU workers that run for 600 seconds:

[root@MH-T02 ~]# stress --cpu 2 --timeout 600
stress: info: [26766] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd

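1.2 Use uptime to check the load average

uptime shows the load average rising: the 1-minute value (1.57) is already well above the 5- and 15-minute values (0.55 and 0.23) and is heading toward 2, one per stress worker.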

[root@MH-T02 ~]# uptime
 13:08:21 up 28 days,  4:16,  2 users,  load average: 1.57, 0.55, 0.23
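
A load average only means something relative to the number of CPUs. The sketch below divides a load value by a CPU count; the function name `load_per_cpu` is made up for this illustration, and the sample values are the 1-minute load from the output above and the 4 cores reported for MH-T02.

```shell
# load_per_cpu LOAD NCPU: print the load average divided by the CPU count.
# A result near or above 1.00 means the CPUs are saturated on average.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# Sample values: 1-minute load 1.57 on a 4-core machine.
load_per_cpu 1.57 4    # prints 0.39
```

On a live system the same function could be fed with `load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"`.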

1.3 Use mpstat to check CPU usage

mpstat shows two CPUs (5 and 6) at 100% %usr while %iowait is 0, so the load comes from CPU-intensive processes.

[root@MH-T02 ~]# mpstat -P ALL 5 1
Linux 3.10.0-327.el7.x86_64 (localhost21) 	2021年08月19日 	_x86_64_	(8 CPU)

19时40分42秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
19时40分47秒  all   25.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   75.00
19时40分47秒    0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
19时40分47秒    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
19时40分47秒    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
19时40分47秒    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
19时40分47秒    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
19时40分47秒    5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
19时40分47秒    6  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
19时40分47秒    7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
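
Picking the busy CPUs out of a long mpstat table by eye is error-prone. The following is a minimal sketch, assuming the column layout shown above (a header row containing `CPU` and `%idle`); the function name `busy_cpus` and the 10% threshold are arbitrary choices for this example.

```shell
# busy_cpus THRESHOLD: read "mpstat -P ALL" output on stdin and print the
# id of every CPU whose %idle is below THRESHOLD.
busy_cpus() {
    awk -v thr="$1" '
        # Header row: remember which columns hold the CPU id and %idle.
        /%idle/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU")   cpu_col  = i
                if ($i == "%idle") idle_col = i
            }
            next
        }
        # Data rows: skip the "all" summary row, keep numeric CPU ids only.
        cpu_col && $cpu_col ~ /^[0-9]+$/ && ($idle_col + 0) < (thr + 0) {
            print $cpu_col
        }
    '
}

# Three rows taken from the sample output above; prints 5 and 6.
busy_cpus 10 <<'EOF'
19时40分42秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
19时40分47秒    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
19时40分47秒    5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
19时40分47秒    6  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
EOF
```

With live data the function would sit at the end of a pipe, for example `mpstat -P ALL 5 1 | busy_cpus 10`; any trailing average rows that mpstat prints are matched as well, so a CPU id may appear twice.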

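1.4 Use pidstat to check per-process CPU usage

Use pidstat to find which processes are responsible for the high CPU usage. Here it is the two stress processes, each at 100% %usr: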

[root@MH-T02 ~]# pidstat -u 5
Linux 3.10.0-327.el7.x86_64 (MH-T02) 	2021年08月19日 	_x86_64_	(4 CPU)

20时21分58秒   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
20时22分03秒     0       506    0.20    0.00    0.00    0.00    0.20     3  systemd-journal
20时22分03秒     0      7769    0.00    0.20    0.00    0.00    0.20     1  kworker/1:2
20时22分03秒     0     12543    0.20    0.00    0.00    0.00    0.20     3  dstat
20时22分03秒     0     23895  100.00    0.00    0.00    0.00  100.00     2  stress
20时22分03秒     0     23896  100.00    0.00    0.00    0.00  100.00     1  stress

Note:
The system originally ships sysstat 10.1.5, whose pidstat has no %wait column. To get that column, force an upgrade to an 11+ release, for example:

wget -c http://pagesperso-orange.fr/sebastien.godard/sysstat-11.7.3-1.x86_64.rpm
rpm -Uvh sysstat-11.7.3-1.x86_64.rpm
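
Before upgrading it can help to check whether the installed version is already new enough. The sketch below compares two version strings with GNU `sort -V`. The helper name `version_ge` is hypothetical, and the minimum of 11.0.0 only mirrors the "11+" wording of the note above; the exact release that introduced %wait is not stated here.

```shell
# version_ge HAVE MIN: succeed when version HAVE is at least MIN.
# Relies on GNU sort: -V orders version strings, -C only checks the order.
version_ge() {
    printf '%s\n' "$2" "$1" | sort -V -C
}

# Assumed minimum, taken loosely from the "11+" in the note above.
if version_ge 10.1.5 11.0.0; then
    echo "10.1.5: new enough"
else
    echo "10.1.5: upgrade needed"
fi
```

The installed version can be read from `pidstat -V`; the exact layout of that output is not shown in this post, so extracting the number from it is left to the reader.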

Taken together, these readings suggest that CPU-intensive processes are the most likely cause of the higher load average and the higher CPU usage.

2. I/O-intensive processes (processes waiting for I/O)

2.1 Simulate I/O load

Stress the I/O subsystem. With stress the observed iowait may stay at 0, so stress-ng is used instead:

[root@MH-T02 ~]# stress-ng -i 4 --hdd 1 --timeout 600
stress-ng: info:  [28011] dispatching hogs: 1 hdd, 4 io

2.2 Use uptime to check the load average

uptime shows that the load average is high (2.00 over the last minute):

[root@MH-T02 ~]# uptime
 19:50:55 up 28 days,  4:30,  2 users,  load average: 2.00, 1.07, 0.85

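2.3 Use mpstat to check CPU usage

mpstat shows very little user-mode CPU time (%usr) but a high %iowait (13.05% overall, 57.75% on CPU 7). The CPUs are mostly waiting for I/O to complete, so the workload is I/O-intensive: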

[root@MH-T02 ~]# mpstat -P ALL 5
Linux 3.10.0-327.el7.x86_64 (localhost21) 	2021年08月19日 	_x86_64_	(8 CPU)

19时51分22秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
19时51分27秒  all    0.03    0.00    6.39   13.05    0.00    0.00    0.00    0.00    0.00   80.54
19时51分27秒    0    0.00    0.00    0.40    0.40    0.00    0.00    0.00    0.00    0.00   99.20
19时51分27秒    1    0.00    0.00    0.20    0.00    0.00    0.00    0.00    0.00    0.00   99.80
19时51分27秒    2    0.00    0.00    3.20    7.20    0.00    0.00    0.00    0.00    0.00   89.60
19时51分27秒    3    0.20    0.00   36.40   18.40    0.00    0.00    0.00    0.00    0.00   45.00
19时51分27秒    4    0.00    0.00    2.40    7.00    0.00    0.00    0.00    0.00    0.00   90.60
19时51分27秒    5    0.00    0.00    2.59    7.19    0.00    0.00    0.00    0.00    0.00   90.22
19时51分27秒    6    0.00    0.00    2.40    7.00    0.00    0.00    0.00    0.00    0.00   90.60
19时51分27秒    7    0.00    0.00    3.82   57.75    0.00    0.00    0.00    0.00    0.00   38.43

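2.4 Use pidstat to check per-process CPU usage

Use pidstat to find the processes involved. The stress-ng-hdd and stress-ng-io workers spend their CPU time almost entirely in kernel mode (%system):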

[root@MH-T02 ~]# pidstat -u 5
Linux 3.10.0-327.el7.x86_64 (MH-T02) 	2021年08月19日 	_x86_64_	(4 CPU)

20时23分44秒   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
20时23分49秒     0        13    0.00    0.20    0.00    0.00    0.20     0  rcu_sched
20时23分49秒     0        14    0.00    0.20    0.00    0.00    0.20     2  rcuos/0
20时23分49秒     0      2377    0.20    0.00    0.00    0.00    0.20     3  basereport
20时23分49秒     0     12045    0.00    0.80    0.00    0.00    0.80     3  kworker/3:1H
20时23分49秒     0     13250    0.20    0.00    0.00    0.00    0.20     0  java
20时23分49秒     0     22837    0.00    2.00    0.00    0.00    2.00     2  kworker/u8:1
20时23分49秒     0     24173    0.20   43.00    0.00    0.00   43.20     1  stress-ng-hdd
20时23分49秒     0     24174    0.00    3.80    0.00    0.00    3.80     3  stress-ng-io
20时23分49秒     0     24175    0.00    0.20    0.00    0.00    0.20     2  stress-ng-io
20时23分49秒     0     24176    0.00    0.20    0.00    0.00    0.20     0  stress-ng-io
20时23分49秒     0     24178    0.00    1.40    0.00    0.00    1.40     0  stress-ng-io
20时23分49秒     0     24229    0.00    2.20    0.00    0.00    2.20     3  kworker/3:50
20时23分49秒     0     24234    0.20    0.20    0.00    0.00    0.40     3  pidstat

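Taken together, these readings suggest that processes performing frequent I/O are the most likely cause: the load average is high while CPU usage stays low.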
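
The reasoning so far comes down to which column dominates the "all" row of mpstat: %usr points to CPU-intensive work, %iowait to I/O waiting. The sketch below automates that comparison, also including %sys. It assumes the column names shown in the mpstat output of this post, and the function name `dominant_state` is made up for the example.

```shell
# dominant_state: read mpstat output on stdin and, for each "all" row,
# print which of %usr, %sys and %iowait is largest, with its value.
dominant_state() {
    awk '
        # Header row: map every column name to its position.
        /%idle/ {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("CPU" in col) && $(col["CPU"]) == "all" {
            best = "%usr"
            if ($(col["%sys"]) + 0    > $(col[best]) + 0) best = "%sys"
            if ($(col["%iowait"]) + 0 > $(col[best]) + 0) best = "%iowait"
            print best, $(col[best])
        }
    '
}

# Header and "all" row from the I/O scenario above; prints: %iowait 13.05
dominant_state <<'EOF'
19时51分22秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
19时51分27秒  all    0.03    0.00    6.39   13.05    0.00    0.00    0.00    0.00    0.00   80.54
EOF
```

This only ranks the three columns against each other. It does not decide whether the winning value is high enough to matter, which still has to be judged against %idle and the load average.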

3. Many processes (processes waiting for the CPU, so they compete for it)

3.1 Simulate 16 processes

Start 16 CPU workers on a machine with 4 cores:

[root@MH-T02 ~]# stress -c 16 --timeout 600
stress: info: [28486] dispatching hogs: 16 cpu, 0 io, 0 vm, 0 hdd

3.2 Use uptime to check the load average

uptime shows a very high load average (9.38 over the last minute, well above the 4 cores):

[root@MH-T02 ~]# uptime
 19:56:57 up 28 days,  4:36,  2 users,  load average: 9.38, 5.19, 2.71

3.3 Use mpstat to check CPU usage

mpstat shows every CPU at 100% %usr with %iowait at 0, so the workload is CPU-intensive or the processes are competing for the CPU:

[root@MH-T02 ~]# mpstat -P ALL 5
Linux 3.10.0-327.el7.x86_64 (localhost21) 	2021年08月19日 	_x86_64_	(8 CPU)

19时57分20秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
19时57分25秒  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
19时57分25秒    0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
19时57分25秒    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
19时57分25秒    2  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
19时57分25秒    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
19时57分25秒    4  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
19时57分25秒    5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
19时57分25秒    6  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
19时57分25秒    7  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

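3.4 Use pidstat to check per-process CPU usage

In pidstat -u output, a high %wait value means a process spends much of its time waiting for a CPU, which indicates that processes are competing for CPU time. In the sample below %wait reads 0.00, but each of the 16 stress processes gets only about 25% of a CPU, which points to the same contention on the 4-core machine: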

[root@MH-T02 ~]# pidstat -u 5
Linux 3.10.0-327.el7.x86_64 (MH-T02) 	2021年08月19日 	_x86_64_	(4 CPU)

20时24分38秒   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
20时24分43秒     0     24339   24.90    0.00    0.00    0.00   24.90     1  stress
20时24分43秒     0     24340   24.90    0.00    0.00    0.00   24.90     2  stress
20时24分43秒     0     24341   24.90    0.00    0.00    0.00   24.90     1  stress
20时24分43秒     0     24342   24.90    0.00    0.00    0.00   24.90     0  stress
20时24分43秒     0     24343   24.90    0.00    0.00    0.00   24.90     1  stress
20时24分43秒     0     24344   24.90    0.00    0.00    0.00   24.90     0  stress
20时24分43秒     0     24345   24.90    0.00    0.00    0.00   24.90     3  stress
20时24分43秒     0     24346   24.90    0.00    0.00    0.00   24.90     0  stress
20时24分43秒     0     24347   24.90    0.00    0.00    0.00   24.90     3  stress
20时24分43秒     0     24348   25.10    0.00    0.00    0.00   25.10     2  stress
20时24分43秒     0     24349   24.90    0.00    0.00    0.00   24.90     3  stress
20时24分43秒     0     24350   24.70    0.00    0.00    0.00   24.70     2  stress
20时24分43秒     0     24351   24.90    0.00    0.00    0.00   24.90     1  stress
20时24分43秒     0     24352   24.90    0.00    0.00    0.00   24.90     0  stress
20时24分43秒     0     24353   24.90    0.00    0.00    0.00   24.90     3  stress
20时24分43秒     0     24354   24.90    0.00    0.00    0.00   24.90     2  stress

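Taken together, these readings show many processes waiting for CPU time: the runnable processes demand more than the CPUs can deliver, and that drives the load average up.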
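
To see how much CPU a group of identically named processes receives in total, the pidstat rows can be summed. This is a minimal sketch that assumes the `pidstat -u` column names shown above (`%CPU` and `Command`); the function name `cpu_by_command` is hypothetical.

```shell
# cpu_by_command NAME: read "pidstat -u" output on stdin, count the rows
# whose Command equals NAME and add up their %CPU values.
cpu_by_command() {
    awk -v name="$1" '
        # Header row: map every column name to its position.
        /%CPU/ {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("Command" in col) && $(col["Command"]) == name {
            n++
            total += $(col["%CPU"])
        }
        END { printf "%d processes, %.1f%% CPU in total\n", n, total }
    '
}

# Four of the sixteen rows above; prints: 4 processes, 99.6% CPU in total
cpu_by_command stress <<'EOF'
20时24分38秒   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
20时24分43秒     0     24347   24.90    0.00    0.00    0.00   24.90     3  stress
20时24分43秒     0     24348   25.10    0.00    0.00    0.00   25.10     2  stress
20时24分43秒     0     24349   24.90    0.00    0.00    0.00   24.90     3  stress
20时24分43秒     0     24350   24.70    0.00    0.00    0.00   24.70     2  stress
EOF
```

Summed over all 16 rows of the sample, the %CPU column adds up to 398.4, roughly the capacity of the 4 cores, so the 16 workers share four CPUs between them.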

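4. One process with many threads (heavy thread context switching drives the load up)

4.1 Simulate 10 threads

Run the sysbench threads benchmark with 10 threads: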

[root@MH-T02 ~]# sysbench --threads=10 --time=300 threads run
sysbench 1.0.17 (using system LuaJIT 2.0.4)

Running the test with following options:
Number of threads: 10
Initializing random number generator from current time


Initializing worker threads...

Threads started!

4.2 Use uptime to check the load average

[root@MH-T02 ~]# uptime
 20:30:01 up 338 days,  8:53,  4 users,  load average: 4.84, 6.56, 3.46

4.3 Use mpstat to check CPU usage

Kernel-mode CPU usage (%sys) is high, at about 72%, while %iowait is 0, so the load is not caused by waiting for I/O:

[root@MH-T02 ~]# mpstat -P ALL 5
Linux 3.10.0-327.el7.x86_64 (MH-T02) 	2021年08月19日 	_x86_64_	(4 CPU)

20时30分24秒  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
20时30分29秒  all   22.45    0.00   72.04    0.00    0.00    0.00    0.00    0.00    0.00    5.51
20时30分29秒    0   22.64    0.00   71.70    0.00    0.00    0.00    0.00    0.00    0.00    5.66
20时30分29秒    1   22.08    0.00   72.29    0.00    0.00    0.00    0.00    0.00    0.00    5.62
20时30分29秒    2   23.44    0.00   71.37    0.00    0.00    0.00    0.00    0.00    0.00    5.19
20时30分29秒    3   21.65    0.00   72.78    0.00    0.00    0.00    0.00    0.00    0.00    5.57

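4.4 Use pidstat -w to check per-process context switches

By default pidstat -w reports per process, and at that level no heavy context switching is visible: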

[root@MH-T02 ~]# pidstat -w  3
Linux 3.10.0-327.el7.x86_64 (MH-T02) 	2021年08月19日 	_x86_64_	(4 CPU)

20时30分48秒   UID       PID   cswch/s nvcswch/s  Command
20时30分51秒     0         1      1.00      0.00  systemd
20时30分51秒     0         3      1.33      0.00  ksoftirqd/0
20时30分51秒     0        13     13.62      0.00  rcu_sched
20时30分51秒     0        14      6.98      0.00  rcuos/0
20时30分51秒     0        15      1.99      0.00  rcuos/1
20时30分51秒     0        16      2.33      0.00  rcuos/2
20时30分51秒     0        17      1.00      0.00  rcuos/3
20时30分51秒     0        18      0.33      0.00  watchdog/0
20时30分51秒     0        19      0.33      0.00  watchdog/1
20时30分51秒     0        20      0.33      0.00  migration/1
20时30分51秒     0        21      0.66      0.00  ksoftirqd/1
20时30分51秒     0        24      0.33      0.00  watchdog/2
20时30分51秒     0        25      0.33      0.00  migration/2
20时30分51秒     0        29      0.33      0.00  watchdog/3
20时30分51秒     0        30      0.33      0.00  migration/3
20时30分51秒     0        31      0.33      0.00  ksoftirqd/3
20时30分51秒     0       435     19.93      0.00  xfsaild/dm-0
20时30分51秒     0       506      0.66      0.00  systemd-journal
20时30分51秒     0       828      0.33      0.00  irqbalance
20时30分51秒     0      1723      0.66      0.00  master
20时30分51秒     0      2974      0.33      0.00  epmd
20时30分51秒     0      9983      1.33      0.00  kworker/2:2
20时30分51秒     0     12487      1.00      0.00  sshd
20时30分51秒     0     12543      1.00      0.00  dstat
20时30分51秒     0     24229      1.00      0.00  kworker/3:50
20时30分51秒     0     24427      6.64      0.00  kworker/1:0
20时30分51秒     0     25164      1.99      0.00  kworker/0:1
20时30分51秒     0     25239      0.33      0.66  pidstat

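4.5 Use pidstat -w -t to check per-thread context switches

With -t, pidstat also lists threads. There is a large number of involuntary context switches (nvcswch/s), caused by threads competing for the CPU, and this is what raises the system load: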

[root@MH-T02 ~]# pidstat -w -t 3
Linux 3.10.0-327.el7.x86_64 (MH-T02) 	2021年08月19日 	_x86_64_	(4 CPU)

20时31分28秒   UID      TGID       TID   cswch/s nvcswch/s  Command
20时31分31秒     0         1         -      0.66      0.00  systemd
20时31分31秒     0         -         1      0.66      0.00  |__systemd
20时31分31秒     0         3         -      0.66      0.00  ksoftirqd/0
20时31分31秒     0         -         3      0.66      0.00  |__ksoftirqd/0
20时31分31秒     0        13         -      8.64      0.00  rcu_sched
20时31分31秒     0         -        13      8.64      0.00  |__rcu_sched
20时31分31秒     0        14         -      3.32      0.00  rcuos/0
20时31分31秒     0         -        14      3.32      0.00  |__rcuos/0
20时31分31秒     0        15         -      1.00      0.00  rcuos/1
20时31分31秒     0         -        15      1.00      0.00  |__rcuos/1
20时31分31秒     0        16         -      1.00      0.00  rcuos/2
20时31分31秒     0         -        16      1.00      0.00  |__rcuos/2
20时31分31秒     0        17         -      1.00      0.00  rcuos/3
20时31分31秒     0         -        17      1.00      0.00  |__rcuos/3
20时31分31秒     0        18         -      0.33      0.00  watchdog/0
20时31分31秒     0         -        18      0.33      0.00  |__watchdog/0
20时31分31秒     0        19         -      0.33      0.00  watchdog/1
20时31分31秒     0         -        19      0.33      0.00  |__watchdog/1
20时31分31秒     0        21         -      0.33      0.00  ksoftirqd/1
20时31分31秒     0         -        21      0.33      0.00  |__ksoftirqd/1
20时31分31秒     0        24         -      0.33      0.00  watchdog/2
20时31分31秒     0         -        24      0.33      0.00  |__watchdog/2
20时31分31秒     0        26         -      0.33      0.00  ksoftirqd/2
20时31分31秒     0         -        26      0.33      0.00  |__ksoftirqd/2
20时31分31秒     0        29         -      0.33      0.00  watchdog/3
20时31分31秒     0         -        29      0.33      0.00  |__watchdog/3
20时31分31秒     0        31         -      0.33      0.00  ksoftirqd/3
20时31分31秒     0         -        31      0.33      0.00  |__ksoftirqd/3
20时31分31秒     0       435         -     19.93      0.00  xfsaild/dm-0
20时31分31秒     0         -       435     19.93      0.00  |__xfsaild/dm-0
20时31分31秒     0       828         -      0.33      0.00  irqbalance
20时31分31秒     0         -       828      0.33      0.00  |__irqbalance
20时31分31秒     0         -      1323      1.00      0.00  |__tuned
20时31分31秒     0         -      2298      0.33      0.00  |__agentWorker
20时31分31秒     0         -      2309      9.97      0.00  |__agentWorker
20时31分31秒     0         -      2314      9.30      0.00  |__agentWorker
20时31分31秒     0         -      2339      6.31      0.00  |__agentWorker
20时31分31秒     0         -      2379      3.99      0.00  |__basereport
20时31分31秒     0         -     24568      1.99      0.00  |__basereport
20时31分31秒     0         -     25044      1.99      0.00  |__basereport
20时31分31秒     0      2974         -      0.33      0.00  epmd
20时31分31秒     0         -      2974      0.33      0.00  |__epmd
20时31分31秒     0      9983         -      1.00      0.00  kworker/2:2
20时31分31秒     0         -      9983      1.00      0.00  |__kworker/2:2
20时31分31秒     0     12487         -      1.00      0.00  sshd
20时31分31秒     0         -     12487      1.00      0.00  |__sshd
20时31分31秒     0     12543         -      1.00      0.00  dstat
20时31分31秒     0         -     12543      1.00      0.00  |__dstat
20时31分31秒     0         -     13257      1.00      0.00  |__VM Thread
20时31分31秒     0         -     13261      0.33      0.00  |__C2 CompilerThre
20时31分31秒     0         -     13262      0.66      0.00  |__C2 CompilerThre
20时31分31秒     0         -     13263      0.33      0.00  |__C1 CompilerThre
20时31分31秒     0         -     13265     20.27      0.00  |__VM Periodic Tas
20时31分31秒     0         -     13266     99.34      0.00  |__Log4j2-TF-1-Asy
20时31分31秒     0         -     13271      0
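
Per-thread output gets long quickly, so filtering by switch rate helps. The sketch below keeps only rows whose voluntary plus involuntary switch rate reaches a threshold. It assumes the `pidstat -w` column names shown above; the function name `busy_switchers` and the threshold of 20 are arbitrary. The sample listing above is cut off, so the demo rows are simply the busiest threads that are visible in it.

```shell
# busy_switchers THRESHOLD: read "pidstat -w" or "pidstat -w -t" output on
# stdin and print command, cswch/s and nvcswch/s for every row whose
# combined switch rate is at least THRESHOLD.
busy_switchers() {
    awk -v thr="$1" '
        # Header row: map every column name to its position.
        /nvcswch\/s/ {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("Command" in col) {
            vol   = $(col["cswch/s"]) + 0
            invol = $(col["nvcswch/s"]) + 0
            if (vol + invol >= thr + 0) {
                # The command name may contain spaces, so join every
                # field from the Command column to the end of the line.
                cmd = $(col["Command"])
                for (i = col["Command"] + 1; i <= NF; i++) cmd = cmd " " $i
                printf "%s\t%.2f\t%.2f\n", cmd, vol, invol
            }
        }
    '
}

# Rows from the sample above; prints the two threads at 20.27 and 99.34.
busy_switchers 20 <<'EOF'
20时31分28秒   UID      TGID       TID   cswch/s nvcswch/s  Command
20时31分31秒     0         -     12543      1.00      0.00  |__dstat
20时31分31秒     0         -     13265     20.27      0.00  |__VM Periodic Tas
20时31分31秒     0         -     13266     99.34      0.00  |__Log4j2-TF-1-Asy
EOF
```

On a live system this could be run as `pidstat -w -t 3 1 | busy_switchers 20`.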