Analyse Linux kernel by using Perf
Posted X-Solomon-X
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Analyse Linux kernel by using Perf相关的知识,希望对你有一定的参考价值。
Perf简介:Perf是Linux内核自带的系统性能调优工具,支持CPU PMU事件和软件事件计数,可用于找到程序的热点代码,以及找到程序的性能瓶颈。
Perf参考文档:
perf Static Tracepointsperf Static Tracepointshttps://www.brendangregg.com/blog/2014-06-29/perf-static-tracepoints.html
- perf list:枚举perf支持的所有的事件,可以在各种perf命令中通过-e选项选择对应的事件。
[root@VM-centos ~]# perf list
List of pre-defined events (to be used in -e):
alignment-faults [Software event]
bpf-output [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
page-faults OR faults [Software event]
task-clock [Software event]
duration_time [Tool event]
msr/tsc/ [Kernel PMU event]
rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
(see 'man perf-list' on how to encode it)
mem:<addr>[/len][:access] [Hardware breakpoint]
alarmtimer:alarmtimer_cancel [Tracepoint event]
alarmtimer:alarmtimer_fired [Tracepoint event]
alarmtimer:alarmtimer_start [Tracepoint event]
alarmtimer:alarmtimer_suspend [Tracepoint event]
binder:binder_alloc_lru_end [Tracepoint event]
binder:binder_alloc_lru_start [Tracepoint event]
...
#统计所有Tracepoint的类别
[root@VM-centos ~]# perf list | awk -F: '/Tracepoint event/ lib[$1]++ END
> for (l in lib) printf " %-16.16s %d\\n", l, lib[l] ' | sort | column
alarmtimer 4 fib 1 iscsi 7 nmi 1 rpm 4 tlb 1
binder 31 fib6 1 jbd2 16 nvme 4 rseq 2 udp 1
block 18 filelock 12 kmem 12 oom 8 rtc 12 ufs 13
bpf_test_run 1 filemap 4 kyber 3 page_isolation 1 sched 24 vmscan 18
bridge 4 fs_dax 14 libata 6 pagemap 2 scsi 5 vsyscall 1
cgroup 13 ftrace 2 mce 1 page_pool 3 signal 2 wbt 4
clk 16 huge_memory 4 mdio 1 percpu 5 skb 3 workqueue 4
compaction 14 hwmon 3 migrate 1 power 23 sock 3 writeback 35
cpuhp 3 initcall 3 module 5 printk 1 sunrpc 52 x86_fpu 11
damon 2 intel_iommu 6 mpx 5 qdisc 1 swiotlb 1 xdp 12
devfreq 1 iocost 5 msr 3 random 15 syscalls 664 xhci-hcd 51
devlink 5 iommu 7 napi 1 ras 6 task 2
dma_fence 7 irq 5 nbd 5 raw_syscalls 2 tcp 7
exceptions 2 irq_matrix 12 neigh 7 rcu 1 thermal 3
ext4 105 irq_vectors 34 net 18 resctrl 3 timer 13
- perf list 'keyword':以'keyword'字符串匹配所有perf事件。
[root@VM-centos ~]# perf list 'sched:*'
List of pre-defined events (to be used in -e):
sched:sched_kthread_stop [Tracepoint event]
sched:sched_kthread_stop_ret [Tracepoint event]
sched:sched_migrate_task [Tracepoint event]
sched:sched_move_numa [Tracepoint event]
sched:sched_pi_setprio [Tracepoint event]
sched:sched_process_exec [Tracepoint event]
sched:sched_process_exit [Tracepoint event]
sched:sched_process_fork [Tracepoint event]
sched:sched_process_free [Tracepoint event]
sched:sched_process_hang [Tracepoint event]
sched:sched_process_wait [Tracepoint event]
sched:sched_stat_blocked [Tracepoint event]
sched:sched_stat_iowait [Tracepoint event]
sched:sched_stat_runtime [Tracepoint event]
sched:sched_stat_sleep [Tracepoint event]
sched:sched_stat_wait [Tracepoint event]
sched:sched_stick_numa [Tracepoint event]
sched:sched_swap_numa [Tracepoint event]
sched:sched_switch [Tracepoint event]
sched:sched_wait_task [Tracepoint event]
sched:sched_wake_idle_without_ipi [Tracepoint event]
sched:sched_wakeup [Tracepoint event]
sched:sched_wakeup_new [Tracepoint event]
sched:sched_waking [Tracepoint event]
- perf stat [-e <event>] <command>:执行一个命令,统计这个命令在执行过程中事件的发生情况。
#执行sleep命令,并且进行事件统计
[root@VM-centos ~]# perf stat sleep 5
Performance counter stats for 'sleep 5':
0.47 msec task-clock # 0.000 CPUs utilized
1 context-switches # 0.002 M/sec
0 cpu-migrations # 0.000 K/sec
74 page-faults # 0.156 M/sec
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
5.000771757 seconds time elapsed
0.000724000 seconds user
0.000000000 seconds sys
#执行sleep命令,对系统所有CPU事件进行统计
[root@VM-centos ~]# perf stat -a sleep 5
Performance counter stats for 'system wide':
40,004.92 msec cpu-clock # 8.000 CPUs utilized
6,325 context-switches # 0.158 K/sec
211 cpu-migrations # 0.005 K/sec
20,939 page-faults # 0.523 K/sec
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
5.000791182 seconds time elapsed
#执行sleep命令,对系统所有CPU事件进行统计,并且每隔1000ms打印统计信息。
[root@VM-centos ~]# perf stat -a -I 1000 sleep 5
# time counts unit events
1.000251373 8,002.05 msec cpu-clock # 8.002 CPUs utilized
1.000251373 950 context-switches # 0.119 K/sec
1.000251373 12 cpu-migrations # 0.001 K/sec
1.000251373 1,239 page-faults # 0.155 K/sec
1.000251373 <not supported> cycles
1.000251373 <not supported> instructions
1.000251373 <not supported> branches
1.000251373 <not supported> branch-misses
2.000639272 8,003.10 msec cpu-clock # 8.003 CPUs utilized
2.000639272 906 context-switches # 0.113 K/sec
2.000639272 6 cpu-migrations # 0.001 K/sec
2.000639272 10 page-faults # 0.001 K/sec
2.000639272 <not supported> cycles
2.000639272 <not supported> instructions
2.000639272 <not supported> branches
2.000639272 <not supported> branch-misses
3.000999689 8,002.96 msec cpu-clock # 8.003 CPUs utilized
3.000999689 917 context-switches # 0.115 K/sec
3.000999689 7 cpu-migrations # 0.001 K/sec
3.000999689 125 page-faults # 0.016 K/sec
3.000999689 <not supported> cycles
3.000999689 <not supported> instructions
3.000999689 <not supported> branches
3.000999689 <not supported> branch-misses
4.001390263 8,003.21 msec cpu-clock # 8.003 CPUs utilized
4.001390263 932 context-switches # 0.116 K/sec
4.001390263 4 cpu-migrations # 0.000 K/sec
4.001390263 1 page-faults # 0.000 K/sec
4.001390263 <not supported> cycles
4.001390263 <not supported> instructions
4.001390263 <not supported> branches
4.001390263 <not supported> branch-misses
5.000611536 7,993.74 msec cpu-clock # 7.994 CPUs utilized
5.000611536 990 context-switches # 0.124 K/sec
5.000611536 7 cpu-migrations # 0.001 K/sec
5.000611536 1,176 page-faults # 0.147 K/sec
5.000611536 <not supported> cycles
5.000611536 <not supported> instructions
5.000611536 <not supported> branches
5.000611536 <not supported> branch-misses
#如果不使用-e进行指定监控的事件,perf默认只监控perf list中前8个事件,可以使用通配符来统计一类事件。
[root@VM-centos ~]# perf stat -e 'sched:*' sleep 5
Performance counter stats for 'sleep 5':
0 sched:sched_kthread_stop
0 sched:sched_kthread_stop_ret
0 sched:sched_waking
0 sched:sched_wakeup
0 sched:sched_wakeup_new
1 sched:sched_switch
0 sched:sched_migrate_task
0 sched:sched_process_free
1 sched:sched_process_exit
0 sched:sched_wait_task
0 sched:sched_process_wait
0 sched:sched_process_fork
1 sched:sched_process_exec
0 sched:sched_stat_wait
0 sched:sched_stat_sleep
0 sched:sched_stat_iowait
0 sched:sched_stat_blocked
459,952 sched:sched_stat_runtime
0 sched:sched_pi_setprio
0 sched:sched_process_hang
0 sched:sched_move_numa
0 sched:sched_stick_numa
0 sched:sched_swap_numa
0 sched:sched_wake_idle_without_ipi
5.000703957 seconds time elapsed
0.000638000 seconds user
0.000000000 seconds sys
#可以使用-p选项来指定监控的进程PID
[root@VM-centos ~]# perf stat -e 'sched:*' -p 105605
^C
Performance counter stats for process id '105605':
<not counted> sched:sched_kthread_stop
<not counted> sched:sched_kthread_stop_ret
<not counted> sched:sched_waking
<not counted> sched:sched_wakeup
<not counted> sched:sched_wakeup_new
<not counted> sched:sched_switch
<not counted> sched:sched_migrate_task
<not counted> sched:sched_process_free
<not counted> sched:sched_process_exit
<not counted> sched:sched_wait_task
<not counted> sched:sched_process_wait
<not counted> sched:sched_process_fork
<not counted> sched:sched_process_exec
<not counted> sched:sched_stat_wait
<not counted> sched:sched_stat_sleep
<not counted> sched:sched_stat_iowait
<not counted> sched:sched_stat_blocked
<not counted> sched:sched_stat_runtime
<not counted> sched:sched_pi_setprio
<not counted> sched:sched_process_hang
<not counted> sched:sched_move_numa
<not counted> sched:sched_stick_numa
<not counted> sched:sched_swap_numa
<not counted> sched:sched_wake_idle_without_ipi
19.698014307 seconds time elapsed
- perf record [-e <event>] <command>:执行一个命令,收集这个命令在运行时更全面的信息,并且保存在perf.data文件中。
#-c perf采样的间隔,每次发生minor-faults事件都要进行采样。
[root@VM-centos ~]# perf record -e minor-faults -c 1 sleep 10
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 2.191 MB perf.data (25261 samples) ]
#-a 选项指定采样系统中所有的 CPU。
#-g 选项指定记录下用户进程或者内核的调用栈。其中,--call-graph dwarf 指定调用栈收集的方式为 dwarf,即 libdwarf 和 libdunwind 的方式。Perf 还支持 fp 和 lbs 方式。
#-F 选项指定 perf 以 997 次每秒的频率对 CPU 上运行的用户进程或者内核上下文进行采样 (Sampling)。由于 Linux 内核的时钟中断是以 1000 次每秒的频率周期触发,所以按照 997 频率采样可以避免每次采样都采样到始终中断相关的处理,减少干扰。
#sleep 60 则是通过 perf 指定运行的命令,这个命令起到了让 perf 运行 60 秒然后退出的效果。
[root@VM-centos ~]# perf record -a -g --call-graph dwarf -F 997 sleep 10
[ perf record: Woken up 39 times to write data ]
[ perf record: Captured and wrote 10.978 MB perf.data (63514 samples) ]
[root@VM-centos ~]# ls
perf.data
- perf script:使用perf script可以将perf.data中所有的采样点信息进行展示。
[root@VM-centos ~]# perf script
swapper 0 [000] 32190.034251: 1003009 cpu-clock:pppH:
ffffffff81c86235 default_idle+0x35 (/lib/modules/.../build/vmlinux)
ffffffff8102b59f arch_cpu_idle+0xf (/lib/modules/.../build/vmlinux)
ffffffff81c867a6 default_idle_call+0x26 (/lib/.../build/vmlinux)
#-F选项可以指定展示的列
[root@VM-centos ~]# perf script -F comm,pid
perf 120910
perf 120910
perf 120910
perf 120911
perf 120910
perf 120910
perf 120910
perf 120911
perf 120911
- perf report:使用perf record可以将perf.data中的信息更结构化的展示。
#用stdio对采样结果以调用栈的形式呈现
[root@VM-centos ~]# perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 63K of event 'cpu-clock:pppH'
# Event count (approx.): 63705113626
#
# Children Self Command Shared Object Symbol
# ........ ........ .......... .......................... .................................................................................
#
99.84% 0.00% swapper [kernel.vmlinux] [k] 0xffffffff810000e6
|
---0xffffffff810000e6
|
|--84.16%--start_secondary
| cpu_startup_entry
| do_idle
| |
| --84.14%--default_idle_call
| arch_cpu_idle
| default_idle
|
--15.68%--x86_64_start_kernel
x86_64_start_reservations
start_kernel
arch_call_rest_init
rest_init
cpu_startup_entry
do_idle
default_idle_call
arch_cpu_idle
default_idle
99.84% 0.00% swapper [kernel.vmlinux] [k] cpu_startup_entry
|
---cpu_startup_entry
do_idle
|
--99.82%--default_idle_call
arch_cpu_idle
default_idle
#查看header里面的captured on时间,应该表示结束时间,time of last sample最后采集时间戳,单位是秒,可往前追溯现场时间
[root@VM-centos ~]# perf report --header-only
...
# CACHE info available, use -I to display
# time of first sample : 7898.887141
# time of last sample : 7908.898817
...
#根据时间戳索引
[root@VM-centos ~]# perf report --time start_tsc,end_tsc
...
- 祭出终极法宝——火焰图
[root@VM-centos ~]# git clone https://github.com/brendangregg/FlameGraph
Cloning into 'FlameGraph'...
remote: Enumerating objects: 1147, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 1147 (delta 13), reused 13 (delta 5), pack-reused 1119
Receiving objects: 100% (1147/1147), 1.90 MiB | 3.84 MiB/s, done.
Resolving deltas: 100% (659/659), done.
[root@VM-centos ~]# ls
perf.data
[root@VM-centos ~]# cp perf.data ./FlameGraph/
[root@VM-centos ~]# cd ./FlameGraph/
[root@VM-centos FlameGraph]# ls
aix-perf.pl example-perf.svg record-test.sh stackcollapse-java-exceptions.pl stackcollapse-sample.awk
demos files.pl stackcollapse-aix.pl stackcollapse-jstack.pl stackcollapse-stap.pl
dev flamegraph.pl stackcollapse-bpftrace.pl stackcollapse-ljp.awk stackcollapse-vsprof.pl
difffolded.pl jmaps stackcollapse-chrome-tracing.py stackcollapse-perf.pl stackcollapse-vtune.pl
docs perf.data stackcollapse-elfutils.pl stackcollapse-perf-sched.awk stackcollapse-wcp.pl
example-dtrace-stacks.txt pkgsplit-perf.pl stackcollapse-gdb.pl stackcollapse.pl stackcollapse-xdebug.php
example-dtrace.svg range-perf.pl stackcollapse-go.pl stackcollapse-pmc.pl test
example-perf-stacks.txt.gz README.md stackcollapse-instruments.pl stackcollapse-recursive.pl test.sh
[root@VM-centos FlameGraph]# perf script | ./stackcollapse-perf.pl > out.perf-folded
[root@VM-centos FlameGraph]# cat out.perf-folded | ./flamegraph.pl > perf-kernel.svg
#用浏览器打开perf-kernel.svg矢量图。
以上是关于Analyse Linux kernel by using Perf的主要内容,如果未能解决你的问题,请参考以下文章
Analyse Linux kernel by using Perf
Analyse Linux kernel by using Perf
Build U-Boot and Linux Kernel from Source Code
正点原子I.MX6U-MINI移植篇kernel移植过程详解