Linux Ftrace: Introduction and Internals
Posted by rtoax
Document change log
Date | Change | Author | Notes |
---|---|---|---|
2021-10-19 | Created | Rong Tao (荣涛) | |
1. debugfs
By default, debugfs is mounted at:
/sys/kernel/debug
You can also mount it at another location:
sudo mkdir /debug
sudo mount -t debugfs nodev /debug
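Before touching any tracing file, it can help to confirm where debugfs is actually mounted. The sketch below scans a mounts table for the first debugfs entry. The function name find_debugfs and its optional file argument are illustrative choices made here, not part of any kernel tooling.

```sh
#!/bin/sh
# Print the mount point of the first debugfs entry in a mounts table.
# $1: mounts table to read (defaults to /proc/mounts).
# Each line has the fields: device mountpoint fstype options dump pass
# Returns non-zero if no debugfs entry is found.
find_debugfs() {
    awk '$3 == "debugfs" { print $2; found = 1; exit }
         END { exit !found }' "${1:-/proc/mounts}"
}
```

For example, `mnt=$(find_debugfs) && ls "$mnt/tracing"` lists the tracing directory wherever debugfs happens to be mounted.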
2. tracing
2.1. /sys/kernel/debug/
# ls /sys/kernel/debug/
acpi clk error_injection intel_powerclamp pkg_temp_thermal sleep_time usb
asoc device_component extfrag iwlwifi pmc_core soundwire wakeup_sources
bdi devices_deferred fault_around_bytes kprobes pm_genpd split_huge_pages x86
block dma_buf frontswap kvm pwm sunrpc zsmalloc
bluetooth dmaengine gpio mce ras suspend_stats zswap
cec dma_pools hid mei0 regmap swiotlb
cleancache dri ieee80211 mmc0 sched_debug thunderbolt
clear_warn_once dynamic_debug intel_lpss pinctrl sched_features tracing
2.2. available_tracers
# cat /sys/kernel/debug/tracing/available_tracers
hwlat blk function_graph wakeup_dl wakeup_rt wakeup function nop
2.2.1. function|function_graph
Enable function tracing:
echo function > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/trace
Disable function tracing:
echo nop > /sys/kernel/debug/tracing/current_tracer
2.2.2. irqsoff
echo irqsoff > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/trace
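My environment does not support this tracer: irqsoff is missing from the available_tracers output shown in section 2.2.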
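Since not every kernel provides every tracer, a script can check available_tracers before writing to current_tracer. This is a minimal sketch. The TRACING_DIR variable and both function names are assumptions introduced here for illustration.

```sh
#!/bin/sh
# Assumption: TRACING_DIR is a variable introduced for this sketch.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Succeed only if tracer $1 appears in available_tracers.
tracer_available() {
    # The file is one space-separated line; split it into one name per line
    # and look for an exact, literal match.
    tr -s ' ' '\n' < "$TRACING_DIR/available_tracers" | grep -qxF -- "$1"
}

# Select tracer $1, or report that this kernel does not provide it.
select_tracer() {
    if tracer_available "$1"; then
        echo "$1" > "$TRACING_DIR/current_tracer"
    else
        echo "tracer '$1' is not available on this kernel" >&2
        return 1
    fi
}
```

With the tracer list shown in section 2.2, `select_tracer function` would succeed while `select_tracer irqsoff` would print the error and leave current_tracer untouched.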
2.3. tracing_on
Stop recording into the trace buffer (write 1 to resume):
echo 0 > /sys/kernel/debug/tracing/tracing_on
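One common use of tracing_on is to record only while a particular command runs. The helper below is a sketch of that idea. The name with_tracing_on and the TRACING_DIR variable are hypothetical and exist only in this example.

```sh
#!/bin/sh
# Assumption: TRACING_DIR is a variable introduced for this sketch.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Run the given command with recording switched on, then switch it off.
# The command's exit status is passed through to the caller.
with_tracing_on() {
    echo 1 > "$TRACING_DIR/tracing_on"
    rc=0
    "$@" || rc=$?
    echo 0 > "$TRACING_DIR/tracing_on"
    return "$rc"
}
```

For instance, `with_tracing_on sleep 1` followed by `cat "$TRACING_DIR/trace"` would show roughly one second of whatever the current tracer records.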
2.4. set_ftrace_filter
cat /sys/kernel/debug/tracing/set_ftrace_filter
#### all functions enabled ####
echo schedule > /sys/kernel/debug/tracing/set_ftrace_filter
echo function > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/trace
echo "" > /sys/kernel/debug/tracing/set_ftrace_filter
echo nop > /sys/kernel/debug/tracing/current_tracer
Append:
echo schedule_tail >> /sys/kernel/debug/tracing/set_ftrace_filter
cat /sys/kernel/debug/tracing/set_ftrace_filter
Wildcard (glob) patterns are supported; note that these are globs, not full regular expressions:
echo 'sched*' > /sys/kernel/debug/tracing/set_ftrace_filter
cat /sys/kernel/debug/tracing/set_ftrace_filter
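The overwrite-then-append pattern above can be wrapped in a small helper that takes any number of function names or glob patterns. This is only a sketch. set_filter and TRACING_DIR are names made up for this example.

```sh
#!/bin/sh
# Assumption: TRACING_DIR is a variable introduced for this sketch.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Replace the current filter with the given names or glob patterns.
# The first name overwrites the filter (>), the rest are appended (>>).
# With no arguments the filter is cleared.
set_filter() {
    if [ "$#" -eq 0 ]; then
        echo > "$TRACING_DIR/set_ftrace_filter"
        return
    fi
    first=1
    for fn in "$@"; do
        if [ "$first" -eq 1 ]; then
            echo "$fn" > "$TRACING_DIR/set_ftrace_filter"
            first=0
        else
            echo "$fn" >> "$TRACING_DIR/set_ftrace_filter"
        fi
    done
}
```

Quote glob patterns so the shell does not expand them first, as in `set_filter schedule schedule_tail` or `set_filter 'sched*'`.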
2.5. set_ftrace_notrace
Clear:
echo > /sys/kernel/debug/tracing/set_ftrace_notrace
cat /sys/kernel/debug/tracing/set_ftrace_notrace
#### no functions disabled ####
Exclude functions from tracing:
echo '*lock*' > /sys/kernel/debug/tracing/set_ftrace_notrace
3. events
# ls /sys/kernel/debug/tracing/events
alarmtimer devlink hda iomap mdio oom regmap sunrpc wbt
asoc dma_fence hda_controller iommu mei page_isolation resctrl swiotlb workqueue
block drm hda_intel irq migrate pagemap rpm syscalls writeback
bpf_test_run enable header_event irq_matrix mmc page_pool rseq task x86_fpu
bpf_trace exceptions header_page irq_vectors module percpu rtc tcp xdp
bridge fib huge_memory kmem msr power sched thermal xen
cfg80211 fib6 hyperv kvm napi printk scsi timer xfs
cgroup filelock i2c kvmmmu neigh qdisc signal tlb xhci-hcd
clk filemap i915 kyber net random skb ucsi
compaction fs_dax initcall libata netlink ras smbus udp
context_tracking ftrace intel_iommu mac80211 nmi raw_syscalls sock vmscan
cpuhp gvt intel-sst mce nvme rcu spi vsyscall
events/sched
# ls /sys/kernel/debug/tracing/events/sched/
enable sched_pi_setprio sched_process_wait sched_stick_numa sched_wakeup_new
filter sched_process_exec sched_stat_blocked sched_swap_numa sched_waking
sched_kthread_stop sched_process_exit sched_stat_iowait sched_switch
sched_kthread_stop_ret sched_process_fork sched_stat_runtime sched_wait_task
sched_migrate_task sched_process_free sched_stat_sleep sched_wake_idle_without_ipi
sched_move_numa sched_process_hang sched_stat_wait sched_wakeup
events/sched/sched_wakeup
# ls /sys/kernel/debug/tracing/events/sched/sched_wakeup
enable filter format hist id trigger
3.1. Enabling the sched_wakeup event
echo nop > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
cat /sys/kernel/debug/tracing/trace
3.2. Enabling all sched events
echo 1 > /sys/kernel/debug/tracing/events/sched/enable
3.3. Enabling all events
echo 1 > /sys/kernel/debug/tracing/events/enable
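The three cases above (one event, one subsystem, all events) differ only in which enable file is written, so they can share one helper. The sketch below assumes the directory layout shown in this section. set_event_state and TRACING_DIR are hypothetical names.

```sh
#!/bin/sh
# Assumption: TRACING_DIR is a variable introduced for this sketch.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# set_event_state <0|1> [subsystem [event]]
# No subsystem: all events. Subsystem only: every event in it.
set_event_state() {
    state=$1
    path="$TRACING_DIR/events"
    if [ -n "${2:-}" ]; then path="$path/$2"; fi
    if [ -n "${3:-}" ]; then path="$path/$3"; fi
    if [ ! -e "$path/enable" ]; then
        echo "no enable file under $path" >&2
        return 1
    fi
    echo "$state" > "$path/enable"
}
```

For example, `set_event_state 1 sched sched_wakeup` mirrors section 3.1, `set_event_state 1 sched` mirrors 3.2, and `set_event_state 0` turns every event off again.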
4. References
- Slides: Ftrace: Latency Tracing, Steven Rostedt
Copyright (C) CESTC Com.
Document change log
Date | Change | Author | Notes |
---|---|---|---|
2021-10-19 | Created | Rong Tao (荣涛) | |
1. How ftrace works
asmlinkage __visible void __sched schedule(void)
{
struct task_struct *tsk = current;
sched_submit_work(tsk);
__schedule();
}
Disassembly:
<schedule>:
55 push %rbp
48 8b 04 25 80 c0 0e mov 0xffffffff810ec080,%rax
81
48 89 e5 mov %rsp,%rbp
48 8b 00 mov (%rax),%rax
5d pop %rbp
e9 db fa ff ff jmpq ffffffff810bb100 <__schedule>
66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
00 00 00 00
Disassembly when compiled with the -pg option:
<schedule>:
55 push %rbp
48 89 e5 mov %rsp,%rbp
e8 37 2e 00 00 callq ffffffff810f7430 <mcount>
5d pop %rbp
48 8b 04 25 80 d0 15 mov 0xffffffff8115d080,%rax
81
48 8b 00 mov (%rax),%rax
e9 96 fa ff ff jmpq ffffffff810f40a0 <__schedule>
66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
Simplified view of the schedule function:
<schedule>:
push %rbp
mov %rsp,%rbp
callq <mcount>
pop %rbp
The mcount function being called:
<mcount>:
retq
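1.1. mcount in the kernel
The kernel tree ships scripts/recordmcount, in both C source and script form. At build time it performs the following steps:
- Find every location that calls mcount
- Create a table of those locations
- Link the table into the object file
- The new section is named __mcount_loc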
<schedule>:
push %rbp
mov %rsp,%rbp
callq <mcount>
pop %rbp
[…]
<preempt_schedule_irq>:
push %rbp
mov %rsp,%rbp
push %rbx
callq <mcount>
pop %rbp
[…]
<_cond_resched>:
push %rbp
mov %rsp,%rbp
push %rbx
callq <mcount>
pop %rbp
[…]
<yield>:
push %rbp
mov %rsp,%rbp
push %rbx
callq <mcount>
pop %rbp
[…]
The __mcount_loc table:
<__mcount_loc>:
&schedule + 0x4
&preempt_schedule_irq + 0x4
&_cond_resched + 0x4
&yield + 0x4
Each __mcount_loc table is added to its corresponding object file.
In kernel/sched/core.o:
<__mcount_loc>:
&schedule + 0x4
&preempt_schedule_irq + 0x4
&_cond_resched + 0x4
&yield + 0x4
In mm/swap.o:
<__mcount_loc>:
&put_page + 0x4
&__get_page_tail + 0x4
&put_pages_list + 0x4
&get_kernel_pages + 0x4
In fs/read_write.o:
<__mcount_loc>:
&new_sync_read + 0x4
&vfs_setpos + 0x4
&fixed_size_llseek + 0x4
&default_llseek + 0x4
When vmlinux is linked, all the __mcount_loc tables are combined into vmlinux:
<__start_mcount_loc>:
&schedule + 0x4
&preempt_schedule_irq + 0x4
&_cond_resched + 0x4
&yield + 0x4
&put_page + 0x4
&__get_page_tail + 0x4
&put_pages_list + 0x4
&get_kernel_pages + 0x4
&new_sync_read + 0x4
&vfs_setpos + 0x4
&fixed_size_llseek + 0x4
&default_llseek + 0x4
[...]
<___end_mcount_loc>:
The __mcount_loc entries are in fact plain addresses:
<__start_mcount_loc>:
0xffffffff810f45f4
0xffffffff810f4635
0xffffffff810f4684
0xffffffff810f4734
0xffffffff81087ad4
0xffffffff81087b14
0xffffffff81087bd5
0xffffffff81087c41
0xffffffff810a7aa0
0xffffffff810a7bd4
0xffffffff810a7d34
0xffffffff810a7d7d
[...]
<___end_mcount_loc>:
The resulting vmlinux looks like this:
<schedule>:
push %rbp
mov %rsp,%rbp
callq <mcount>
pop %rbp
[…]
<preempt_schedule_irq>:
push %rbp
mov %rsp,%rbp
push %rbx
callq <mcount>
pop %rbp
[…]
<_cond_resched>:
push %rbp
mov %rsp,%rbp
push %rbx
callq <mcount>
pop %rbp
[…]
<yield>:
push %rbp
mov %rsp,%rbp
push %rbx
callq <mcount>
pop %rbp
[…]
<__start_mcount_loc>:
[...]
<___end_mcount_loc>:
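1.2. How tracing is implemented
- A way to enable tracing is needed
- The mcount section is discarded
- The mcount section alone is not enough
- Tracing also needs to keep state for each call site
That state is kept in <ftrace_pages>, whose records look like this: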
ip = 0xffffffff81087ad4
flags = 0
ip = 0xffffffff81087b14
flags = 0
ip = 0xffffffff81087bd5
flags = 0
ip = 0xffffffff81087c41
flags = 0
ip = 0xffffffff810a7aa0
flags = 0
ip = 0xffffffff810a7bd4
flags = 0
ip = 0xffffffff810a7d34
flags = 0
ip = 0xffffffff810a7d7d
flags = 0
ip = 0xffffffff810f45f4
flags = 0
ip = 0xffffffff810f4635
flags = 0
ip = 0xffffffff810f4684
flags = 0
ip = 0xffffffff810f4734
flags = 0
[…]
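Judging from the listings, the ftrace_pages records hold the same addresses as the __mcount_loc table, ordered by address, with each flags field starting at 0. The sketch below reproduces that transformation on plain text. It illustrates the data layout only and is not how the kernel builds the records. build_records is a made-up name, and it assumes equal-width lowercase hex addresses so that a byte-wise sort matches numeric order.

```sh
#!/bin/sh
# Read one hex address per line on stdin, sort them, and print an
# "ip = ... / flags = 0" pair for each, as in the listing above.
# Assumption: all addresses have the same width and case, so a
# byte-wise (LC_ALL=C) sort is equivalent to a numeric sort.
build_records() {
    LC_ALL=C sort | while read -r ip; do
        [ -n "$ip" ] || continue
        printf 'ip = %s\nflags = 0\n' "$ip"
    done
}
```

Feeding it the addresses from the __start_mcount_loc listing, for example with `printf '%s\n' 0xffffffff810f45f4 0xffffffff81087ad4 | build_records`, prints the lower address first.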
These entries correspond to the functions listed in:
# cat available_filter_functions
put_page
__get_page_tail
put_pages_list
get_kernel_pages
new_sync_read
vfs_setpos
fixed_size_llseek
default_llseek
schedule
preempt_schedule_irq
_cond_resched
yield
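Only functions that appear in available_filter_functions can be used in a filter, so a script may want to check a name first. A minimal sketch follows. is_traceable and TRACING_DIR are names assumed for this example.

```sh
#!/bin/sh
# Assumption: TRACING_DIR is a variable introduced for this sketch.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"

# Succeed if function $1 is listed in available_filter_functions.
# Only the first field is compared, so module entries such as
# "name [module]" still match on the function name.
is_traceable() {
    awk -v fn="$1" '$1 == fn { found = 1; exit }
                    END { exit !found }' \
        "$TRACING_DIR/available_filter_functions"
}
```

A typical use would be `is_traceable yield && echo yield > "$TRACING_DIR/set_ftrace_filter"`.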
Enable a tracing filter:
# echo yield > set_ftrace_filter
# echo schedule >> set_ftrace_filter
# cat set_ftrace_filter
schedule
yield
Layout of the flags field in each ftrace_pages record:
- First 29 bits are for counter
- Every registered callback increments +1
- bit 29 (starts from zero) – ENABLED
- bit 30 – REGS
- bit 31 – REGS_EN
The ftrace_pages flags are then updated to:
ip = 0xffffffff81087ad4
flags = 0
ip = 0xffffffff81087b14
flags = 0
ip = 0xffffffff81087bd5
flags = 0
ip = 0xffffffff81087c41
flags = 0
ip = 0xffffffff810a7aa0
flags = 0
ip = 0xffffffff810a7bd4
flags = 0
ip = 0xffffffff810a7d34
flags = 0
ip = 0xffffffff810a7d7d
flags = 0
ip = 0xffffffff810f45f4
flags = 0x20000001
ip = 0xffffffff810f4635
flags = 0
ip = 0xffffffff810f4684
flags = 0
ip = 0xffffffff810f4734
flags = 0xa0000001
[…]
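The two non-zero flags values above can be decoded by hand with the bit layout given earlier (counter in the low 29 bits, then ENABLED, REGS and REGS_EN in bits 29 to 31). The sketch below does the arithmetic in the shell. decode_flags is a hypothetical helper, and the bit positions are taken from this article's list; they may differ between kernel versions.

```sh
#!/bin/sh
# Decode a flags value using the layout described in this article:
#   bits 0-28: callback counter
#   bit 29: ENABLED, bit 30: REGS, bit 31: REGS_EN
# Assumption: this layout comes from the list above, not from any
# particular kernel source tree.
decode_flags() {
    flags=$(( $1 ))
    printf 'counter=%d enabled=%d regs=%d regs_en=%d\n' \
        $(( flags & 0x1fffffff )) \
        $(( (flags >> 29) & 1 )) \
        $(( (flags >> 30) & 1 )) \
        $(( (flags >> 31) & 1 ))
}

# decode_flags 0x20000001   prints: counter=1 enabled=1 regs=0 regs_en=0
# decode_flags 0xa0000001   prints: counter=1 enabled=1 regs=0 regs_en=1
```

Both records have one registered callback and are enabled; only the second, at 0xffffffff810f4734, also has REGS_EN set.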
The code in vmlinux is then patched to:
<schedule>:
push %rbp
mov %rsp,%rbp
call ftrace_caller
pop %rbp
[…]
<preempt_schedule_irq>:
push %rbp
mov %rsp,%rbp
push %rbx
nop
pop %rbp
[…]
<_cond_resched>:
push %rbp
mov %rsp,%rbp
push %rbx
nop
pop %rbp
[…]
<yield>:
push %rbp
mov %rsp,%rbp
push %rbx
call ftrace_regs_caller
pop %rbp
[…]
2. References
- Ftrace Kernel Hooks: More than just tracing, Steven Rostedt
Copyright (C) CESTC Com.