An unfinished analysis of a memory-corruption crash on the 2.6.32-220 kernel
Posted 安庆
I ran into a crash. The log is as follows:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81166504>] s_show+0xe4/0x330
PGD 1158954067 PUD 12666d8067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host0/port-0:0/expander-0:0/port-0:0:6/end_device-0:0:6/target0:0:6/0:0:6:0/block/sdw/stat
CPU 11
Modules linked in: **********************
Pid: 7739, comm: slabtop Not tainted 2.6.32-220.el6.x86_64 #1 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: 0010:[<ffffffff81166504>] [<ffffffff81166504>] s_show+0xe4/0x330
RSP: 0018:ffff8817fc9e1d98 EFLAGS: 00010086
RAX: ffff880c2fc217c0 RBX: 00000000000003fb RCX: ffff880c2fc21800
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880c2fc217d0
RBP: ffff8817fc9e1e18 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff817a234e R11: 0000000000000246 R12: 00000000000003fb
R13: ffffffff817a234e R14: 0000000000000400 R15: 0000000000000000
FS: 00007feb7bb6b700(0000) GS:ffff8800283c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000125178a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process slabtop (pid: 7739, threadinfo ffff8817fc9e0000, task ffff8817facd2100)
Stack:
ffff8812666d8080 ffff881158954000 ffff881700000000 ffff880c2fc21800
<0> ffff880c2fc217c0 ffff8817f3eee740 ffff880c2fd18498 0000000000000000
<0> 0000000000000000 ffff880c2fd10440 ffff8817fc9e1e18 ffff8817f3eee740
Call Trace:
[<ffffffff811a0a35>] seq_read+0xe5/0x3f0
[<ffffffff811e35be>] proc_reg_read+0x7e/0xc0
[<ffffffff8117ea75>] vfs_read+0xb5/0x1a0
[<ffffffff810d68c2>] ? audit_syscall_entry+0xc2/0x2b0
[<ffffffff8117ebb1>] sys_read+0x51/0x90
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: 10 48 39 fe 74 34 4c 8b 45 c8 45 8b 88 18 80 00 00 45 89 c8 0f 1f 00 4d 85 ed 75 0f 44 39 4e 20 49 c7 c2 4e 23 7a 81 4d 0f 45 ea <48> 8b 36 4d 01 c4 48 83 c3 01 48 39 fe 75 dd 48 8b 30 48 39 f0
RIP [<ffffffff81166504>] s_show+0xe4/0x330
RSP <ffff8817fc9e1d98>
CR2: 0000000000000000
The backtrace is as follows:
crash> bt
PID: 7739 TASK: ffff8817facd2100 CPU: 11 COMMAND: "slabtop"
bt: invalid kernel virtual address: 776f645f7570635f type: "cpu_online_map"
#0 [ffff8817fc9e1960] machine_kexec at ffffffff8103244b
#1 [ffff8817fc9e19c0] crash_kexec at ffffffff810baf92
#2 [ffff8817fc9e1a90] oops_end at ffffffff814fded0
#3 [ffff8817fc9e1ac0] no_context at ffffffff810425db
#4 [ffff8817fc9e1b10] __bad_area_nosemaphore at ffffffff81042865
#5 [ffff8817fc9e1b60] bad_area at ffffffff8104298e
#6 [ffff8817fc9e1b90] __do_page_fault at ffffffff810430c0
#7 [ffff8817fc9e1cb0] do_page_fault at ffffffff814ffefe
#8 [ffff8817fc9e1ce0] page_fault at ffffffff814fd255
[exception RIP: s_show+228]
RIP: ffffffff81166504 RSP: ffff8817fc9e1d98 RFLAGS: 00010086
RAX: ffff880c2fc217c0 RBX: 00000000000003fb RCX: ffff880c2fc21800
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880c2fc217d0
RBP: ffff8817fc9e1e18 R8: 0000000000000001 R9: 0000000000000001
R10: ffffffff817a234e R11: 0000000000000246 R12: 00000000000003fb
R13: ffffffff817a234e R14: 0000000000000400 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff8817fc9e1e20] seq_read at ffffffff811a0a35
#10 [ffff8817fc9e1ea0] proc_reg_read at ffffffff811e35be
#11 [ffff8817fc9e1ef0] vfs_read at ffffffff8117ea75
#12 [ffff8817fc9e1f30] sys_read at ffffffff8117ebb1
#13 [ffff8817fc9e1f80] system_call_fastpath at ffffffff8100b0f2
RIP: 000000370d0d83f0 RSP: 00007fff183a9450 RFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffffff8100b0f2 RCX: 0000000002160040
RDX: 0000000000000400 RSI: 00007feb7bb8a000 RDI: 0000000000000003
RBP: 000000000000079b R8: 74616462616c7320 R9: 3020202020202061
R10: 2030202020202020 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000a R14: 000000000215a010 R15: 000000000000000a
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b
The fault is in s_show:
crash> dis -l s_show
dis: s_show: duplicate text symbols found:
ffffffff81023b70 (t) s_show /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/arch/x86/kernel/cpu/mcheck/mce-severity.c: 162
ffffffff810b2d30 (t) s_show /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/kernel/kallsyms.c: 461
ffffffff810f1800 (t) s_show /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/kernel/trace/trace.c: 1984
ffffffff8114e360 (t) s_show /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/vmalloc.c: 2452
ffffffff81166420 (t) s_show /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4236
There are several definitions of s_show, so disassemble the faulting address instead:
[exception RIP: s_show+228]
RIP: ffffffff81166504
crash> dis -l ffffffff81166504
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4258
0xffffffff81166504 <s_show+228>: mov (%rsi),%rsi
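The source line shows that the function in question is the s_show in slab.c. The register dump at the fault shows that rsi is NULL, so mov (%rsi),%rsi dereferences a NULL pointer and triggers the oops.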
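As a cross-check of the symbol resolution, the following minimal sketch subtracts the s_show+0xe4 offset from the faulting RIP and looks the result up among the five candidate start addresses printed by dis -l s_show. The dictionary and the helper name resolve are illustrative only and are not part of crash.

```python
# Start addresses of the five s_show symbols listed by `dis -l s_show`.
S_SHOW_CANDIDATES = {
    0xffffffff81023b70: "arch/x86/kernel/cpu/mcheck/mce-severity.c",
    0xffffffff810b2d30: "kernel/kallsyms.c",
    0xffffffff810f1800: "kernel/trace/trace.c",
    0xffffffff8114e360: "mm/vmalloc.c",
    0xffffffff81166420: "mm/slab.c",
}


def resolve(rip, offset):
    """Return (start, source file) of the symbol with start + offset == rip.

    The file is None when no candidate starts at rip - offset.
    """
    start = rip - offset
    return start, S_SHOW_CANDIDATES.get(start)


if __name__ == "__main__":
    # RIP and the s_show+0xe4 offset come from the oops message.
    start, src = resolve(0xffffffff81166504, 0xe4)
    print(hex(start), src, "offset", 0xe4)
```

Only the slab.c copy starts exactly 0xe4 (228 decimal) bytes below the RIP, which agrees with what dis -l reports for the faulting address.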
Next we need to work out why rsi is NULL.
crash> dis -l ffffffff81166420
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4236
0xffffffff81166420 <s_show>: push %rbp
0xffffffff81166421 <s_show+1>: mov %rsp,%rbp
0xffffffff81166424 <s_show+4>: push %r15
0xffffffff81166426 <s_show+6>: push %r14
0xffffffff81166428 <s_show+8>: push %r13
0xffffffff8116642a <s_show+10>: push %r12
0xffffffff8116642c <s_show+12>: push %rbx
0xffffffff8116642d <s_show+13>: sub $0x58,%rsp
0xffffffff81166431 <s_show+17>: nopl 0x0(%rax,%rax,1)
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4237
0xffffffff81166436 <s_show+22>: mov %rsi,%rax
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4236
0xffffffff81166439 <s_show+25>: mov %rdi,-0x58(%rbp)
0xffffffff8116643d <s_show+29>: mov %rsi,-0x50(%rbp)
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4237
0xffffffff81166441 <s_show+33>: sub $0x8058,%rax-------------------- this corresponds to source line 4237
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/include/linux/nodemask.h: 239
0xffffffff81166447 <s_show+39>: mov $0x200,%esi
0xffffffff8116644c <s_show+44>: mov $0xffffffff81c05280,%rdi
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4237
0xffffffff81166453 <s_show+51>: mov %rax,-0x38(%rbp)
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/include/linux/nodemask.h: 239
0xffffffff81166457 <s_show+55>: callq 0xffffffff81275a10 <find_first_bit>
0xffffffff8116645c <s_show+60>: cmp $0x200,%eax
0xffffffff81166461 <s_show+65>: mov %eax,%edx
0xffffffff81166463 <s_show+67>: mov $0x200,%eax
0xffffffff81166468 <s_show+72>: cmovg %eax,%edx
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4250
0xffffffff8116646b <s_show+75>: cmp $0x1ff,%edx
0xffffffff81166471 <s_show+81>: jg 0xffffffff81166730 <s_show+784>
0xffffffff81166477 <s_show+87>: xor %r13d,%r13d
0xffffffff8116647a <s_show+90>: movq $0x0,-0x48(%rbp)
0xffffffff81166482 <s_show+98>: movq $0x0,-0x40(%rbp)
0xffffffff8116648a <s_show+106>: xor %r15d,%r15d
0xffffffff8116648d <s_show+109>: xor %ebx,%ebx
0xffffffff8116648f <s_show+111>: xor %r12d,%r12d
0xffffffff81166492 <s_show+114>: nopw 0x0(%rax,%rax,1)
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4251
0xffffffff81166498 <s_show+120>: mov -0x38(%rbp),%rcx
0xffffffff8116649c <s_show+124>: movslq %edx,%rax
0xffffffff8116649f <s_show+127>: mov 0x8068(%rcx,%rax,8),%rax
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4252
0xffffffff811664a7 <s_show+135>: test %rax,%rax
0xffffffff811664aa <s_show+138>: je 0xffffffff811666f8 <s_show+728>
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4256
0xffffffff811664b0 <s_show+144>: lea 0x40(%rax),%rcx
0xffffffff811664b4 <s_show+148>: mov %rax,-0x60(%rbp)
0xffffffff811664b8 <s_show+152>: mov %edx,-0x70(%rbp)
0xffffffff811664bb <s_show+155>: mov %rcx,%rdi
0xffffffff811664be <s_show+158>: mov %rcx,-0x68(%rbp)
0xffffffff811664c2 <s_show+162>: callq 0xffffffff814fcc50 <_spin_lock_irq>
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4258
0xffffffff811664c7 <s_show+167>: mov -0x60(%rbp),%rax
0xffffffff811664cb <s_show+171>: mov -0x70(%rbp),%edx
0xffffffff811664ce <s_show+174>: mov -0x68(%rbp),%rcx
0xffffffff811664d2 <s_show+178>: mov 0x10(%rax),%rsi
0xffffffff811664d6 <s_show+182>: lea 0x10(%rax),%rdi
0xffffffff811664da <s_show+186>: cmp %rdi,%rsi
0xffffffff811664dd <s_show+189>: je 0xffffffff81166513 <s_show+243>
0xffffffff811664df <s_show+191>: mov -0x38(%rbp),%r8
0xffffffff811664e3 <s_show+195>: mov 0x8018(%r8),%r9d
0xffffffff811664ea <s_show+202>: mov %r9d,%r8d
0xffffffff811664ed <s_show+205>: nopl (%rax)
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4259
0xffffffff811664f0 <s_show+208>: test %r13,%r13
0xffffffff811664f3 <s_show+211>: jne 0xffffffff81166504 <s_show+228>
0xffffffff811664f5 <s_show+213>: cmp %r9d,0x20(%rsi)
0xffffffff811664f9 <s_show+217>: mov $0xffffffff817a234e,%r10
0xffffffff81166500 <s_show+224>: cmovne %r10,%r13
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4258
0xffffffff81166504 <s_show+228>: mov (%rsi),%rsi
Source line 4258 confirms that the fault occurred while walking the slabs_full list:
4235 static int s_show(struct seq_file *m, void *p)
4236 {
4237     struct kmem_cache *cachep = list_entry(p, struct kmem_cache, next);
4238     struct slab *slabp;
4239     unsigned long active_objs;
4240     unsigned long num_objs;
4241     unsigned long active_slabs = 0;
4242     unsigned long num_slabs, free_objects = 0, shared_avail = 0;
4243     const char *name;
4244     char *error = NULL;
4245     int node;
4246     struct kmem_list3 *l3;
4247
4248     active_objs = 0;
4249     num_slabs = 0;
4250     for_each_online_node(node) {
4251         l3 = cachep->nodelists[node];
4252         if (!l3)
4253             continue;
4254
4255         check_irq_on();
4256         spin_lock_irq(&l3->list_lock);
4257
4258         list_for_each_entry(slabp, &l3->slabs_full, list) {
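To obtain slabp we need l3; to obtain l3 we need cachep; and to obtain cachep we need the void *p argument, which, according to the backtrace, is passed in by seq_read. Let's see what this p actually is.
According to the disassembly, p points to a list head embedded in kmem_cache: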
crash> struct -xo kmem_cache
struct kmem_cache {
  [0x0] struct array_cache *array[4096];
  [0x8000] unsigned int batchcount;
  [0x8004] unsigned int limit;
  [0x8008] unsigned int shared;
  [0x800c] unsigned int buffer_size;
  [0x8010] u32 reciprocal_buffer_size;
  [0x8014] unsigned int flags;
  [0x8018] unsigned int num;
  [0x801c] unsigned int gfporder;
  [0x8020] gfp_t gfpflags;
  [0x8028] size_t colour;
  [0x8030] unsigned int colour_off;
  [0x8038] struct kmem_cache *slabp_cache;
  [0x8040] unsigned int slab_size;
  [0x8044] unsigned int dflags;
  [0x8048] void (*ctor)(void *);
  [0x8050] const char *name;
  [0x8058] struct list_head next;------------ the embedded list head
Disassemble the function:
crash> dis -l 0xffffffff81166420
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4236
0xffffffff81166420 <s_show>: push %rbp
0xffffffff81166421 <s_show+1>: mov %rsp,%rbp
0xffffffff81166424 <s_show+4>: push %r15
0xffffffff81166426 <s_show+6>: push %r14
0xffffffff81166428 <s_show+8>: push %r13
0xffffffff8116642a <s_show+10>: push %r12
0xffffffff8116642c <s_show+12>: push %rbx
0xffffffff8116642d <s_show+13>: sub $0x58,%rsp
0xffffffff81166431 <s_show+17>: nopl 0x0(%rax,%rax,1)
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4237
0xffffffff81166436 <s_show+22>: mov %rsi,%rax------------------------------------- rsi is copied into rax; rsi holds p, the second argument of s_show
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4236
0xffffffff81166439 <s_show+25>: mov %rdi,-0x58(%rbp)
0xffffffff8116643d <s_show+29>: mov %rsi,-0x50(%rbp)------------------------------ rsi is also saved on the stack, so p can be read back at -0x50(%rbp)
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4237
0xffffffff81166441 <s_show+33>: sub $0x8058,%rax
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/include/linux/nodemask.h: 239
0xffffffff81166447 <s_show+39>: mov $0x200,%esi
0xffffffff8116644c <s_show+44>: mov $0xffffffff81c05280,%rdi
/usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4237
0xffffffff81166453 <s_show+51>: mov %rax,-0x38(%rbp)
Take rbp from the exception frame, subtract 0x50, and read that stack slot to get p:
#8 [ffff8817fc9e1ce0] page_fault at ffffffff814fd255
[exception RIP: s_show+228]
RIP: ffffffff81166504 RSP: ffff8817fc9e1d98 RFLAGS: 00010086
RAX: ffff880c2fc217c0 RBX: 00000000000003fb RCX: ffff880c2fc21800
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880c2fc217d0
RBP: ffff8817fc9e1e18 R8: 0000000000000001 R9: 0000000000000001
R10: ffffffff817a234e R11: 0000000000000246 R12: 00000000000003fb
R13: ffffffff817a234e R14: 0000000000000400 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
ffff8817fc9e1ce8: 0000000000000000 0000000000000400
ffff8817fc9e1cf8: ffffffff817a234e 00000000000003fb
ffff8817fc9e1d08: ffff8817fc9e1e18 00000000000003fb
ffff8817fc9e1d18: 0000000000000246 ffffffff817a234e
ffff8817fc9e1d28: 0000000000000001 0000000000000001
ffff8817fc9e1d38: ffff880c2fc217c0 ffff880c2fc21800
ffff8817fc9e1d48: 0000000000000000 0000000000000000
ffff8817fc9e1d58: ffff880c2fc217d0 ffffffffffffffff
ffff8817fc9e1d68: ffffffff81166504 0000000000000010
ffff8817fc9e1d78: 0000000000010086 ffff8817fc9e1d98
ffff8817fc9e1d88: 0000000000000018 ffffffff811664c7
ffff8817fc9e1d98: ffff8812666d8080 ffff881158954000
ffff8817fc9e1da8: ffff881700000000 ffff880c2fc21800
ffff8817fc9e1db8: ffff880c2fc217c0 ffff8817f3eee740
ffff8817fc9e1dc8: ffff880c2fd18498 0000000000000000
ffff8817fc9e1dd8: 0000000000000000 ffff880c2fd10440
ffff8817fc9e1de8: ffff8817fc9e1e18 ffff8817f3eee740
ffff8817fc9e1df8: ffff880c9ddbba80 ffff880c2fd18498
ffff8817fc9e1e08: 0000000000000400 ffff8817fc9e1e60
ffff8817fc9e1e18: ffff8817fc9e1e98 ffffffff811a0a35
#9 [ffff8817fc9e1e20] seq_read at ffffffff811a0a35
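The stack-slot lookup can be sketched as follows. This is a minimal illustration, assuming the slots written by the stores in the disassembly (-0x50, -0x38, -0x60 and -0x70 relative to rbp) were not reused before the fault. The STACK dictionary holds only the words copied from the dump above, and the helper name slot is hypothetical.

```python
RBP = 0xffff8817fc9e1e18  # RBP from the exception frame

# Stack words copied from the raw dump above (address -> 64-bit value).
STACK = {
    0xffff8817fc9e1da8: 0xffff881700000000,
    0xffff8817fc9e1db0: 0xffff880c2fc21800,
    0xffff8817fc9e1db8: 0xffff880c2fc217c0,
    0xffff8817fc9e1dc8: 0xffff880c2fd18498,
    0xffff8817fc9e1de0: 0xffff880c2fd10440,
}


def slot(disp, bits=64):
    """Read the local stored at disp(%rbp); bits=32 keeps only the low dword."""
    value = STACK[RBP + disp]
    return value & ((1 << bits) - 1)


p = slot(-0x50)          # mov %rsi,-0x50(%rbp)
cachep = slot(-0x38)     # mov %rax,-0x38(%rbp)
l3 = slot(-0x60)         # mov %rax,-0x60(%rbp)
node = slot(-0x70, 32)   # mov %edx,-0x70(%rbp) is a 32-bit store

if __name__ == "__main__":
    print(hex(p), hex(cachep), hex(l3), node)
```

Under that assumption the saved node index is 0 and the saved l3 equals the RAX value in the exception frame, so the same slots that yield p also hint at which nodelists entry was being walked.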
crash> struct -xo kmem_cache
struct kmem_cache {
[0x0] struct array_cache *array[4096];
[0x8000] unsigned int batchcount;
[0x8004] unsigned int limit;
[0x8008] unsigned int shared;
[0x800c] unsigned int buffer_size;
[0x8010] u32 reciprocal_buffer_size;
[0x8014] unsigned int flags;
[0x8018] unsigned int num;
[0x801c] unsigned int gfporder;
[0x8020] gfp_t gfpflags;
[0x8028] size_t colour;
[0x8030] unsigned int colour_off;
[0x8038] struct kmem_cache *slabp_cache;
[0x8040] unsigned int slab_size;
[0x8044] unsigned int dflags;
[0x8048] void (*ctor)(void *);
[0x8050] const char *name;
[0x8058] struct list_head next;
[0x8068] struct kmem_list3 *nodelists[512];
}
SIZE: 0x9068
crash> px 0xffff880c2fd18498-0x8058
$4 = 0xffff880c2fd10440
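The px expression above is the list_entry (container_of) computation done by hand: subtract the offset of the embedded member from the member's address. A minimal sketch, with the offset taken from the struct -xo output and an illustrative helper name:

```python
OFFSET_NEXT = 0x8058  # offset of kmem_cache.next, from `struct -xo kmem_cache`


def container_of(member_addr, member_offset):
    """Return the address of the structure that embeds the given member."""
    return member_addr - member_offset


if __name__ == "__main__":
    p = 0xffff880c2fd18498  # value read from -0x50(%rbp)
    print(hex(container_of(p, OFFSET_NEXT)))
```

The result matches the value the compiler saved at -0x38(%rbp) after the sub $0x8058,%rax instruction.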
crash> struct kmem_cache 0xffff880c2fd10440
struct kmem_cache {
array = {0xffff880c2fe93180, 0xffff880c1199cec0, 0xffff880c1199c6c0, 0xffff880c11a4cd80, 0xffff881811e1f580, 0xffff881811e1fd80, 0xffff881811e796c0, 0xffff881811e79ec0, 0xffff880c11a4c580, 0xffff880c11ae6cc0, 0xffff880c11ae64c0, 0xffff880c11b2bb80, 0xffff881811eee780, 0xffff881811f3a0c0, 0xffff881811f3a8c0, 0xffff881811f7d180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0...},
batchcount = 12,
limit = 24,
shared = 8,
buffer_size = 4096,
reciprocal_buffer_size = 1048576,
flags = 2147753984,
num = 1,
gfporder = 0,
gfpflags = 0,
colour = 0,
colour_off = 64,
slabp_cache = 0xffff880c2fc40100,--------------- the slab management data is kept separate from the slab's objects (off-slab)
slab_size = 52,
dflags = 0,
ctor = 0x0,
name = 0xffffffff817a24d8 "size-4096",
next = {
next = 0xffff880c2fd08458,
prev = 0xffff880c2fd284d8
},
nodelists = {0xffff880c2fc217c0, 0xffff88182fc007c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0...}
}
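With the recovered value of p, we know that the NULL dereference happened while walking the slabs_full list of one of the nodelists entries of the kmem_cache at 0xffff880c2fd10440.
However, nodelists is an array, so we first need to determine which index was being accessed when the fault occurred. Fortunately the system has only two nodes, so we can simply start from index 0: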
crash> struct -xo kmem_list3
struct kmem_list3 {
  [0x0] struct list_head slabs_partial;
  [0x10] struct list_head slabs_full;
  [0x20] struct list_head slabs_free;
  [0x30] unsigned long free_objects;
  [0x38] unsigned int free_limit;
  [0x3c] unsigned int colour_next;
  [0x40] spinlock_t list_lock;
  [0x48] struct array_cache *shared;
  [0x50] struct array_cache **alien;
  [0x58] unsigned long next_reap;
  [0x60] int free_touched;
}
SIZE: 0x68
crash> rd 0xffff880c2fc217c0 40
ffff880c2fc217c0: ffff880c2fc217c0 ffff880c2fc217c0 .../......./....
ffff880c2fc217d0: ffff88049dd9ea80 ffff880c2fc23380 .........3./....
ffff880c2fc217e0: ffff880c2fc217e0 ffff880c2fc217e0 .../......./....
ffff880c2fc217f0: 0000000000000000 0000000000000061 ........a.......
ffff880c2fc21800: 0000000006d606d5 ffff880c2fe78800 .........../....
ffff880c2fc21810: ffff880c2fc203e0 00000001030105e7 .../............
ffff880c2fc21820: 0000000000000000 0000000000000000 ................
ffff880c2fc21830: 0000000000000000 0000000000000000 ................
ffff880c2fc21840: ffff880c2fc21840 ffff880c2fc21840 @../....@../....
ffff880c2fc21850: ffff880c2fc21850 ffff880c2fc21850 P../....P../....
ffff880c2fc21860: ffff880c2fc21860 ffff880c2fc21860 `../....`../....
ffff880c2fc21870: 0000000000000000 0000000000000061 ........a.......
ffff880c2fc21880: 00000000058f058f ffff880c2fe78400 .........../....
ffff880c2fc21890: ffff880c2fc20400 00000001030105e7 .../............
ffff880c2fc218a0: 0000000000000000 0000000000000000 ................
ffff880c2fc218b0: 0000000000000000 0000000000000000 ................
ffff880c2fc218c0: ffff880c2fc218c0 ffff880c2fc218c0 .../......./....
ffff880c2fc218d0: ffff88079399db40 ffff880c2fc231c0 @........1./....
ffff880c2fc218e0: ffff8807989ff200 ffff88049dc62780 .........'......
ffff880c2fc218f0: 0000000000000012 0000000000000021 ........!.......
crash> slab ffff880c2fc217e0
struct slab {
  list = {
    next = 0xffff880c2fc217e0,
    prev = 0xffff880c2fc217e0
  },
  colouroff = 0,
  s_mem = 0x61,
  inuse = 114689749,
  free = 0,
  nodeid = 34816
}
crash> slab ffff880c2fc217c0
struct slab {
  list = {
    next = 0xffff880c2fc217c0,
    prev = 0xffff880c2fc217c0
  },
  colouroff = 18446612152142391936,
  s_mem = 0xffff880c2fc23380,
  inuse = 801249248,
  free = 4294936588,
  nodeid = 6112
}
From the output above, for nodelists[0] the slabs_free list is empty and the slabs_partial list is empty; only slabs_full contains entries.
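The emptiness test used here is the usual list_head rule: a list is empty when the head's next pointer points back at the head itself. A small sketch over the three heads of nodelists[0], with the next pointers copied from the rd dump (names are illustrative):

```python
L3 = 0xffff880c2fc217c0  # nodelists[0]

# name -> (offset inside kmem_list3, next pointer read from the rd dump)
HEADS = {
    "slabs_partial": (0x00, 0xffff880c2fc217c0),
    "slabs_full": (0x10, 0xffff88049dd9ea80),
    "slabs_free": (0x20, 0xffff880c2fc217e0),
}


def list_empty(head_addr, next_ptr):
    """A list_head is empty when its next pointer refers to the head itself."""
    return next_ptr == head_addr


def classify(l3, heads):
    """Map each list name to True (empty) or False (has entries)."""
    return {name: list_empty(l3 + off, nxt) for name, (off, nxt) in heads.items()}


if __name__ == "__main__":
    for name, empty in classify(L3, HEADS).items():
        print(name, "empty" if empty else "non-empty")
```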
The address of slabs_full is the address stored in nodelists[i] plus 0x10. Walk it:
crash> list -s slab.inuse 0xffff88182fc007d0 >caq.slab_1
crash> list -s slab.inuse 0xffff880c2fc217d0 >caq.slab_0
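The two list heads passed to list above follow from the structure offsets. The sketch below mirrors the two address computations in the disassembly, mov 0x8068(%rcx,%rax,8),%rax and lea 0x10(%rax),%rdi; the helper names are illustrative.

```python
OFFSET_NODELISTS = 0x8068   # kmem_cache.nodelists, from `struct -xo kmem_cache`
OFFSET_SLABS_FULL = 0x10    # kmem_list3.slabs_full, from `struct -xo kmem_list3`


def nodelists_slot(cachep, node):
    """Address of cachep->nodelists[node] (an array of 8-byte pointers)."""
    return cachep + OFFSET_NODELISTS + 8 * node


def slabs_full_head(l3):
    """Address of the slabs_full list head inside a kmem_list3."""
    return l3 + OFFSET_SLABS_FULL


if __name__ == "__main__":
    cachep = 0xffff880c2fd10440
    # nodelists[0] and nodelists[1] as printed by `struct kmem_cache`
    for node, l3 in enumerate((0xffff880c2fc217c0, 0xffff88182fc007c0)):
        print(node, hex(nodelists_slot(cachep, node)), hex(slabs_full_head(l3)))
```

The head computed for node 0 equals the RDI value in the exception frame, which is what lea 0x10(%rax),%rdi would leave there.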
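Because list treats NULL as the end of the list, the command did not report an error at first, and I thought I had worked out the wrong address. Only when I opened the output file did I see that the slab list really was broken. Keep in mind that list stops either when it reaches NULL or when it loops back to where it started.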
2035 ffff88049dd9af40
2036 inuse = 1
2037 ffff880799a9e2c0
2038 inuse = 0------------------ a slab on the full list cannot have inuse equal to 0
2039 ffff880bee40a000
2040 inuse = 32768
Look at the contents:
crash> slab ffff880799a9e2c0
struct slab {
  list = {
    next = 0xffff880bee40a000,
    prev = 0x10000000000
  },
  colouroff = 18446612152141553664,
  s_mem = 0xb00000100,
  inuse = 0,
  free = 0,
  nodeid = 0
}
crash> slab 0xffff880bee40a000
struct slab {
  list = {
    next = 0x0,----------------------- the NULL pointer shows up here
    prev = 0x2185b85600020
  },
  colouroff = 16384,
  s_mem = 0x2187025e00020,
  inuse = 32768,
  free = 0,
  nodeid = 32
}
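The check that exposes the bad entry can be sketched as a list walk that verifies, for every hop, that the next node's prev points back at the current node. This is a minimal illustration over only the three nodes quoted in the text; NODES and first_inconsistency are hypothetical names, and the prev of the first node is not shown in the source, so it is left as None.

```python
# address -> (next, prev); values copied from the `list` and `slab` output.
NODES = {
    0xffff88049dd9af40: (0xffff880799a9e2c0, None),  # prev not shown in the text
    0xffff880799a9e2c0: (0xffff880bee40a000, 0x10000000000),
    0xffff880bee40a000: (0x0, 0x2185b85600020),
}


def first_inconsistency(start, nodes):
    """Walk next pointers and return (address, reason) for the first bad node.

    Returns None if the walk leaves the known nodes without finding a problem.
    """
    cur = start
    while cur in nodes:
        nxt, _ = nodes[cur]
        if nxt == 0:
            return cur, "next is NULL"
        if nxt in nodes and nodes[nxt][1] != cur:
            return nxt, "prev does not point back"
        cur = nxt
    return None


if __name__ == "__main__":
    bad, reason = first_inconsistency(0xffff88049dd9af40, NODES)
    print(hex(bad), reason)
```

With this check the walk flags ffff880799a9e2c0 one hop before the NULL next pointer is reached, which is the node the analysis goes on to examine.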
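The NULL pointer has shown up. The prev of this slab descriptor can no longer be trusted, so we go back to the previous slab, ffff880799a9e2c0. Its data is bad, and that is the root cause of the oops: the contents of ffff880799a9e2c0 are not a valid slab, so following its next pointer leads to the fault. Let's look at the memory around ffff880799a9e2c0.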
ffff880799a9e200: 0000002800000000 006e280a00000000 ....(........(n.
ffff880799a9e210: 000005240000230d 20f6cb8000000555 .#..$...U......
ffff880799a9e220: 002c000000002e1c 006e127600000000 ......,.....v.n.
ffff880799a9e230: 0000000000000000 0000000000000000 ................
ffff880799a9e240: 0000000000000000 ffff88040000051d ................
ffff880799a9e250: 0000000300000004 ffff880b356be9c0 ..........k5....
ffff880799a9e260: 6664732f7665642f 0000000000000000 /dev/sdf........
ffff880799a9e270: 0000000000000000 0000000000000000 ................
ffff880799a9e280: 0000000000000000 0000000000000000 ................
ffff880799a9e290: 0000000000000000 0000000000000000 ................
ffff880799a9e2a0: 0000000000000000 0000000000000000 ................
ffff880799a9e2b0: 0000000000000000 0000000000000000 ................
ffff880799a9e2c0: ffff880bee40a000 0000010000000000 ..@.............------ contents at this address
ffff880799a9e2d0: ffff88049dcd2000 0000000b00000100 . ..............
ffff880799a9e2e0: 0000000000000000 0000000000000000 ................
ffff880799a9e2f0: 0000000000000000 0000000000000000 ................
ffff880799a9e300: 00000000000f0015 30305f3661adaa00 ...........a6_00
ffff880799a9e310: 5f613030305f3031 3130303030303030 10_000a_00000001
ffff880799a9e320: 0020000000002900 0000000000000000 .).... .........
ffff880799a9e330: 0000000000000000 0000000000000000 ................
ffff880799a9e340: 0000000000000000 0000280a00000000 .............(..
ffff880799a9e350: 0000000000000000 0000000000000000 ................
ffff880799a9e360: 0000000000000000 ffffffff81099c20 ........ .......
ffff880799a9e370: 0000000000000000 0000000000000000 ................
Check what kind of data ffff880799a9e2c0 itself belongs to:
crash> kmem ffff880799a9e2c0
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff880c2fc40100 size-64 64 1099970 1116870 18930 4k
SLAB MEMORY TOTAL ALLOCATED FREE
ffff880799a9e000 ffff880799a9e140 59 48 11
FREE / [ALLOCATED]
[ffff880799a9e2c0]
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea001a99d290 799a9e000 0 0 1 40000000000080 slab
A kmem_cache can keep its slab management data separate from the slab's objects. For struct kmem_cache 0xffff880c2fd10440 the slabp_cache member is 0xffff880c2fc40100, which is itself a kmem_cache:
crash> struct -x kmem_cache 0xffff880c2fc40100 struct kmem_cache { array = {0xffff880c2fe9c000, 0xffff880c1199ec00, 0xffff880c11a3f000, 0xffff880c11a4f400, 0xffff881811e42000, 0xffff881811e85000, 0xffff881811ec1000, 0xffff881811ef4000, 0xffff880c11a99800, 0xffff880c11ae9c00, 0xffff880c11af9000, 0xffff880c11b2f400, 0xffff881811f26000, 0xffff881811f3f000, 0xffff881811f5d000, 0xffff881811f83000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0...}, batchcount = 0x3c, limit = 0x78, shared = 0x8, buffer_size = 0x40, reciprocal_buffer_size = 0x4000000, flags = 0x42000, num = 0x3b, gfporder = 0x0, gfpflags = 0x0, colour = 0x0, colour_off = 0x40, slabp_cache = 0x0, slab_size = 0x140, dflags = 0x0, ctor = 0x0, name = 0xffffffff817a2435 "size-64", next = { next = 0xffff880c2fc38118, prev = 0xffff880c2fc58198 }, nodelists = {0xffff880c2fc21140, 
0xffff88182fc00140, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0...} }
crash> struct kmem_list3 0xffff880c2fc217c0
struct kmem_list3 {
  slabs_partial = {
    next = 0xffff880c2fc217c0,
    prev = 0xffff880c2fc217c0
  },
  slabs_full = {
    next = 0xffff88049dd9ea80,
    prev = 0xffff880c2fc23380
  },
  slabs_free = {
    next = 0xffff880c2fc217e0,
    prev = 0xffff880c2fc217e0
  },
  free_objects = 0,---------------- the free count is 0
  free_limit = 97,
  colour_next = 0,
  list_lock = {
    raw_lock = {
      slock = 114689749
    }
  },
  shared = 0xffff880c2fe78800,
  alien = 0xffff880c2fc203e0,
  next_reap = 4345365991,
  free_touched = 0
}
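slabp_cache is the size-64 kmem_cache. In other words, the slab management data of the size-4096 cache are objects of the size-64 cache. The problem now is that one of these objects has been overwritten, at address ffff880799a9e2c0.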
crash> search ffff880bee40a000
ffff88049dd70d18: ffff880bee40a000
ffff880799a9e2c0: ffff880bee40a000
ffff880be23cd9c0: ffff880bee40a000
Running rd on each of these three addresses shows that ffff880799a9e2c0 and ffff880be23cd9c0 hold identical contents:
crash> rd ffff880be23cd9c0 8
ffff880be23cd9c0: ffff880bee40a000 0000010000000000 ..@.............------ possible source
ffff880be23cd9d0: ffff88049dcd2000 0000000b00000100 . ..............
ffff880be23cd9e0: 0000000000000000 0000000000000000 ................
ffff880be23cd9f0: 0000000000000000 0000000000000000 ................
crash> rd ffff880799a9e2c0 8
ffff880799a9e2c0: ffff880bee40a000 0000010000000000 ..@.............------ the overwritten one
ffff880799a9e2d0: ffff88049dcd2000 0000000b00000100 . ..............
ffff880799a9e2e0: 0000000000000000 0000000000000000 ................
ffff880799a9e2f0: 0000000000000000 0000000000000000 ................
As shown above, the contents at the two addresses are exactly the same.
crash> kmem ffff880be23cd9c0
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff880c2fc00040 size-32 32 38254 39200 350 4k
SLAB MEMORY TOTAL ALLOCATED FREE
ffff880be23cd000 ffff880be23cd200 112 66 46
FREE / [ALLOCATED]
[ffff880be23cd9c0]
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea002997d4d8 be23cd000 0 64 1 40000000000080 slab
Look at the object usage recorded in this slab's management data:
crash> slab ffff880be23cd000
struct slab {
  list = {
    next = 0xffff880c0b164000,
    prev = 0xffff880be6f95000
  },
  colouroff = 512,
  s_mem = 0xffff880be23cd200,
  inuse = 105,
  free = 32,--------- index of the first free object; the entry at index 32 then gives the index of the next free object
  nodeid = 0
}
crash> rd -32 0xffff880c0b164030 112
ffff880c0b164030: 00000015 00000002 00000003 00000026 ............&...
ffff880c0b164040: 00000023 ffffffff 0000002e 00000006 #...............
ffff880c0b164050: 0000002e ffffffff ffffffff 0000001f ................
ffff880c0b164060: 0000000d 0000002a 0000003f 00000016 ....*...?.......
ffff880c0b164070: 00000011 0000000f 00000013 0000002a ............*...
ffff880c0b164080: 00000036 00000021 0000002c 0000003c 6...!...,...<...
ffff880c0b164090: 0000006b 00000018 00000019 00000042 k...........B...
ffff880c0b1640a0: 0000001b 0000001c 0000001d ffffffff ................
ffff880c0b1640b0: 0000002a 00000001 00000003 00000013 *...............
ffff880c0b1640c0: 00000023 00000024 00000027 0000002b #...$...'...+...
ffff880c0b1640d0: 00000029 00000020 0000003f 0000000b )... ...?.......
ffff880c0b1640e0: 00000014 0000002c 00000014 0000000e ....,...........
ffff880c0b1640f0: 0000002f ffffffff 00000013 0000002f /.........../...
ffff880c0b164100: 0000003b 00000008 0000000b 0000003e ;...........>...
ffff880c0b164110: 00000033 00000038 0000003b 00000030 3...8...;...0...
ffff880c0b164120: 0000003d 0000003e ffffffff 00000026 =...>.......&...
ffff880c0b164130: 00000007 00000040 00000041 00000042 ....@...A...B...
ffff880c0b164140: 00000043 00000044 00000045 00000046 C...D...E...F...
ffff880c0b164150: 00000047 00000048 00000049 0000004a G...H...I...J...
ffff880c0b164160: 0000004b 0000004c 0000004d 0000004e K...L...M...N...
ffff880c0b164170: 0000004f 00000050 00000051 00000052 O...P...Q...R...
ffff880c0b164180: 00000053 00000054 00000055 00000056 S...T...U...V...
ffff880c0b164190: 00000057 00000058 00000059 0000005a W...X...Y...Z...
ffff880c0b1641a0: 0000005b 0000005c 0000005d 0000005e [...\...]...^...
ffff880c0b1641b0: 0000005f 00000060 00000061 00000062 _...`...a...b...
ffff880c0b1641c0: 00000063 00000064 00000065 00000066 c...d...e...f...
ffff880c0b1641d0: 00000067 00000068 00000069 0000006a g...h...i...j...
ffff880c0b1641e0: 0000006b 0000006c 0000006d ffffffff k...l...m.......
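According to the data above, the free chain in this slab is 32-2a-3f-26-27-2b-b-1f, where the leading 32 is decimal (0x20) and the rest are hex. 1f is the last entry and is not counted as free, which gives 7 free objects. Together with the 105 in-use objects recorded in the slab that makes 112 in total, so the data looks consistent. Note that the rd above reads from 0xffff880c0b164030, which is this slab's list.next plus 0x30; since colouroff is 512 and s_mem is ffff880be23cd200, the index array of the slab at ffff880be23cd000 itself would be expected at ffff880be23cd030, so this check is only indicative.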
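The chain walk described above can be sketched as follows. This is a minimal illustration, assuming the end-of-chain marker is 0xffffffff as it appears in the dump; BUFCTL is a sparse view holding only the entries the walk touches (index to next index, read from the rd -32 output at base 0xffff880c0b164030 with 4-byte entries), and free_chain is a hypothetical helper.

```python
BUFCTL_END = 0xffffffff  # end-of-chain marker as seen in the dump (assumption)

# Sparse view of the index array: only the entries visited by the walk.
BUFCTL = {
    0x20: 0x2a, 0x2a: 0x3f, 0x3f: 0x26, 0x26: 0x27,
    0x27: 0x2b, 0x2b: 0x0b, 0x0b: 0x1f, 0x1f: BUFCTL_END,
}


def free_chain(first, bufctl, limit=112):
    """Follow the chain from `first` until the end marker; return visited indices.

    `limit` bounds the walk so a corrupted, cyclic chain cannot loop forever.
    """
    chain = []
    idx = first
    while idx != BUFCTL_END and len(chain) < limit:
        chain.append(idx)
        idx = bufctl[idx]
    return chain


if __name__ == "__main__":
    # slab.free = 32 (decimal) is the starting index.
    print([hex(i) for i in free_chain(32, BUFCTL)])
```

The walk visits eight indices and ends at 0x1f, whose entry holds the end marker; the count of 7 in the text leaves that last index out.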
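At this point we know this memory was overwritten, but this address is not necessarily where the overwrite began. So we need to look further back to find the first overwritten address (and multiple overwrites cannot be ruled out).
First, try to rebuild the original list by exploiting the fact that it is a doubly linked circular list.
The next of ffff88049dd9af40 points to ffff880799a9e2c0, but the data at ffff880799a9e2c0 is wrong and cannot be used. However, ffff880799a9e2c0 should also be stored as the prev pointer of its successor, so search memory for the value ffff880799a9e2c0.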
crash> search ffff880799a9e2c0
ffff8800390873c0: ffff880799a9e2c0
ffff88049dd9af40: ffff880799a9e2c0
ffff88079685a948: ffff880799a9e2c0
ffffea00102873c0: ffff880799a9e2c0
crash> kmem ffff8800390873c0
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0000c79d88 39087000 0 0 1 20000000000400 reserved
crash> kmem ffff88079685a948
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff880c2fc40100 size-64 64 1099970 1116870 18930 4k
SLAB MEMORY TOTAL ALLOCATED FREE
ffff88079685a000 ffff88079685a140 59 24 35
FREE / [ALLOCATED]
[ffff88079685a940]
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea001a8ed3b0 79685a000 0 0 1 40000000000080 slab
crash> kmem ffffea00102873c0
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0000c79d88 39087000 0 0 1 20000000000400 reserved
crash> slab ffff88079685a940
struct slab {
  list = {
    next = 0xffff88079399a3c0,
    prev = 0xffff880799a9e2c0
  },
  colouroff = 0,
  s_mem = 0xffff8806b875d000,
  inuse = 1,
  free = 4294967295,
  nodeid = 0
}
The search shows that the slab at ffff88079685a940 has prev = 0xffff880799a9e2c0. Continue walking from 940:
crash> list -s slab.inuse ffff88079685a940
...
ffff880795767240
inuse = 1
ffff880799a9e280
inuse = 0
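The walk eventually reaches 240 and then hits another bad entry, 280. As with 0xffff880799a9e2c0, search for which memory holds the address of 280. The original list is finally recovered as: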
f40->2c0->940->.....->240->280->ac0->....->f40,形成循环链表。
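Turning a raw search hit into "which object, which field" is the step that makes this recovery work. The sketch below maps a hit to its object index, object base and offset within the object, using the slab's MEMORY start (s_mem) and OBJSIZE as reported by kmem; offset 0 is list.next and offset 8 is list.prev in struct slab. The helper name locate is hypothetical.

```python
FIELD = {0: "list.next", 8: "list.prev"}  # first two fields of struct slab


def locate(hit, s_mem, objsize):
    """Map a search hit to (object index, object base, offset inside object)."""
    index, offset = divmod(hit - s_mem, objsize)
    return index, s_mem + index * objsize, offset


if __name__ == "__main__":
    # Hit from `search`, slab MEMORY and OBJSIZE from `kmem ffff88079685a948`.
    index, base, offset = locate(0xffff88079685a948, 0xffff88079685a140, 64)
    print(index, hex(base), FIELD.get(offset, hex(offset)))
```

Applied to the size-64 slab that holds the two bad descriptors (MEMORY ffff880799a9e140), the same arithmetic places ffff880799a9e280 and ffff880799a9e2c0 at object indices 5 and 6, i.e. neighbouring objects.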
280 and 2c0 are adjacent in memory and the contents of both were overwritten, so the overwrite was probably a copy that started at 280.
crash> rd ffff880be23cd900 128
ffff880be23cd900: 0000000000000000 0000000000000000 ................
ffff880be23cd910: 0000000000000000 0000000000000000 ................
ffff880be23cd920: 0000000000666473 0000000000000000 sdf.............
ffff880be23cd930: 0000000000000000 0000000000000000 ................
ffff880be23cd940: 0000000000000000 ffff88040000051d ................
ffff880be23cd950: 0000000300000004 ffff880b356be9c0 ..........k5....
ffff880be23cd960: 6664732f7665642f 0000000000000000 /dev/sdf........
ffff880be23cd970: 0000000000000000 0000000000000000 ................
ffff880be23cd980: 0000000000000000 0000000000000000 ................-------- possible source
ffff880be23cd990: 0000000000000000 0000000000000000 ................
ffff880be23cd9a0: 0000000000000000 0000000000000000 ................
ffff880be23cd9b0: 0000000000000000 0000000000000000 ................
ffff880be23cd9c0: ffff880bee40a000 0000010000000000 ..@.............
ffff880be23cd9d0: ffff88049dcd2000 0000000b00000100 . ..............
ffff880be23cd9e0: 0000000000000000 0000000000000000 ................
ffff880be23cd9f0: 0000000000000000 0000000000000000 ................
crash> rd ffff880799a9e200 128
ffff880799a9e200: 0000002800000000 006e280a00000000 ....(........(n.
ffff880799a9e210: 000005240000230d 20f6cb8000000555 .#..$...U......
ffff880799a9e220: 002c000000002e1c 006e127600000000 ......,.....v.n.
ffff880799a9e230: 0000000000000000 0000000000000000 ................
ffff880799a9e240: 0000000000000000 ffff88040000051d ................------------------ already allocated
ffff880799a9e250: 0000000300000004 ffff880b356be9c0 ..........k5....
ffff880799a9e260: 6664732f7665642f 0000000000000000 /dev/sdf........
ffff880799a9e270: 0000000000000000 0000000000000000 ................
ffff880799a9e280: 0000000000000000 0000000000000000 ................------------- destination address
ffff880799a9e290: 0000000000000000 0000000000000000 ................
ffff880799a9e2a0: 0000000000000000 0000000000000000 ................
ffff880799a9e2b0: 0000000000000000 0000000000000000 ................
ffff880799a9e2c0: ffff880bee40a000 0000010000000000 ..@.............
ffff880799a9e2d0: ffff88049dcd2000 0000000b00000100 . ..............
ffff880799a9e2e0: 0000000000000000 0000000000000000 ................
ffff880799a9e2f0: 0000000000000000 0000000000000000 ................
ffff880799a9e300: 00000000000f0015 30305f3661adaa00 ...........a6_00
ffff880799a9e310: 5f613030305f3031 3130303030303030 10_000a_00000001
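One possibility is what I analyzed above: a memcpy from the source region, with our 9e280 as the destination. Another possibility is the reverse, since we do not actually know which side is the source. The region may even have been overwritten several times, that is, copied more than once.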
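One way to bound the extent of the copy, whichever direction it went, is to compare the two regions word by word and find the longest run that matches. The sketch below does this over the first 0x100 bytes of each rd dump (32 eight-byte words starting at ffff880be23cd900 and ffff880799a9e200). The word lists are transcribed from the dumps; longest_equal_run is an illustrative helper, and a matching run of zeros may of course be coincidental.

```python
# 64-bit words at ffff880be23cd900 .. ffff880be23cd9ff
SRC = [
    0, 0, 0, 0, 0x0000000000666473, 0, 0, 0,
    0, 0xffff88040000051d, 0x0000000300000004, 0xffff880b356be9c0,
    0x6664732f7665642f, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0,
    0xffff880bee40a000, 0x0000010000000000, 0xffff88049dcd2000, 0x0000000b00000100,
    0, 0, 0, 0,
]
# 64-bit words at ffff880799a9e200 .. ffff880799a9e2ff
DST = [
    0x0000002800000000, 0x006e280a00000000, 0x000005240000230d, 0x20f6cb8000000555,
    0x002c000000002e1c, 0x006e127600000000, 0, 0,
    0, 0xffff88040000051d, 0x0000000300000004, 0xffff880b356be9c0,
    0x6664732f7665642f, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0,
    0xffff880bee40a000, 0x0000010000000000, 0xffff88049dcd2000, 0x0000000b00000100,
    0, 0, 0, 0,
]


def longest_equal_run(a, b):
    """Return (start index, length) of the longest run where a[i] == b[i]."""
    best = (0, 0)
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if start is None:
                start = i
            if i - start + 1 > best[1]:
                best = (start, i - start + 1)
        else:
            start = None
    return best


if __name__ == "__main__":
    start, length = longest_equal_run(SRC, DST)
    print("match from offset", hex(start * 8), "for", length * 8, "bytes")
```

On these dumps the identical run begins at offset 0x30 and continues to the end of the compared window, and the first non-zero matching word is at offset 0x48. That suggests the duplicated region may start before 280, in the object at 240, although the comparison alone cannot say which side is the source.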
I feel stuck here. I do not know how to take this kind of memory corruption further, because the data has no distinctive signature.