Performance Analysis: Using GDB to Debug a C++ Application and Analyze a Core Dump

Posted by zuozewei

Background

This post is just a record for future reference.
It was written because one of our projects ran into a core dump.

Problem analysis

First, load the program and the core file into GDB.

[app@hostA bin]$ gdb PROGRAM core.31018

GDB then prints a long run of startup information.

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...

In short, the banner says GDB is free software: use it as you like, but it comes with no warranty.

Reading symbols from /bin/PROGRAM...done.
[New LWP 31018]
[New LWP 31027]
[New LWP 31022]
[New LWP 31036]
[New LWP 31038]
[New LWP 31041]
[New LWP 31044]
[New LWP 31047]
[New LWP 31042]
[New LWP 31032]
[New LWP 31033]
[New LWP 31034]
[New LWP 31035]
[New LWP 31037]
[New LWP 31020]
[New LWP 31026]
[New LWP 31031]
[New LWP 31030]
[New LWP 31040]
[New LWP 31039]
[New LWP 31046]
[New LWP 31045]
[New LWP 31043]
[New LWP 31019]
[New LWP 31025]
[New LWP 31024]
[New LWP 31023]
[New LWP 31021]
[New LWP 31029]
[New LWP 31028]

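These are LWP IDs, which is what we usually call thread IDs. On Linux a thread is an LWP. Some people object that an LWP is a process rather than a thread: it stands for light-weight process, after all, not thread. As far as the name goes, that's true. But on Linux there is no other kind of thread: every thread the kernel schedules is an LWP. Here is how Wikipedia puts it: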

In computer operating systems, a light-weight process (LWP) is a means of achieving multitasking. In the traditional meaning of the term, as used in Unix System V and Solaris, a LWP runs in user space on top of a single kernel thread and shares its address space and system resources with other LWPs within the same process. Multiple user level threads, managed by a thread library, can be placed on top of one or many LWPs - allowing multitasking to be done at the user level, which can have some performance benefits.

If that left you none the wiser, that's fine. Don't get hung up on the terminology. Say it with me: for our purposes, this thing is a thread!
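To make the "an LWP is a thread" point concrete, here is a minimal sketch, assuming Linux and a C++11 compiler. The function names `current_lwp` and `lwp_of_new_thread` are made up for this illustration. It reads the kernel thread ID of the calling thread, which is the number GDB prints as "LWP".

```cpp
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#include <thread>

// Kernel thread ID of the calling thread (Linux-specific system call).
// This is the number GDB shows as "LWP".
pid_t current_lwp() {
    return static_cast<pid_t>(syscall(SYS_gettid));
}

// Start one extra thread and return the LWP ID that thread observed.
pid_t lwp_of_new_thread() {
    pid_t id = 0;
    std::thread worker([&id] { id = current_lwp(); });
    worker.join();  // after join() the value written by the worker is visible
    return id;
}
```

Called from the main thread, `current_lwp()` returns the same number as `getpid()`, which matches the session above: the core file is core.31018 and the first thread is LWP 31018. Every other thread gets its own LWP ID while sharing that process ID. Depending on the toolchain, the build may need `-pthread`.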

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

These lines say which library GDB is using for thread debugging.

Core was generated by `PROGRAM -g 1 -i 3006 -u VM_16_46_centos -U /data/app/log/LOG -m 0 -A'.
Program terminated with signal 6, Ab

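This shows how the core was produced: the process was terminated by signal 6, an abort (SIGABRT). How many signals does the system have?
Roughly the following.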

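| Signal | Value | Action | Cause | Standard |
|---|---|---|---|---|
| SIGHUP | 1 | A | Terminal hangup, or death of the controlling process | POSIX.1 |
| SIGINT | 2 | A | Keyboard interrupt (e.g. the break key was pressed) | POSIX.1 |
| SIGQUIT | 3 | C | Quit key pressed on the keyboard | POSIX.1 |
| SIGILL | 4 | C | Illegal instruction | POSIX.1 |
| SIGABRT | 6 | C | Abort signal raised by abort(3) | POSIX.1 |
| SIGFPE | 8 | C | Floating-point exception | POSIX.1 |
| SIGKILL | 9 | AEF | Kill signal | POSIX.1 |
| SIGSEGV | 11 | C | Invalid memory reference | POSIX.1 |
| SIGPIPE | 13 | A | Broken pipe: write to a pipe with no reader | POSIX.1 |
| SIGALRM | 14 | A | Timer signal from alarm(2) | POSIX.1 |
| SIGTERM | 15 | A | Termination signal | POSIX.1 |
| SIGUSR1 | 30,10,16 | A | User-defined signal 1 | POSIX.1 |
| SIGUSR2 | 31,12,17 | A | User-defined signal 2 | POSIX.1 |
| SIGCHLD | 20,17,18 | B | Child process ended | POSIX.1 |
| SIGCONT | 19,18,25 | | Continue (a previously stopped process) | POSIX.1 |
| SIGSTOP | 17,19,23 | DEF | Stop the process | POSIX.1 |
| SIGTSTP | 18,20,24 | D | Stop key pressed on the controlling terminal (tty) | POSIX.1 |
| SIGTTIN | 21,21,26 | D | Background process tried to read from the controlling terminal | POSIX.1 |
| SIGTTOU | 22,22,27 | D | Background process tried to write to the controlling terminal | POSIX.1 |
| SIGBUS | 10,7,10 | C | Bus error (bad memory access) | SUSv2 |
| SIGPOLL | | A | Pollable event as defined by System V; synonym for SIGIO | SUSv2 |
| SIGPROF | 27,27,29 | A | Profiling timer expired | SUSv2 |
| SIGSYS | 12,-,12 | C | Invalid system call (SVID) | SUSv2 |
| SIGTRAP | 5 | C | Trace/breakpoint trap | SUSv2 |
| SIGURG | 16,23,21 | B | Urgent condition on a socket (4.2BSD) | SUSv2 |
| SIGVTALRM | 26,26,28 | A | Virtual alarm clock (4.2BSD) | SUSv2 |
| SIGXCPU | 24,24,30 | C | CPU time limit exceeded (4.2BSD) | SUSv2 |
| SIGXFSZ | 25,25,31 | C | File size limit exceeded (4.2BSD) | SUSv2 |
| SIGIOT | 6 | C | IOT trap; synonym for SIGABRT | |
| SIGEMT | 7,-,7 | | | |
| SIGSTKFLT | -,16,- | A | Stack fault on the coprocessor | |
| SIGIO | 23,29,22 | A | I/O is now possible (4.2BSD) | |
| SIGCLD | -,-,18 | | Synonym for SIGCHLD | |
| SIGPWR | 29,30,19 | A | Power failure (System V) | |
| SIGINFO | 29,-,- | | Synonym for SIGPWR | |
| SIGLOST | -,-,- | A | File lock lost | |
| SIGWINCH | 28,28,20 | B | Window size changed (4.3BSD, Sun) | |
| SIGUNUSED | -,31,- | A | Unused signal (will be SIGSYS) | |

Where three values are listed, the number depends on the architecture: the first is usually for alpha and sparc, the middle one for x86, and the last for mips. A "-" means the signal does not exist on that architecture.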

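So what do the letters in the Action column mean?

A: the default action is to terminate the process
B: the default action is to ignore the signal
C: the default action is to terminate the process and dump core
D: the default action is to stop the process
E: the signal cannot be caught
F: the signal cannot be ignored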
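The "signal 6" in the GDB output can be reproduced in a few lines. The following is a minimal sketch, assuming Linux. The function names are invented for this illustration and nothing here comes from the crashed program. A child process calls `abort()`, and the parent reads back which signal ended it. The second function simply encodes the signals marked C in the table above.

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a child that calls abort() and return the number of the signal that
// killed it, or -1 if fork/wait failed or the child was not killed by a signal.
int signal_that_kills_aborting_child() {
    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        std::abort();  // raises SIGABRT in the child
    }
    int status = 0;
    if (waitpid(child, &status, 0) != child) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}

// True for the signals whose default action in the table is "C"
// (terminate the process and dump core).
bool dumps_core_by_default(int sig) {
    switch (sig) {
        case SIGQUIT: case SIGILL: case SIGABRT: case SIGFPE: case SIGSEGV:
        case SIGBUS: case SIGSYS: case SIGTRAP: case SIGXCPU: case SIGXFSZ:
            return true;
        default:
            return false;
    }
}
```

On Linux `signal_that_kills_aborting_child()` returns `SIGABRT`, which is 6, the same number GDB reports. Whether a core file is actually written also depends on the core size limit and the kernel's core pattern settings.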

#0  0x00007fa1fef385f7 in raise () from /lib64/libc.so.6

Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-19.2.el7.x86_64 elfutils-libelf-0.163-3.el7.x86_64 glibc-2.17-106.el7_2.4.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-10.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libcurl-7.29.0-25.el7.centos.x86_64 libgcc-4.8.5-4.el7.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libssh2-1.4.3-10.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 nspr-4.10.8-2.el7_1.x86_64 nss-3.19.1-18.el7.x86_64 nss-softokn-freebl-3.16.2.3-13.el7_1.x86_64 nss-util-3.19.1-4.el7_1.x86_64 openldap-2.4.40-8.el7.x86_64 openssl-libs-1.0.1e-42.el7.9.x86_64 pcre-8.32-15.el7.x86_64 readline-6.2-9.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64

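This message lists the shared libraries the program pulled in whose separate debuginfo packages are not installed on this host, which is why the libc frames show no file or line information. GDB relies on these libraries to make sense of the core file, so on a different machine the contents of the core might not be readable at all (that's my guess; I don't have enough spare time to actually move it to another machine and try).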

Look at the backtrace.

(gdb) bt

#0  0x00007fa1fef385f7 in raise () from /lib64/libc.so.6
#1  0x00007fa1fef39ce8 in abort () from /lib64/libc.so.6
#2  0x00007fa1fef78317 in __libc_message () from /lib64/libc.so.6
#3  0x00007fa1fef7e184 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007fa1fef818e7 in _int_malloc () from /lib64/libc.so.6
#5  0x00007fa1fef828dc in malloc () from /lib64/libc.so.6
#6  0x000000000043a147 in CMemPool::frealloc (ud=0x0, ptr=0x0, osize=0, nsize=64, p=0x1a8a450) at MemPool.h:266
#7  0x0000000000434898 in luaM_realloc_ (L=0x1b344e0, block=0x0, osize=0, nsize=64) at lmem.cpp:79
#8  0x000000000043b481 in luaH_new (L=0x1b344e0, narray=0, nhash=0) at ltable.cpp:359
#9  0x000000000042cbf8 in lua_createtable (L=0x1b344e0, narray=0, nrec=0) at lapi.cpp:582
#10 0x00007fa1fecf0f76 in getMessage (l=0x1b344e0, pMessage=0x7fa1bc0008c0) at message.h:218
#11 0x00007fa1fecf3af6 in getResponse (l=0x1b344e0, res=0x1b0d6d0) at service.cpp:28
#12 0x00007fa1fecf3d3b in sendM (l=0x1b344e0) at service.cpp:59
#13 0x0000000000430dc0 in luaD_precall (L=0x1b344e0, func=0x1b247b0, nresults=2) at ldo.cpp:319
#14 0x000000000043faad in luaV_execute (L=0x1b344e0, nexeccalls=1) at lvm.cpp:590
#15 0x0000000000431092 in luaD_call (L=0x1b344e0, func=0x1b24740, nResults=-1) at ldo.cpp:377
#16 0x000000000042d420 in f_call (L=0x1b344e0, ud=0x7ffeb1c9db20) at lapi.cpp:801
#17 0x000000000042ffed in luaD_rawrunprotected (L=0x1b344e0, f=0x42d3eb <f_call(lua_State*, void*)>, ud=0x7ffeb1c9db20) at ldo.cpp:116
#18 0x00000000004314a3 in luaD_pcall (L=0x1b344e0, func=0x42d3eb <f_call(lua_State*, void*)>, u=0x7ffeb1c9db20, old_top=64, ef=0) at ldo.cpp:464
#19 0x000000000042d4c9 in lua_pcall (L=0x1b344e0, nargs=0, nresults=-1, errfunc=0) at lapi.cpp:822
#20 0x000000000044f074 in luaB_pcall (L=0x1b344e0) at lbaselib.cpp:466
#21 0x0000000000430dc0 in luaD_precall (L=0x1b344e0, func=0x1b24730, nresults=2) at ldo.cpp:319
#22 0x000000000043faad in luaV_execute (L=0x1b344e0, nexeccalls=2) at lvm.cpp:590
#23 0x0000000000431092 in luaD_call (L=0x1b344e0, func=0x1b24710, nResults=-1) at ldo.cpp:377
#24 0x000000000042d420 in f_call (L=0x1b344e0, ud=0x7ffeb1c9e230) at lapi.cpp:801
#25 0x000000000042ffed in luaD_rawrunprotected (L=0x1b344e0, f=0x42d3eb <f_call(lua_State*, void*)>, ud=0x7ffeb1c9e230) at ldo.cpp:116
#26 0x00000000004314a3 in luaD_pcall (L=0x1b344e0, func=0x42d3eb <f_call(lua_State*, void*)>, u=0x7ffeb1c9e230, old_top=16, ef=0) at ldo.cpp:464
#27 0x000000000042d4c9 in lua_pcall (L=0x1b344e0, nargs=0, nresults=-1, errfunc=0) at lapi.cpp:822
#28 0x0000000000426951 in process () at srv.cpp:120
#29 0x00000000004268ac in PROGRAM (req=0x7ffeb1c9e340) at srv.cpp:107
#30 0x00000000004bad36 in _svcdsp ()
#31 0x00000000004a3b4c in _runserver ()
#32 0x00000000004a2a22 in _main ()
#33 0x00000000004265f0 in main ()

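This tells you where the core was dumped. The call chain reads from bottom to top: main is at the bottom and the frame that raised the signal is at the top. The trouble spots visible here are basically all low-level calls (malloc, malloc_printerr, abort), and those are only the symptom. glibc goes down the malloc_printerr path when malloc finds its own heap bookkeeping in an inconsistent state, so the damage was most likely done earlier, somewhere else in the program. What matters most is how the upper layers passed their variables around.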

While we're at it, let's see where every thread currently is.

(gdb) info threads

  Id   Target Id         Frame
  30   Thread 0x7fa1f5365700 (LWP 31028) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  29   Thread 0x7fa1f4b64700 (LWP 31029) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  28   Thread 0x7fa1f8b6c700 (LWP 31021) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  27   Thread 0x7fa1f7b6a700 (LWP 31023) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  26   Thread 0x7fa1f7369700 (LWP 31024) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  25   Thread 0x7fa1f6b68700 (LWP 31025) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  24   Thread 0x7fa1f9b6e700 (LWP 31019) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  23   Thread 0x7fa1edb56700 (LWP 31043) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  22   Thread 0x7fa1ecb54700 (LWP 31045) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  21   Thread 0x7fa1ec353700 (LWP 31046) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  20   Thread 0x7fa1efb5a700 (LWP 31039) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  19   Thread 0x7fa1ef359700 (LWP 31040) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  18   Thread 0x7fa1f4363700 (LWP 31030) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  17   Thread 0x7fa1f3b62700 (LWP 31031) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  16   Thread 0x7fa1f6367700 (LWP 31026) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  15   Thread 0x7fa1f936d700 (LWP 31020) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  14   Thread 0x7fa1f0b5c700 (LWP 31037) 0x00007fa1feff09b3 in select () from /lib64/libc.so.6
  13   Thread 0x7fa1f1b5e700 (LWP 31035) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12   Thread 0x7fa1f235f700 (LWP 31034) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11   Thread 0x7fa1f2b60700 (LWP 31033) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10   Thread 0x7fa1f3361700 (LWP 31032) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9    Thread 0x7fa1ee357700 (LWP 31042) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8    Thread 0x7fa1ebb52700 (LWP 31047) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7    Thread 0x7fa1ed355700 (LWP 31044) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6    Thread 0x7fa1eeb58700 (LWP 31041) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5    Thread 0x7fa1f035b700 (LWP 31038) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4    Thread 0x7fa1f135d700 (LWP 31036) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3    Thread 0x7fa1f836b700 (LWP 31022) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x7fa1f5b66700 (LWP 31027) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1    Thread 0x7fa2009b0740 (LWP 31018) 0x00007fa1fef385f7 in raise () from /lib64/libc.so.6

(gdb)

Most of them are sitting in pthread_cond_wait or pthread_cond_timedwait (one is in select), so nothing looks wrong there.
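For readers wondering why a thread parked in a condition wait is nothing to worry about, here is a minimal sketch of an idle worker, assuming a C++11 compiler. The `WorkQueue` class and `run_one_job` function are hypothetical and not taken from the crashed program. A worker with nothing to do blocks on a condition variable until a job arrives.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical job queue; all names here are illustrative only.
class WorkQueue {
public:
    void push(int job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(job);
        }
        ready_.notify_one();  // wake one waiting worker
    }

    // Blocks until a job is available. While blocked, the calling thread
    // is idle inside the condition variable wait.
    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !jobs_.empty(); });
        int job = jobs_.front();
        jobs_.pop();
        return job;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> jobs_;
};

// Start a worker that waits for one job, hand it a job, return what it got.
int run_one_job(int job) {
    WorkQueue queue;
    int received = 0;
    std::thread worker([&] { received = queue.pop(); });
    queue.push(job);
    worker.join();
    return received;
}
```

On Linux with libstdc++, a thread blocked in `pop()` would typically show up in `info threads` inside `pthread_cond_wait`, just like the idle threads in the listing above. A thread in that state is waiting for work, not stuck in the crash.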

Try printing a variable:

(gdb) p req
No symbol "req" in current context.

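Why is there no such symbol? Because the current frame is #0 (raise), and req does not exist in that frame.
Switch to the frame that has it.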

(gdb) frame 29
#29 0x00000000004268ac in PROGRAM (req=0x7ffeb1c9e340) at srv.cpp:107

(gdb) p req
$1 = (SVCINFO *) 0x7ffeb1c9e340

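Now we can see the variable's type and value. Someone will ask: it's just an address, how do you read that? Since the binary carries debug information, the pointer can normally be dereferenced with p *req to print the fields behind it.
With the source code available you could see everything. It just isn't loaded here.
GDB searches the current directory by default, but found nothing there.
The source location is recorded at compile time, but the source is not present on this host, so it can't be shown.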

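If you feel like playing with this, write a small program yourself, keep the source next to the binary, and see what it looks like.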
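As a starting point for that experiment, here is a minimal sketch, assuming Linux and g++. The `Request` type is a hypothetical stand-in for the SVCINFO seen in the session, and all function names are invented for this illustration. It gives you a couple of stack frames to walk and a pointer argument to print.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <sys/resource.h>

// Hypothetical request type, standing in for SVCINFO from the session.
struct Request {
    int id;
    const char* payload;
};

// Raise the soft core-size limit to the hard limit so that the kernel is
// allowed to write a core file for this process.
bool allow_core_dumps() {
    rlimit limit{};
    if (getrlimit(RLIMIT_CORE, &limit) != 0) return false;
    limit.rlim_cur = limit.rlim_max;
    return setrlimit(RLIMIT_CORE, &limit) == 0;
}

// Innermost frame: aborts (SIGABRT) on a request it cannot handle,
// otherwise returns the payload length.
std::size_t handle(const Request* req) {
    if (req == nullptr || req->payload == nullptr) {
        std::abort();
    }
    return std::strlen(req->payload);
}

// Middle frame, so the backtrace has more than one level to walk.
std::size_t process(const Request* req) {
    return handle(req);
}
```

To try it, add a `main` that calls `allow_core_dumps()` and then `process(&req)` with a `Request` whose `payload` is `nullptr`, and build with something like `g++ -g -O0 crash.cpp -o crash` (the file name is arbitrary). After the crash, load the binary and the core into GDB as at the start of this post, run `bt`, switch frames with `frame 1`, and print `req` and `*req`. Where the core file ends up and what it is called depends on the system's core pattern configuration.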
