How to Capture and Analyze the Scene of a SIGSEGV

Posted 2020-08-17 thammer


On Linux, the default disposition of SIGSEGV is to terminate the process and dump core; see man 7 signal:

       Signal     Value     Action   Comment
       ──────────────────────────────────────────────────────────────────────
       SIGHUP        1       Term    Hangup detected on controlling terminal
                                     or death of controlling process
       SIGINT        2       Term    Interrupt from keyboard
       SIGQUIT       3       Core    Quit from keyboard
       SIGILL        4       Core    Illegal Instruction
       SIGABRT       6       Core    Abort signal from abort(3)
       SIGFPE        8       Core    Floating point exception
       SIGKILL       9       Term    Kill signal
       SIGSEGV      11       Core    Invalid memory reference
       SIGPIPE      13       Term    Broken pipe: write to pipe with no
                                     readers
       SIGALRM      14       Term    Timer signal from alarm(2)
       SIGTERM      15       Term    Termination signal
       SIGUSR1   30,10,16    Term    User-defined signal 1
       SIGUSR2   31,12,17    Term    User-defined signal 2
       SIGCHLD   20,17,18    Ign     Child stopped or terminated
       SIGCONT   19,18,25    Cont    Continue if stopped
       SIGSTOP   17,19,23    Stop    Stop process
       SIGTSTP   18,20,24    Stop    Stop typed at terminal
       SIGTTIN   21,21,26    Stop    Terminal input for background process
       SIGTTOU   22,22,27    Stop    Terminal output for background process

The Action column is described as follows:

       The entries in the "Action" column of the tables below specify the
       default disposition for each signal, as follows:

       Term   Default action is to terminate the process.

       Ign    Default action is to ignore the signal.

       Core   Default action is to terminate the process and dump core (see
              core(5)).

       Stop   Default action is to stop the process.

       Cont   Default action is to continue the process if it is currently
              stopped.

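As the table shows, SIGSEGV is not the only signal whose default action is Core. A program usually receives SIGSEGV when it makes an invalid memory reference; for details see http://www.cnblogs.com/thammer/p/4737371.html .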
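To observe the default action in practice, the following minimal sketch forks a child that writes through a null pointer and lets the parent inspect how the child died. The helper name signal_that_killed_child is hypothetical and not part of the original post; the sketch assumes Linux with glibc and no sanitizer or custom SIGSEGV handler in effect.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that makes an invalid memory reference.
   Returns the number of the signal that killed the child, or -1 if the
   child could not be created or did not die from a signal.
   If core_dumped is non-NULL, it is set to 1 when the kernel reports
   that a core file was written, and to 0 otherwise. */
int signal_that_killed_child(int *core_dumped)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: make sure the default disposition is in effect. */
        signal(SIGSEGV, SIG_DFL);
        volatile int *p = NULL;
        *p = 42;        /* invalid memory reference */
        _exit(0);       /* not expected to be reached */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSIGNALED(status))
        return -1;

    if (core_dumped != NULL) {
#ifdef WCOREDUMP
        *core_dumped = WCOREDUMP(status) ? 1 : 0;
#else
        *core_dumped = 0;   /* macro not available on this platform */
#endif
    }
    return WTERMSIG(status);
}
```

Called from main, the function is expected to report SIGSEGV. Whether the core flag is set depends on the core size limit and on how the system is configured to handle core dumps.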

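Methods for analyzing a segmentation fault:

  1. Use gdb directly

    If the SIGSEGV is easy to reproduce, run the program under gdb. When the SIGSEGV is raised, gdb stops and prints a message; typing bt at that point shows the call stack, which identifies the function that faulted and the call path that led to it.

  2. Use a core file with gdb

    By default, a program that receives SIGSEGV dumps core. The core file records the faulting process's memory and CPU register state at the moment of the fault.

    First, set the core file size limit with ulimit -c XXXX, where XXXX is the maximum core file size allowed; it is usually set to unlimited.

    Next, run the program until a core file is produced. The file is typically named core.xxx, where xxx is the process ID; naming rules may differ between distributions.

    Finally, start gdb on the executable, enter the command core-file corefile-name, and then bt.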
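The core size limit can also be raised from inside the program instead of through the shell. The sketch below is one possible way to do that with getrlimit and setrlimit; the function name enable_core_dumps is hypothetical. It only raises the soft limit up to the existing hard limit, which is all an unprivileged process is allowed to do.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_CORE limit to the hard limit,
   roughly what "ulimit -c unlimited" does when the hard limit is unlimited.
   Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }

    /* The soft limit may not exceed the hard limit without privileges. */
    lim.rlim_cur = lim.rlim_max;

    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Calling enable_core_dumps() early in main affects the process itself and any children it creates afterwards. If the hard limit is 0, no core file will be produced regardless.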

  3. Register a SIGSEGV signal handler and, inside the handler, print the stack frames using stack-backtrace functions.

    A. Print with the glibc functions backtrace, backtrace_symbols, and backtrace_symbols_fd

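       #include <execinfo.h>   /* backtrace(), backtrace_symbols() */
       #include <stdio.h>
       #include <stdlib.h>

       #define BT_BUF_SIZE 100  /* assumed value; not defined in the original */

       /* SIGSEGV handler: print the current call stack, then exit.
          Note: printf, malloc-based calls and exit are not
          async-signal-safe, so this is a debugging aid only. */
       void SigSegv_handler(int signo)
       {
           int j, nptrs;
           void *buffer[BT_BUF_SIZE];
           char **strings;

           (void)signo;  /* unused */

           nptrs = backtrace(buffer, BT_BUF_SIZE);
           printf("backtrace() returned %d addresses\n", nptrs);

           /* The call backtrace_symbols_fd(buffer, nptrs, STDOUT_FILENO)
              would produce similar output to the following: */

           strings = backtrace_symbols(buffer, nptrs);
           if (strings == NULL) {
               perror("backtrace_symbols");
               exit(EXIT_FAILURE);
           }

           for (j = 0; j < nptrs; j++)
               printf("%s\n", strings[j]);

           free(strings);
           exit(-1);
       }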

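    backtrace_symbols and backtrace_symbols_fd differ in that the former returns a malloc'ed array of strings, while the latter writes the strings directly to the file descriptor fd.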

    These functions have some limitations:

       These functions make some assumptions about how a function's return
       address is stored on the stack.  Note the following:

       *  Omission of the frame pointers (as implied by any of gcc(1)'s
          nonzero optimization levels) may cause these assumptions to be
          violated.

       *  Inlined functions do not have stack frames.

       *  Tail-call optimization causes one stack frame to replace another.

       The symbol names may be unavailable without the use of special linker
       options.  For systems using the GNU linker, it is necessary to use
       the -rdynamic linker option.  Note that names of "static" functions
       are not exposed, and won't be available in the backtrace.

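      The backtrace may be wrong or incomplete for optimized programs

      Inlined functions do not appear in the backtrace

      For static functions, only the address is printed, not the name

      Functions affected by tail-call optimization do not appear in the backtrace

      The program must be linked with -rdynamic for symbol names to be available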

    B. Other methods and interfaces can do similar backtrace work; to be added later.
