A Powerful Tool for Deadlock Analysis: valgrind's DRD and Helgrind

Posted by breaksoftware



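In the article "Analysis of Deadlocks Caused by Improper Operations in DllMain: An Introduction to Deadlocks" we explained why deadlocks occur. Deadlocks creep in easily when we have only a shaky grasp of thread synchronization techniques or when the synchronization scheme is muddled. This article shows how to use valgrind to track down deadlock problems. (When reposting, please credit breaksoftware's CSDN blog.)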

Constructing a Scenario

#include <pthread.h>

pthread_mutex_t s_mutex_a;
pthread_mutex_t s_mutex_b;
pthread_barrier_t s_barrier;

void lock() {
    pthread_mutex_lock(&s_mutex_b);            /* main thread takes B first */
    {
        pthread_barrier_wait(&s_barrier);      /* wait until the child holds A */

        pthread_mutex_lock(&s_mutex_a);        /* blocks forever: A is held by the child */
        pthread_mutex_unlock(&s_mutex_a);
    }
    pthread_mutex_unlock(&s_mutex_b);
}

static void* thread_routine(void* arg) {
    pthread_mutex_lock(&s_mutex_a);            /* child thread takes A first */
    {
        pthread_barrier_wait(&s_barrier);      /* wait until the main thread holds B */

        pthread_mutex_lock(&s_mutex_b);        /* blocks forever: B is held by main */
        pthread_mutex_unlock(&s_mutex_b);
    }
    pthread_mutex_unlock(&s_mutex_a);
    return NULL;
}

int main(int argc, char** argv) {
    pthread_t tid;
    pthread_mutex_init(&s_mutex_a, 0);
    pthread_mutex_init(&s_mutex_b, 0);
    pthread_barrier_init(&s_barrier, 0, 2);    /* two participants: main + child */

    pthread_create(&tid, 0, &thread_routine, 0);

    lock();

    pthread_join(tid, 0);

    pthread_barrier_destroy(&s_barrier);
    pthread_mutex_destroy(&s_mutex_a);
    pthread_mutex_destroy(&s_mutex_b);

    return 0;
}
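In this code we only need to focus on the two functions lock and thread_routine. Line numbers cited below refer to this listing as the source file dead_lock.c.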

lock runs in the main thread. It first locks s_mutex_b, then waits at the barrier s_barrier until the child thread also reaches the barrier (line 21).

thread_routine is the thread function. It first locks s_mutex_a, then waits at s_barrier until the main thread also reaches the barrier (line 10).

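Once both the main thread and the child thread reach the barrier, it opens and both continue. The main thread reaches line 12 and tries to acquire s_mutex_a, while the child thread reaches line 23 and tries to acquire s_mutex_b. Each of these mutexes is already held by the other thread, so neither thread can proceed and the program deadlocks.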

We found this by reading the code, but in a larger project we need a tool to do the analysis. Below we use valgrind:

valgrind --tool=drd --trace-mutex=yes ./dead_lock

The command above makes valgrind print information about every mutex operation:

==4749== [1] mutex_init      mutex 0x30a040
==4749== [1] mutex_init      mutex 0x30a0a0
==4749== [1] mutex_init      mutex 0x1ffefffe10
==4749== [1] mutex_ignore_ordering mutex 0x1ffefffe10
==4749== [1] mutex_trylock   mutex 0x1ffefffe10 rc 0 owner 0
==4749== [1] post_mutex_lock mutex 0x1ffefffe10 rc 0 owner 0
==4749== [1] mutex_unlock    mutex 0x1ffefffe10 rc 1
==4749== [2] mutex_trylock   mutex 0x1ffefffe10 rc 0 owner 1
==4749== [2] post_mutex_lock mutex 0x1ffefffe10 rc 0 owner 1
==4749== [2] mutex_unlock    mutex 0x1ffefffe10 rc 1
==4749== [2] mutex_trylock   mutex 0x30a040 rc 0 owner 0
==4749== [2] post_mutex_lock mutex 0x30a040 rc 0 owner 0
==4749== [1] cond_post_wait  mutex 0x1ffefffe10 rc 0 owner 2
==4749== [1] mutex_unlock    mutex 0x1ffefffe10 rc 1
==4749== [1] mutex_destroy   mutex 0x1ffefffe10 rc 0 owner 1
==4749== [1] mutex_trylock   mutex 0x30a0a0 rc 0 owner 0
==4749== [1] post_mutex_lock mutex 0x30a0a0 rc 0 owner 0
==4749== [1] mutex_trylock   mutex 0x30a040 rc 1 owner 2
==4749== [2] mutex_trylock   mutex 0x30a0a0 rc 1 owner 1

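The second-to-last line shows thread 1 trying to lock mutex 0x30a040 (s_mutex_a, the first mutex initialized), but the owner of that mutex is thread 2.

The last line shows thread 2 trying to lock mutex 0x30a0a0 (s_mutex_b), but the owner of that mutex is thread 1.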

From this we can conclude that the program hangs because of a deadlock.

DRD has one problem, though: it cannot point out where the deadlock occurs. This is where Helgrind comes in.

valgrind --tool=helgrind ./dead_lock 

When the deadlock occurs under Helgrind, the program hangs and has to be stopped with Ctrl+C. After that we get the following output:

==5373== Process terminating with default action of signal 2 (SIGINT)
==5373==    at 0x4E5310D: __lll_lock_wait (lowlevellock.S:135)
==5373==    by 0x4E4C022: pthread_mutex_lock (pthread_mutex_lock.c:78)
==5373==    by 0x4C33FD6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373==    by 0x108A11: lock (dead_lock.c:12)
==5373==    by 0x108AF4: main (dead_lock.c:38)
==5373== ---Thread-Announcement------------------------------------------
==5373== 
==5373== Thread #2 was created
==5373==    at 0x518287E: clone (clone.S:71)
==5373==    by 0x4E49EC4: create_thread (createthread.c:100)
==5373==    by 0x4E49EC4: pthread_create@@GLIBC_2.2.5 (pthread_create.c:797)
==5373==    by 0x4C36A27: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373==    by 0x108AEA: main (dead_lock.c:36)
==5373== 
==5373== ----------------------------------------------------------------
==5373== 
==5373== Thread #2: Exiting thread still holds 1 lock
==5373==    at 0x4E5310D: __lll_lock_wait (lowlevellock.S:135)
==5373==    by 0x4E4C022: pthread_mutex_lock (pthread_mutex_lock.c:78)
==5373==    by 0x4C33FD6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373==    by 0x108A5C: thread_routine (dead_lock.c:23)
==5373==    by 0x4C36C26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373==    by 0x4E496DA: start_thread (pthread_create.c:463)
==5373==    by 0x518288E: clone (clone.S:95)
==5373== 
==5373== ---Thread-Announcement------------------------------------------
==5373== 
==5373== Thread #1 is the program's root thread
==5373== 
==5373== ----------------------------------------------------------------
==5373== 
==5373== Thread #1: Exiting thread still holds 1 lock
==5373==    at 0x4E5310D: __lll_lock_wait (lowlevellock.S:135)
==5373==    by 0x4E4C022: pthread_mutex_lock (pthread_mutex_lock.c:78)
==5373==    by 0x4C33FD6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==5373==    by 0x108A11: lock (dead_lock.c:12)
==5373==    by 0x108AF4: main (dead_lock.c:38)

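In the output above, two frames show where each thread was blocked when the program was interrupted:

- The thread_routine (dead_lock.c:23) frame in the Thread #2 report shows the child thread's line.
- The lock (dead_lock.c:12) frame in the Thread #1 report shows the main thread's line.

This makes the problem much easier to pin down.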
