Strange behavior of pthread_cond_destroy() hanging

[Posted]: 2017-06-24 12:54:56
[Question]:

I know pthread_cancel() is tricky. I am asking this question to understand a bug in software that uses pthread_cancel().

I have reduced the problem to the following code:

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t notify_mutex;
static pthread_cond_t notify;

static void *_watcher_thread(void *arg)
{
    (void) pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    (void) pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);

    printf("watcher:   thread started\n");

    while (1) {
            if (pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL) != 0) {
                    perror("failed to disable watcher thread cancel: ");
            }
            pthread_mutex_lock(&notify_mutex);
            pthread_cond_wait(&notify, &notify_mutex);
            pthread_mutex_unlock(&notify_mutex);
            (void) pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    }
    return NULL;
}

static void *_timer_thread(void *args)
{
    (void) pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    (void) pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);

    printf("timer:   thread started\n");

    while (1) {
            if (pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL) != 0) {
                    perror("failed to disable timer thread cancel: ");
            }
            pthread_mutex_lock(&notify_mutex); /* XXX: not a cancellation point */
            pthread_cond_signal(&notify);
            pthread_mutex_unlock(&notify_mutex);
            (void) pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
    }
    return NULL;
}

int main(void)
{
    pthread_t watcher_tid, timer_tid;
    pthread_attr_t attr;
    long i = 0;

    while (1) {
            pthread_cond_init(&notify, NULL);
            pthread_mutex_init(&notify_mutex, NULL);
            pthread_attr_init(&attr);

            if (pthread_create(&watcher_tid, &attr,
                               &_watcher_thread, NULL)) {
                    perror("failed to create watcher thread: ");
            }
            if (pthread_create(&timer_tid, &attr,
                               &_timer_thread, NULL)) {
                    perror("failed to create timer thread: ");
            }

            sleep(1);

            printf("main:   to cancel watcher thread\n");
            pthread_cancel(watcher_tid);
            pthread_join(watcher_tid, NULL);
            printf("main:   watcher thread canceled\n");

            printf("main:   to cancel timer thread\n");
            pthread_cancel(timer_tid);
            pthread_join(timer_tid, NULL);
            printf("main:   timer thread canceled\n");

            pthread_cond_destroy(&notify);
            pthread_mutex_destroy(&notify_mutex);
            pthread_attr_destroy(&attr);
            i ++;
            printf("iteration: %ld\n", i);
    }

    return 0;
}
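Basically there are three threads: watcher, timer, and main. The timer thread periodically wakes the watcher thread to do some work. At the end, the main thread terminates the other threads and exits. I wrapped the test program above in a loop to reproduce the problem.

Compiling and running the program under Linux (Debian testing, 4.9.0-3-amd64 #1 SMP, glibc-2.24), it hangs after a few iterations: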

...
main:   to cancel timer thread
main:   timer thread canceled
iteration: 4
timer:   thread started
watcher:   thread started
main:   to cancel watcher thread
main:   watcher thread canceled
main:   to cancel timer thread
main:   timer thread canceled
iteration: 5
timer:   thread started
watcher:   thread started
main:   to cancel watcher thread
main:   watcher thread canceled
main:   to cancel timer thread
main:   timer thread canceled

gdb shows the stack trace of the hung program:

(gdb) attach 29247
Attaching to process 29247
Reading symbols from /home/hjcao/temp/test/pthread/hang1...done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols found)...done.
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1...(no debugging symbols found)...done.
0x00007f796070bf2b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f796070bf2b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f7960708eb5 in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x0000561b1f194f01 in main () at hang1.c:78
(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7f7960b12700 (LWP 29247) "hang1" 0x00007f796070bf2b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) 

========================================================

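My question is: I do not understand why the main thread hangs in pthread_cond_destroy().

In fact, the original program (named hang0) did not have the pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL)/pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL) calls in the while loops of the watcher/timer threads. That it hangs in the main thread is understandable: asynchronously canceling the watcher/timer threads may cancel a thread in the middle of pthread_cond_wait()/pthread_cond_signal() and leave the internals of the condition variable notify in an inconsistent state. I added the pthread_setcancelstate() calls to prevent the watcher/timer threads from being canceled while they operate on the condition variable. But the new program (named hang1) still hangs.

Can anyone help me explain this?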

[Comments]:

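How exactly did you compile it?
Asynchronous cancellation is mostly a bad idea. Apply full error checking to all POSIX threads API calls and you will learn more.
If you cancel a thread, deferred cancellation is the preferred way.
To make your example work with deferred cancellation, the threads' while loops need to introduce a cancellation point. A simple sleep(0) would do.

[Answer 1]: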

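I think this thread will help: pthread conditions and process termination (Gusev Petr's answer helped me solve my problem).

I ran into the same problem with a condition variable in the pthread_cond_destroy() function.

This is mainly because a condition variable has no logic to determine whether a thread that was waiting on it is still running or has died (usually due to pthread_cancel()). So one possible solution is to forcibly change the value in the variable to 0, as described in the link above.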
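
For reference, the comments on the question suggest a different route: deferred cancellation with an explicit cancellation point, plus full error checking. Below is a minimal sketch of that variant, not a confirmed fix for the hang reported in the question. The names check(), unlock_mutex() and run_iteration() are hypothetical helpers introduced only for this illustration, and pthread_testcancel() is used in place of the sleep(0) mentioned in the comment. Like the program in the question, it waits without a predicate, so spurious wakeups are not handled.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static pthread_mutex_t notify_mutex;
static pthread_cond_t notify;

/* Hypothetical helper: pthread calls return an error number and do not
 * set errno, so the code is reported with strerror(), not perror(). */
static int check(int err, const char *what)
{
    if (err != 0)
        fprintf(stderr, "%s: %s\n", what, strerror(err));
    return err;
}

/* Cleanup handler: releases the mutex if the watcher is canceled. */
static void unlock_mutex(void *mutex)
{
    pthread_mutex_unlock(mutex);
}

static void *watcher_thread(void *arg)
{
    int err;
    (void) arg;
    /* New threads start with cancellation enabled and deferred. */
    for (;;) {
        err = pthread_mutex_lock(&notify_mutex);
        if (err != 0)
            return (void *) (intptr_t) err;
        pthread_cleanup_push(unlock_mutex, &notify_mutex);
        /* Cancellation point; the mutex is held again when the handler runs. */
        err = pthread_cond_wait(&notify, &notify_mutex);
        pthread_cleanup_pop(1); /* pop the handler and run it: unlock */
        if (err != 0)
            return (void *) (intptr_t) err;
    }
}

static void *timer_thread(void *arg)
{
    int err;
    (void) arg;
    for (;;) {
        err = pthread_mutex_lock(&notify_mutex);
        if (err != 0)
            break;
        err = pthread_cond_signal(&notify);
        pthread_mutex_unlock(&notify_mutex);
        if (err != 0)
            break;
        pthread_testcancel(); /* explicit cancellation point, mutex not held */
    }
    return (void *) (intptr_t) err;
}

/* One create/cancel/join/destroy cycle; returns 0 on success, -1 on error.
 * Sketch only: resources are not released on the error paths. */
int run_iteration(void)
{
    pthread_t watcher_tid, timer_tid;
    void *res;

    if (check(pthread_mutex_init(&notify_mutex, NULL), "pthread_mutex_init") ||
        check(pthread_cond_init(&notify, NULL), "pthread_cond_init") ||
        check(pthread_create(&watcher_tid, NULL, watcher_thread, NULL), "create watcher") ||
        check(pthread_create(&timer_tid, NULL, timer_thread, NULL), "create timer"))
        return -1;

    sleep(1);

    if (check(pthread_cancel(watcher_tid), "cancel watcher") ||
        check(pthread_join(watcher_tid, &res), "join watcher") ||
        res != PTHREAD_CANCELED)
        return -1;
    if (check(pthread_cancel(timer_tid), "cancel timer") ||
        check(pthread_join(timer_tid, &res), "join timer") ||
        res != PTHREAD_CANCELED)
        return -1;

    if (check(pthread_cond_destroy(&notify), "pthread_cond_destroy") ||
        check(pthread_mutex_destroy(&notify_mutex), "pthread_mutex_destroy"))
        return -1;
    return 0;
}
```

To mirror the loop in the question, call run_iteration() repeatedly from main() and stop when it returns non-zero. The idea is that each thread can only be canceled at a point where it either holds no lock (pthread_testcancel()) or has a cleanup handler registered to release the mutex (pthread_cond_wait()).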
