std::async causes deadlock?

Posted: 2019-12-31 15:36:40

Question:

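I am trying to use std::async in a heavily loaded application to improve performance, but from time to time I run into a deadlock. I debugged for a long time and I am almost sure my code is fine, so it looks like something is wrong in the std library.

So I wrote a simple test program to demonstrate it: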

#include <iostream>
#include <vector>
#include <algorithm>
#include <numeric>
#include <future>
#include <string>
#include <mutex>
#include <unistd.h>
#include <atomic>
#include <iomanip>

std::atomic_long numbers[6];

void add(std::atomic_long& n)
{
    ++n;
}

void func2(std::atomic_long& n)
{
    for (auto i = 0L; i < 1000000000000L; ++i)
    {
        std::async(std::launch::async, [&] { add(n); });   // Small task, I want to run them simultaneously
    }
}

int main()
{
    std::vector<std::future<void>> results;
    for (int i = 0; i < 6; ++i)
    {
        auto& n = numbers[i];
        results.push_back(std::async(std::launch::async, [&n] { func2(n); }));
    }

    while (true)
    {
        sleep(1);
        for (int i = 0; i < 6; ++i)
            std::cout << std::setw(20) << numbers[i] << " ";
        std::cout << std::endl;
    }

    for (auto& r : results)
    {
        r.wait();
    }
    return 0;
}
This program produces output like this:

              763700               779819               754005               763287               767713               748994 
              768822               785172               759678               769393               772956               754469 
              773529               789382               763524               772704               776398               757864 
              778560               794419               768580               777507               781542               762991 
              782056               795578               771704               780554               784865               766162 
              801633               812610               788111               802617               803661               784894 

After a while (minutes or hours), once the deadlock occurs, the output looks like this:

             4435337              4452421              4507907              4501378              2549550              4462899 
             4441213              4457648              4514424              4506626              2549550              4468019 
             4446301              4462675              4519272              4511889              2549550              4473266 
             4453940              4470304              4526382              4519513              2549550              4480872 
             4461095              4477708              4533272              4526901              2549550              4488313 
             4470974              4488287              4543442              4537286              2549550              4498733 

The fifth column is frozen.

A day later, it looks like this:

            23934912             23967635             24007250             23931203              2549550           3249788689 
            23934912             23967635             24007250             23931203              2549550           3249816818 
            23934912             23967635             24007250             23931203              2549550           3249835009 
            23934912             23967635             24007250             23931203              2549550           3249860262 
            23934912             23967635             24007250             23931203              2549550           3249894331 

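Every column except the last one is frozen. That looks very strange.

I ran it on Linux, macOS, and FreeBSD, with these results:

macOS: 10.15.2, Clang: 11.0.0, no deadlock
FreeBSD: 12.0, Clang: 6.0.1, deadlock
Linux: ubuntu 5.0.0-37, g++: 7.4.0, no deadlock
Linux: ubuntu 4.4.0-21, Clang: 3.8.0, deadlock

In gdb, the call stacks are: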

(gdb) thread apply all bt

Thread 10 (LWP 100467 of process 37763):
#0  0x000000080025c630 in ?? () from /lib/libthr.so.3
#1  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fff4ad57000

Thread 9 (LWP 100464 of process 37763):
#0  0x000000080046fafa in _umtx_op () from /lib/libc.so.7
#1  0x0000000800264912 in ?? () from /lib/libthr.so.3
#2  0x000000080031f9f9 in std::__1::mutex::unlock() () from /usr/lib/libc++.so.1
#3  0x00000008002e8f55 in std::__1::__assoc_sub_state::set_value() () from /usr/lib/libc++.so.1
#4  0x00000000002053e1 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__execute() ()
#5  0x0000000000205763 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >*> >(void*) ()
#6  0x000000080025c776 in ?? () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fff6944a000

Thread 8 (LWP 100431 of process 37763):
#0  0x000000080046fafa in _umtx_op () from /lib/libc.so.7
#1  0x0000000800264912 in ?? () from /lib/libthr.so.3
#2  0x000000080031f9f9 in std::__1::mutex::unlock() () from /usr/lib/libc++.so.1
#3  0x00000008002e8f55 in std::__1::__assoc_sub_state::set_value() () from /usr/lib/libc++.so.1
#4  0x00000000002053e1 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__execute() ()
#5  0x0000000000205763 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >*> >(void*) ()
#6  0x000000080025c776 in ?? () from /lib/libthr.so.3
#7  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffc371a000

Thread 7 (LWP 100657 of process 37763):
#0  0x000000080026a66c in ?? () from /lib/libthr.so.3
#1  0x000000080025e731 in ?? () from /lib/libthr.so.3
#2  0x0000000800268388 in ?? () from /lib/libthr.so.3
#3  0x000000080032de72 in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) () from /usr/lib/libc++.so.1
#4  0x00000008002e971b in std::__1::__assoc_sub_state::wait() () from /usr/lib/libc++.so.1
#5  0x0000000000205389 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__on_zero_shared() ()
#6  0x000000000020346b in func2(std::__1::atomic<long>&) ()
#7  0x0000000000206f18 in main::$_1::operator()() const ()
#8  0x0000000000206eed in void std::__1::__async_func<main::$_1>::__execute<>(std::__1::__tuple_indices<>) ()
#9  0x0000000000206ea5 in std::__1::__async_func<main::$_1>::operator()() ()
#10 0x0000000000206df3 in std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::__execute() ()
#11 0x0000000000207183 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >*> >(void*) ()
#12 0x000000080025c776 in ?? () from /lib/libthr.so.3
#13 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdf5f9000

Thread 6 (LWP 100656 of process 37763):
#0  0x000000080026a66c in ?? () from /lib/libthr.so.3
#1  0x000000080025e731 in ?? () from /lib/libthr.so.3
#2  0x0000000800268388 in ?? () from /lib/libthr.so.3
#3  0x000000080032de72 in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) () from /usr/lib/libc++.so.1
#4  0x00000008002e971b in std::__1::__assoc_sub_state::wait() () from /usr/lib/libc++.so.1
#5  0x0000000000205389 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__on_zero_shared() ()
#6  0x0000000000207a22 in std::__1::__release_shared_count::operator()(std::__1::__shared_count*) ()
#7  0x00000000002044f4 in std::__1::future<void> std::__1::__make_async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >(std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0>&&) ()
#8  0x00000000002035ea in std::__1::future<std::__1::__invoke_of<std::__1::decay<func2(std::__1::atomic<long>&)::$_0>::type>::type> std::__1::async<func2(std::__1::atomic<long>&)::$_0>(std::__1::launch, func2(std::__1::atomic<long>&)::$_0&&) ()
#9  0x0000000000203462 in func2(std::__1::atomic<long>&) ()
#10 0x0000000000206f18 in main::$_1::operator()() const ()
#11 0x0000000000206eed in void std::__1::__async_func<main::$_1>::__execute<>(std::__1::__tuple_indices<>) ()
#12 0x0000000000206ea5 in std::__1::__async_func<main::$_1>::operator()() ()
#13 0x0000000000206df3 in std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::__execute() ()
#14 0x0000000000207183 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >*> >(void*) ()
#15 0x000000080025c776 in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdf7fa000

Thread 5 (LWP 100655 of process 37763):
#0  0x000000080026a66c in ?? () from /lib/libthr.so.3
#1  0x000000080025e731 in ?? () from /lib/libthr.so.3
#2  0x0000000800268388 in ?? () from /lib/libthr.so.3
#3  0x000000080032de72 in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) () from /usr/lib/libc++.so.1
#4  0x00000008002e971b in std::__1::__assoc_sub_state::wait() () from /usr/lib/libc++.so.1
#5  0x0000000000205389 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__on_zero_shared() ()
#6  0x0000000000207a22 in std::__1::__release_shared_count::operator()(std::__1::__shared_count*) ()
#7  0x00000000002044f4 in std::__1::future<void> std::__1::__make_async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >(std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0>&&) ()
#8  0x00000000002035ea in std::__1::future<std::__1::__invoke_of<std::__1::decay<func2(std::__1::atomic<long>&)::$_0>::type>::type> std::__1::async<func2(std::__1::atomic<long>&)::$_0>(std::__1::launch, func2(std::__1::atomic<long>&)::$_0&&) ()
#9  0x0000000000203462 in func2(std::__1::atomic<long>&) ()
#10 0x0000000000206f18 in main::$_1::operator()() const ()
#11 0x0000000000206eed in void std::__1::__async_func<main::$_1>::__execute<>(std::__1::__tuple_indices<>) ()
#12 0x0000000000206ea5 in std::__1::__async_func<main::$_1>::operator()() ()
#13 0x0000000000206df3 in std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::__execute() ()
#14 0x0000000000207183 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >*> >(void*) ()
#15 0x000000080025c776 in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdf9fb000

Thread 4 (LWP 100654 of process 37763):
#0  0x000000080026a66c in ?? () from /lib/libthr.so.3
#1  0x000000080025e731 in ?? () from /lib/libthr.so.3
#2  0x0000000800268388 in ?? () from /lib/libthr.so.3
#3  0x000000080032de72 in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) () from /usr/lib/libc++.so.1
#4  0x00000008002e971b in std::__1::__assoc_sub_state::wait() () from /usr/lib/libc++.so.1
#5  0x0000000000205389 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__on_zero_shared() ()
#6  0x0000000000207a22 in std::__1::__release_shared_count::operator()(std::__1::__shared_count*) ()
#7  0x00000000002044f4 in std::__1::future<void> std::__1::__make_async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >(std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0>&&) ()
#8  0x00000000002035ea in std::__1::future<std::__1::__invoke_of<std::__1::decay<func2(std::__1::atomic<long>&)::$_0>::type>::type> std::__1::async<func2(std::__1::atomic<long>&)::$_0>(std::__1::launch, func2(std::__1::atomic<long>&)::$_0&&) ()
#9  0x0000000000203462 in func2(std::__1::atomic<long>&) ()
#10 0x0000000000206f18 in main::$_1::operator()() const ()
#11 0x0000000000206eed in void std::__1::__async_func<main::$_1>::__execute<>(std::__1::__tuple_indices<>) ()
#12 0x0000000000206ea5 in std::__1::__async_func<main::$_1>::operator()() ()
#13 0x0000000000206df3 in std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::__execute() ()
#14 0x0000000000207183 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >*> >(void*) ()
#15 0x000000080025c776 in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfbfc000

Thread 3 (LWP 100653 of process 37763):
#0  0x000000080026a66c in ?? () from /lib/libthr.so.3
#1  0x000000080025e731 in ?? () from /lib/libthr.so.3
#2  0x0000000800268388 in ?? () from /lib/libthr.so.3
#3  0x000000080032de72 in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) () from /usr/lib/libc++.so.1
#4  0x00000008002e971b in std::__1::__assoc_sub_state::wait() () from /usr/lib/libc++.so.1
#5  0x0000000000205389 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__on_zero_shared() ()
#6  0x0000000000207a22 in std::__1::__release_shared_count::operator()(std::__1::__shared_count*) ()
#7  0x00000000002044f4 in std::__1::future<void> std::__1::__make_async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >(std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0>&&) ()
#8  0x00000000002035ea in std::__1::future<std::__1::__invoke_of<std::__1::decay<func2(std::__1::atomic<long>&)::$_0>::type>::type> std::__1::async<func2(std::__1::atomic<long>&)::$_0>(std::__1::launch, func2(std::__1::atomic<long>&)::$_0&&) ()
#9  0x0000000000203462 in func2(std::__1::atomic<long>&) ()
#10 0x0000000000206f18 in main::$_1::operator()() const ()
#11 0x0000000000206eed in void std::__1::__async_func<main::$_1>::__execute<>(std::__1::__tuple_indices<>) ()
#12 0x0000000000206ea5 in std::__1::__async_func<main::$_1>::operator()() ()
#13 0x0000000000206df3 in std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::__execute() ()
#14 0x0000000000207183 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >*> >(void*) ()
#15 0x000000080025c776 in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfdfd000

Thread 2 (LWP 100652 of process 37763):
#0  0x000000080026a66c in ?? () from /lib/libthr.so.3
#1  0x000000080025e731 in ?? () from /lib/libthr.so.3
#2  0x0000000800268388 in ?? () from /lib/libthr.so.3
#3  0x000000080032de72 in std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) () from /usr/lib/libc++.so.1
#4  0x00000008002e971b in std::__1::__assoc_sub_state::wait() () from /usr/lib/libc++.so.1
#5  0x0000000000205389 in std::__1::__async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >::__on_zero_shared() ()
#6  0x0000000000207a22 in std::__1::__release_shared_count::operator()(std::__1::__shared_count*) ()
#7  0x00000000002044f4 in std::__1::future<void> std::__1::__make_async_assoc_state<void, std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0> >(std::__1::__async_func<func2(std::__1::atomic<long>&)::$_0>&&) ()
#8  0x00000000002035ea in std::__1::future<std::__1::__invoke_of<std::__1::decay<func2(std::__1::atomic<long>&)::$_0>::type>::type> std::__1::async<func2(std::__1::atomic<long>&)::$_0>(std::__1::launch, func2(std::__1::atomic<long>&)::$_0&&) ()
#9  0x0000000000203462 in func2(std::__1::atomic<long>&) ()
#10 0x0000000000206f18 in main::$_1::operator()() const ()
#11 0x0000000000206eed in void std::__1::__async_func<main::$_1>::__execute<>(std::__1::__tuple_indices<>) ()
#12 0x0000000000206ea5 in std::__1::__async_func<main::$_1>::operator()() ()
#13 0x0000000000206df3 in std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::__execute() ()
#14 0x0000000000207183 in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >::*)(), std::__1::__async_assoc_state<void, std::__1::__async_func<main::$_1> >*> >(void*) ()
#15 0x000000080025c776 in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfffe000

Thread 1 (LWP 100148 of process 37763):
#0  0x00000008004f984a in _nanosleep () from /lib/libc.so.7
#1  0x000000080025f17c in ?? () from /lib/libthr.so.3
#2  0x000000080045fe0b in sleep () from /lib/libc.so.7
#3  0x0000000000203b7b in main ()

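It looks like many threads are stuck in std::__1::condition_variable::wait, which makes no sense to me: the test code does not use any condition variables at all.

Can anyone tell me whether I am doing something wrong, or whether there is a bug in the std library?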


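Thanks. This example does not exactly mimic the real behavior of my program; I simplified it too much.

Now I store the futures in a vector, which is closer to the real code: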

void func2(std::atomic_long& n)
{
    std::vector<std::future<void>> rs;
    for (auto i = 0L; i < 1000000000000L; ++i)
    {
        rs.push_back(std::async(std::launch::async, [&] { add(n); }));
    }
    for (auto& r : rs)
    {
        r.wait();
    }
}

But it still gives the same result. On macOS, there is no problem:

            29693311             29904143             29994992             29856976             30020535             29832796 
            29709344             29917687             30005488             29875611             30039727             29848932 
            29725334             29930826             30019428             29892350             30056678             29866293 
            29737403             29948258             30036760             29904964             30074102             29883648 
            29746597             29965134             30050115             29914459             30086189             29900767 
            29761543             29977363             30066833             29929475             30101723             29915059 
            29777678             29993381             30084101             29949095             30117847             29926040 
            29794253             30007301             30102985             29972819             30129613             29939935 

On FreeBSD, it froze again:

               34079                29595                38239               508788                30194                41242 
               34079                29595                38239               509103                30194                41242 
               34079                29595                38239               509583                30194                41242 
               34079                29595                38239               509808                30194                41242 
               34079                29595                38239               510187                30194                41242 
               34079                29595                38239               510543                30194                41242 
               34079                29595                38239               510932                30194                41242 
               34079                29595                38239               511616                30194                41242 
               34079                29595                38239               512111                30194                41242 
               34079                29595                38239               512952                30194                41242 
               34079                29595                38239               514032                30194                41242 
               34079                29595                38239               514205                30194                41242 
               34079                29595                38239               514577                30194                41242 

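Comments:

Note that std::async(std::launch::async, [&] { add(n); }); is not asynchronous, because the return value is discarded, as explained here.

This is almost certainly not a bug in the standard library implementation.

Yes, it sounds like you may just want to start some threads?

I don't understand why you expect your async workers to run forever. They call the function you provide, which eventually finishes. The deadlock you are seeing may simply be the workers having finished.

@FrançoisAndrieux These workers should not finish until the counter has been incremented all the way to 1000000000000L.

Answer 1: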

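You are not taking the return value of std::async into account: the returned future blocks execution (in its destructor) until the task you started with std::async has finished. As written, this program does not do what you expect.

In addition, you are calling std::async in a nested fashion, and there is no guarantee that each call spawns a new thread; the implementation may manage a pool, so if the pool is busy your program can evidently freeze, because the loop you are running is very long. If you want more control, you can use std::thread and std::packaged_task.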

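Comments:

Thanks. Yes, good point. The program does not work the way I expected; in fact, it is completely synchronous. But the point is that, if it is synchronous, it should not freeze.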
