Why is a multithreaded C program forced to a single CPU on Mac OS X when system() is used in a thread?

Posted 2023-02-17


【Posted】: 2015-07-01 10:36:53 【Question】:

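I have found an odd difference between Linux and Mac OS X in the behavior of programs that use pthreads.

Consider the following program, which can be compiled with "gcc -pthread -o threadtest threadtest.c":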

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static
void *worker(void *t)
{
    int i = *(int *)t;

    printf("Thread %d started\n", i);
    system("sleep 1");

    printf("Thread %d ends\n", i);
    return (void *) 0;
}

int main()
{
#define N_WORKERS   4

    pthread_t       workers[N_WORKERS];
    int             args[N_WORKERS];
    int             i;

    for (i = 0; i < N_WORKERS; ++i)
    {
        args[i] = i;
        pthread_create(&workers[i], NULL, worker, args + i);
    }

    for (i = 0; i < N_WORKERS; ++i)
    {
        pthread_join(workers[i], NULL);
    }

    return 0;
}
Running the resulting executable on a 4-core Mac OS X machine results in the following behavior:

$ time ./threadtest
Thread 0 started
Thread 2 started
Thread 1 started
Thread 3 started
Thread 0 ends
Thread 1 ends
Thread 2 ends
Thread 3 ends

real    0m4.030s
user    0m0.006s
sys 0m0.008s

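Note that the actual number of cores is probably not even relevant, since the time is simply spent in the "sleep 1" shell command without any computation. It is also apparent that the threads are started in parallel, because the "Thread ... started" messages appear immediately after the program starts.

Running the same test program on a Linux machine gives the result I expect: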

$ time ./threadtest
Thread 0 started
Thread 3 started
Thread 1 started
Thread 2 started
Thread 1 ends
Thread 2 ends
Thread 0 ends
Thread 3 ends

real    0m1.010s
user    0m0.008s
sys 0m0.013s

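The four processes are started in parallel, each sleeps for one second, and the whole run takes about one second.

If I put actual computation into the worker() function and remove the system() call, I see the expected speedup on Mac OS X as well.

So the question is: why does using the system() call in a thread effectively serialize the execution of the threads on Mac OS X, and how can that be prevented?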

【Comments】:

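Perhaps the standard C library on Mac OS X is free software, so you could look at its implementation of system() (I would guess it uses some global mutex, but I don't understand why). Otherwise, pick some free-software implementation of system().

@BasileStarynkevitch Not sure if this is the right source file, but there's a mutex in there.

In general, it is a very bad idea to call C library functions from inside a thread. From the C11 standard (which supports threads):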

The functions in the standard library are not guaranteed to be reentrant and may modify objects with static or thread storage duration.

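@null OK, I see the mutex in the implementation in system.c. When I run the program under dtruss, I can also see the mutex being used.

@BasileStarynkevitch I will look into reimplementing what happens inside system(). It seems necessary to use the fork()/exec()/wait() system calls directly.

【Answer 1】: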

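As @BasileStarynkevitch and @null pointed out, a global mutex in the system() implementation in Mac OS X's C library is probably responsible for the observed behavior. @null provided a reference to a potential source file of the system() implementation, which contains these operations: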

#if __DARWIN_UNIX03
    pthread_mutex_lock(&__systemfn_mutex);
#endif /* __DARWIN_UNIX03 */

#if __DARWIN_UNIX03
    pthread_mutex_unlock(&__systemfn_mutex);
#endif /* __DARWIN_UNIX03 */

By disassembling the system() function in lldb, I verified that these calls are actually present in the compiled code.

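The solution is to replace the use of the system() C library function with a combination of the fork()/execve()/waitpid() system calls. A quick proof of concept that modifies the worker() function from the original example: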

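#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static
void *worker(void *t)
{
    static const char shell[] = "/bin/sh";
    static const char * const args[] = { shell, "-c", "sleep 1", NULL };
    static const char * const env[] = { NULL };

    pid_t pid;
    int i = *(int *)t;

    printf("Thread %d started\n", i);

    pid = fork();
    if (pid == 0)
    {
        /* Child: replace the process image with the shell. */
        execve(shell, (char **) args, (char **) env);
        _exit(127);     /* reached only if execve() fails */
    }
    if (pid > 0)
    {
        /* Parent: wait for this thread's own child only. */
        waitpid(pid, NULL, 0);
    }

    printf("Thread %d ends\n", i);
    return (void *) 0;
}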

With this modification, the test program now runs in approximately one second on Mac OS X.
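As a possible variation on the proof of concept, the same idea can be wrapped in a reusable helper. The sketch below swaps fork()/execve() for posix_spawn(), which is a different technique from the one used in the answer. run_shell is a hypothetical name, and unlike the answer's version it passes the parent's environment to the child. Unlike system(), it does not ignore SIGINT and SIGQUIT or block SIGCHLD while waiting.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;  /* the parent's environment, handed to the child */

/* Hypothetical helper: runs `command` via "/bin/sh -c" and returns its
 * exit status, or -1 if the child could not be started or did not
 * exit normally. */
int run_shell(const char *command)
{
    assert(command != NULL);

    char *const argv[] = { "sh", "-c", (char *) command, NULL };
    pid_t pid;
    int status;

    /* posix_spawn() returns 0 on success and an error number otherwise. */
    if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ) != 0)
        return -1;

    /* Wait for this specific child, retrying if a signal interrupts us. */
    while (waitpid(pid, &status, 0) < 0)
    {
        if (errno != EINTR)
            return -1;
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the test program, worker() could then call run_shell("sleep 1") in place of system("sleep 1"). Whether this avoids the serialization on a given Mac OS X release is an assumption that should be checked by timing the program as in the question.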
