Problem getting forked process to read from STDIN using pipes

Posted 2023-02-22

Asked: 2019-09-23 17:31:45

Question:

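I am trying to create a helper class that executes system commands and returns their output, with support for piping. When I only need the output (the command does not consume STDIN) it works as expected. With piping, the STDIN of the second command gets garbled, and I cannot find the root cause.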

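The main function that handles this mechanism is shown below (please ignore the minor error-checking issues).

Minimal working example: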

#include <vector>
#include <string>
#include <iostream>
#include <sstream>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <signal.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdarg.h>

struct exec_cmd_t {
    exec_cmd_t(std::vector<std::string> args) : args(args), has_executed(false), cpid(-1) {}
    exec_cmd_t(const exec_cmd_t &) = delete;
    exec_cmd_t(exec_cmd_t &&) = delete;
    exec_cmd_t & operator=(const exec_cmd_t &) = delete;
    exec_cmd_t & operator=(exec_cmd_t &&) = delete;

    std::string operator()();

    std::string pipe_cmd(const std::string & input);
    std::string pipe_cmd();
    ~exec_cmd_t();

    private:
    std::vector<std::string> args;
    bool has_executed;
    int cpid;
    std::stringstream in_stream;
    std::stringstream out_stream;

    friend std::string operator | (exec_cmd_t & first, exec_cmd_t & second);
    friend std::string operator | (exec_cmd_t && first, exec_cmd_t && second);
    friend std::string operator | (std::string, exec_cmd_t & second);
    friend std::string operator | (std::string, exec_cmd_t && second);
};

std::string exec_cmd_t::pipe_cmd(const std::string & input) {
    this->has_executed = true;
    const int read_end = 0;
    const int write_end = 1;

    int read_pipe[2];   // child's stdout -> parent
    int write_pipe[2];  // parent -> child's stdin

    if (pipe(read_pipe) < 0 || pipe(write_pipe) < 0) {
        this->has_executed = false;
        return std::string{};
    }

    // Feed the input, line by line, into the pipe that becomes the child's stdin.
    this->in_stream << input;
    std::string line;
    while(getline(this->in_stream, line)) {
        if (line.size() == 0) {
            continue;
        }
        int wr_sz = write(write_pipe[write_end], line.c_str(), line.size());
        if (wr_sz <= 0) {
            break;
        }
        write(write_pipe[write_end], "\n", 1);
    }
    close(write_pipe[write_end]);

    this->cpid = fork();
    if (this->cpid == 0) {
        // Child: wire the pipes to stdin/stdout, then exec the command.
        dup2(write_pipe[read_end], STDIN_FILENO);
        dup2(read_pipe[write_end], STDOUT_FILENO);
        close(read_pipe[read_end]);
        close(write_pipe[write_end]);
        close(read_pipe[write_end]);
        close(write_pipe[read_end]);
        prctl(PR_SET_PDEATHSIG, SIGTERM);
        char * params[args.size()];
        const char * image_path = args[0].c_str();
        for(int i = 1; i < args.size(); i++) {
            params[i-1] = const_cast<char *>(args[i].c_str());
        }
        params[args.size()] = nullptr;
        execv(image_path, params);
        exit(1);
    }

    close(read_pipe[write_end]);
    close(write_pipe[read_end]);

    // Parent: wait for the child, then collect whatever it wrote.
    char buff[256];
    int rd_sz = -1;
    int flags = fcntl(read_pipe[0], F_GETFL, 0);
    fcntl(read_pipe[read_end], F_SETFL, flags | O_NONBLOCK);
    int status = 0;
    waitpid(this->cpid, &status, 0);
    this->has_executed = false;
    int error_code = 0;
    while((rd_sz = read(read_pipe[read_end], buff, sizeof(buff))) > 0) {
        buff[rd_sz] = '\0';
        this->out_stream << std::string{buff};
    }
    close(read_pipe[read_end]);
    return this->out_stream.str();
}


std::string exec_cmd_t::pipe_cmd() {
    static std::string empty_str;
    return pipe_cmd(empty_str);
}

std::string exec_cmd_t::operator()() {
    return pipe_cmd();
}

exec_cmd_t::~exec_cmd_t() {
    if (this->has_executed) {
        int status;
        waitpid(this->cpid, &status, WNOHANG);
        if (!WIFEXITED(status)) {
            kill(this->cpid, SIGKILL);
            waitpid(this->cpid, &status, 0);
        }
    }
}

std::string operator | (exec_cmd_t & first, exec_cmd_t & second) {
    return second.pipe_cmd(first());
}

std::string operator | (exec_cmd_t && first, exec_cmd_t && second) {
    return second.pipe_cmd(first());
}

std::string operator | (std::string output, exec_cmd_t & second) {
    return second.pipe_cmd(output);
}

std::string operator | (std::string output, exec_cmd_t && second) {
    return second.pipe_cmd(output);
}

int main() {
    auto str = exec_cmd_t{{"/bin/echo", "echo", "hello\nworld\nor\nnot"}} | exec_cmd_t{{"/bin/grep", "grep", "world", "-"}};
    std::cout << str << std::endl;
    return 0;
}

This gives me:

grep: =V: No such file or directory                                                                                                                                                   
(standard input):world

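It looks as if grep is executed twice: once failing with "No such file or directory" and once succeeding. Any suggestions would be very helpful :-). Thanks in advance.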

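Comments on the question:

Your code appears to be C++, so you should remove the C tag.

Your code is incomplete. Please show enough code for us to compile it and reproduce the problem.

Yes, and I don't think you should filter anything out while piping. It may be confusing later, when you use this in real code and expect blank lines to come through.

@TedLyngmo Agreed. Thanks for the suggestion :-)

@TedLyngmo Good point, thanks. I'll update with the results after testing.

OK, I was going to try your code myself, but it seems to be missing some parts. Could you put together a minimal reproducible example? It looks interesting.

Answer 1: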

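You have one source of undefined behavior that is probably what makes your program behave the way it does. You declare a VLA and then write past its end, like this: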

char* params[args.size()];
...
params[args.size()] = nullptr;
execv(image_path, params);

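The array has args.size() elements, so the nullptr is written one element past the end, and the last valid slot, params[args.size() - 1], which should hold the terminating char*, is left uninitialized and can point anywhere. grep takes it for a filename, tries to open it, and fails.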

Since VLAs are not part of the C++ standard, consider changing it to:

std::vector<char*> params(args.size());
...
params[args.size() - 1] = nullptr;
execv(image_path, params.data());
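One way to make this kind of off-by-one harder to write is to build the array with push_back and append the terminator last, so no index arithmetic is involved. The following is only a sketch: make_argv is a hypothetical helper name, and it assumes the question's convention that args[0] is the executable path and args[1] onward are the argv entries.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original post): builds the argv
// array for execv() from a vector laid out as in the question, where
// args[0] is the image path and args[1..] are argv[0], argv[1], ...
// The returned pointers refer into `args`, so `args` must outlive them.
std::vector<char*> make_argv(std::vector<std::string>& args) {
    assert(!args.empty());
    std::vector<char*> argv;
    argv.reserve(args.size());
    for (std::size_t i = 1; i < args.size(); ++i) {
        argv.push_back(&args[i][0]);  // writable and NUL-terminated since C++11
    }
    argv.push_back(nullptr);          // the terminator execv() requires
    return argv;
}
```

In the child process this would be used as `auto argv = make_argv(args); execv(args[0].c_str(), argv.data());`.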

Another cause for concern is that you use int where you should use ssize_t, even though it is very unlikely that you will read or write more than an int can hold.
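As an illustration of where ssize_t fits, here is a small sketch of a write loop that stores the result of write() in an ssize_t and retries on short writes. The name write_all and the retry policy are assumptions made for this example, not something taken from the question.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <unistd.h>

// Hypothetical helper: writes the whole buffer to fd, retrying on short
// writes and on EINTR. Returns true only if every byte was written.
bool write_all(int fd, const std::string& data) {
    assert(fd >= 0);
    std::size_t done = 0;
    while (done < data.size()) {
        // write() returns ssize_t: a byte count, or -1 on error.
        ssize_t n = write(fd, data.data() + done, data.size() - done);
        if (n < 0 && errno == EINTR) {
            continue;             // interrupted by a signal, try again
        }
        if (n <= 0) {
            return false;         // real error (or no progress)
        }
        done += static_cast<std::size_t>(n);
    }
    return true;
}
```

Keeping the signed result in an ssize_t and converting it to std::size_t only after the error check avoids mixing signed and unsigned values in the comparison.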

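After I made these changes it started working and printed the expected world. I even added a third command to check that it could handle that too. Suggested changes: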

14,15c14,15
<     exec_cmd_t(std::vector<std::string> args) :
<         args(args), has_executed(false), cpid(-1) {}
---
>     exec_cmd_t(std::vector<std::string> Args) :
>         args(Args), has_executed(false), cpid(-1), in_stream{}, out_stream{} {}
59c59
<         int wr_sz = write(write_pipe[write_end], line.c_str(), line.size());
---
>         ssize_t wr_sz = write(write_pipe[write_end], line.c_str(), line.size());
76c76
<         char* params[args.size()];
---
>         std::vector<char*> params(args.size());
78c78
<         for(int i = 1; i < args.size(); i++) {
---
>         for(decltype(args.size()) i = 1; i < args.size(); i++) {
81,82c81,82
<         params[args.size()] = nullptr;
<         execv(image_path, params);
---
>         params[args.size() - 1] = nullptr;
>         execv(image_path, params.data());
90c90
<     int rd_sz = -1;
---
>     ssize_t rd_sz = -1;
96c96
<     int error_code = 0;
---
>     // int error_code = 0; // unused
106,107c106
<     static std::string empty_str;
<     return pipe_cmd(empty_str);
---
>     return pipe_cmd({});
143c142,143
<                exec_cmd_t{{"/bin/grep", "grep", "world", "-"}};
---
>                exec_cmd_t{{"/bin/grep", "grep", "-A1", "hello"}} |
>                exec_cmd_t{{"/bin/grep", "grep", "world"}};

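I also realized that your program acts as a proxy between the piped commands: it reads everything from one command and then writes it to the next.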

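You could instead start all the programs at the same time and set up the pipes between them in one go. For three commands, you would need three pipes: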

                     cmd1  cmd2  cmd3
                     |  w--r  w--r  |
                 stdin              read output into program
or fed by your program

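This reduces the performance and memory-consumption problems if you decide to run commands that produce a lot of output. Internally, you would only need to store what you actually want to keep, by reading the output of the last command. I made a small test of this approach and it works very well.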
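The idea can be sketched roughly as follows. This is not the answerer's test program, only a minimal illustration under stated assumptions: run_pipeline is a hypothetical name, each command is a non-empty vector laid out as in the question (image path first, then the argv entries), the first command simply inherits the caller's stdin, and error handling is kept to a minimum.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical sketch: starts every command at once, connects neighbours
// with pipes, and returns whatever the last command writes to stdout.
std::string run_pipeline(std::vector<std::vector<std::string>> cmds) {
    assert(!cmds.empty());
    std::vector<pid_t> pids;
    int prev_read = -1;  // read end of the pipe filled by the previous command
    for (std::size_t i = 0; i < cmds.size(); ++i) {
        int fds[2];
        if (pipe(fds) < 0) break;
        pid_t pid = fork();
        if (pid == 0) {
            // Child: stdin comes from the previous command (if there is one),
            // stdout goes into the pipe that was just created.
            if (prev_read != -1) {
                dup2(prev_read, STDIN_FILENO);
                close(prev_read);
            }
            dup2(fds[1], STDOUT_FILENO);
            close(fds[0]);
            close(fds[1]);
            std::vector<char*> argv;
            for (std::size_t j = 1; j < cmds[i].size(); ++j) {
                argv.push_back(&cmds[i][j][0]);
            }
            argv.push_back(nullptr);
            execv(cmds[i][0].c_str(), argv.data());
            _exit(127);  // only reached if execv() failed
        }
        // Parent: keep only the read end, for the next command or for us.
        close(fds[1]);
        if (prev_read != -1) close(prev_read);
        prev_read = fds[0];
        if (pid < 0) break;
        pids.push_back(pid);
    }
    // Read the last command's output until EOF, then reap the children.
    std::string out;
    char buf[4096];
    ssize_t n;
    while (prev_read != -1 && (n = read(prev_read, buf, sizeof(buf))) > 0) {
        out.append(buf, static_cast<std::size_t>(n));
    }
    if (prev_read != -1) close(prev_read);
    for (pid_t pid : pids) waitpid(pid, nullptr, 0);
    return out;
}
```

With the question's commands, a call would pass two entries: the echo command vector followed by the grep command vector. Because the parent closes every write end it does not need and reads before it calls waitpid, each command sees EOF as soon as its predecessor exits, and only the final output is held in memory.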

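Comments:

Thanks, it was UB (I would never have thought of that). I was looking for the problem in the wrong place. The VLA does work once the index is fixed, but I think your suggestion is much better than a VLA. The performance improvement suggestion is great too. Thanks again, and have a nice week!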
