为啥子进程中的管道有时会中断,有时不会?

Posted

技术标签:

【中文标题】为啥子进程中的管道有时会中断,有时不会?【英文标题】:Why do Pipes from Child Processes break sometimes and sometimes not?为什么子进程中的管道有时会中断,有时不会? 【发布时间】:2018-05-04 14:41:32 【问题描述】:

我正在使用 Perl 开发一个带有主脚本的系统,该主脚本需要执行几个不同的工作脚本,这些脚本同时在系统上执行不同的任务。 这些脚本也不需要用 Perl 编写。它们可以是任何可在命令行上执行并通过STDOUTSTDERREXITCODE 返回结果的程序。

所以我做了一个小示例脚本hello.pl:

#!/usr/bin/perl

print STDERR "HELLO ERROR 0 !\n";

print "hello script!!\n";

print "hello waiting 3 sec ...\n";

sleep 3;

print "hello list:\n";
print "cmd: 'ls -lah'\n";

print `ls -lah`;

print "res: '$?'\n";    

print "hello waiting 2 sec ...\n";

sleep 2;    

print STDERR "HELLO ERROR 1 !\n";

主脚本需要在一侧捕获STDOUT 的结果,在另一侧捕获STDERR 中的错误通知。

主脚本执行fork 以非阻塞模式执行工作脚本,并使用waitpidselect 监控工作脚本进度。

所以master.pl 脚本的代码是这样的:

#!/usr/bin/perl

use strict;
use warnings;

use IO::Select;
use POSIX ":sys_wait_h";


sub Launch

  my $rprocess = shift;
  my $irs = 0;


  local *logreader;
  local *errorreader;
  my $logwriter   = undef;
  my $errorwriter = undef;

  pipe( *logreader,   $logwriter );
  pipe( *errorreader, $errorwriter );

  #Spawn the Child Process
  my $iprcpid = fork();

  #Check the Success of Process Forking
  if(defined $iprcpid)
             
    #------------------------
    #Sub Process Launch succeeded

    # Check whether parent/child process
    if($iprcpid > 0)
    
      #------------------------
      #Parent Process

      close($logwriter);
      close($errorwriter);

      $rprocess->"pid" = $iprcpid;
      $rprocess->"log_pipe" = *logreader;
      $rprocess->"error_pipe" = *errorreader;

      $irs = 1;
    
    elsif($iprcpid == 0)
    
      #------------------------
      #Child Process

      my $ierr = 0;

      close(*logreader);
      close(*errorreader);

      open( STDOUT, ">&=", $logwriter );
      open( STDERR, ">&=", $errorwriter );


      #------------------------
      #Execute the configured Command

      print "cmd: '" . $rprocess->"file" . "'\n";

      print "cmd rng ...\n";

      print `$rprocess->"file"`; 

      $ierr = $?;

      print "closing transmission ...\n";   

      close STDOUT;
      close STDERR;

      exit $ierr; 
    
    else    #An Error has ocurred in the Sub Process Launch
    
        # Unable to fork
        print "ERROR: Sub Process '" . $rprocess->"name" . "' Launch failed: $!\n";
       #if($iprcpid > 0)
  
  else    #An Error has ocurred in the Process Spawning   
  
    # Unable to fork
    print "ERROR: Sub Process '" . $rprocess->"name" . "' Launch failed: $!\n";
    #if(defined $iprcpid)

  return $irs;


sub Read

  my $rprocess = shift;

  my $ppsel = $rprocess->"select";
  my $pplog = $rprocess->"log_pipe";
  my $pperr = $rprocess->"error_pipe";
  my @arrppselrdy = undef;
  my $pprdy = undef;
  my $srdln = "";
  my $irdcnt = 0;

  unless(defined $ppsel)
  
    $ppsel = IO::Select->new();

    $ppsel->add($pplog);
    $ppsel->add($pperr);

    #Store the Selector Object
    $rprocess->"select" = $ppsel;
  
  else  #The Selector was already created
  
    $ppsel = $rprocess->"select";
    #unless(defined $ppsel)    

  while(@arrppselrdy = $ppsel->can_read(1))
  
    foreach $pprdy (@arrppselrdy)
    
      $irdcnt = sysread($pprdy, $srdln, 8192);

      if(defined $irdcnt)
      
        if($irdcnt > 0)
        
          if(fileno($pprdy) == fileno($pplog))
          
            $rprocess->"log" .= "pipe (" . fileno($pprdy) . "): reading report ...\n";

            $rprocess->"log" .= $srdln;
          
          elsif(fileno($pprdy) == fileno($pperr))
          
            $rprocess->"log" .= "pipe (" . fileno($pprdy) . "): reading error ...\n";

            $rprocess->"error" .= $srdln;
             #if(fileno($pprdy) == fileno($pplog))
        
        else    #End of Transmission
        
          $rprocess->"log" .= "pipe (" . fileno($pprdy) . "): transmission done.\n";

          #Remove the Pipe File Handle
          $ppsel->remove($pprdy);  

         #if($irdcnt > 0)
      
      else  #Reading from the Pipe failed
      
        #Remove the Pipe File Handle
        $ppsel->remove($pprdy);

        if($!)
        
          $rprocess->"error" .= "ERROR: Sub Process " . $rprocess->"name" 
            . ": Read failed with [" . ($! * 1) . "]!\n"
            . "Message: '$!'\n";
          #if($!)
        #if(defined $irdcnt)
      #foreach $pprdy (@arrppselrdy)
   #while(@arrppselrdy = $ppsel->can_read(1))

  $rprocess->"log" .= "try read done. '" . $ppsel->count . "' pipes left.\n";


sub Check

  my $rprocess = shift;
  my $irng = 0;

  if($rprocess->"pid" > 0
    && $rprocess->"status" < 0)
  
    my $ifinishpid = waitpid($rprocess->"pid", WNOHANG);

    if($ifinishpid > -1)
    
      if($ifinishpid == 0)
      
        $irng = 1;
      
      else
      
        $rprocess->"status" = $?;
      

      #Read the Message Pipes
      Read($rprocess);

      if($ifinishpid > 0)
      
        close $rprocess->"log_pipe";
        close $rprocess->"error_pipe";
      
    
    else  #Sub Process does not exist
    
      $rprocess->"error" .= "ERROR: Sub Process does not exist!\n";
      #if($ifinishpid > 0)
    #if($rprocess->"pid" > 0 && $rprocess->"status" < 0)

  return $irng;




my %hshprocesses = ("hello1" => "name" => "hello1", "pid" => -1, "file" => "./hello.pl"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1
  , "hello2" => "name" => "hello2", "pid" => -1, "file" => "sleep 3 ; ./hello2.pl"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1
  , "hello3" => "name" => "hello3", "pid" => -1, "file" => "./hello3.pl ; sleep 2"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1
  , "hello4" => "name" => "hello4", "pid" => -1, "file" => "./hello4.pl"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1);


print "prcs launching ...\n";

foreach (keys %hshprocesses)

  unless(Launch($hshprocesses$_))
  
    print "prc '" . $hshprocesses$_"name" . "': Launch failed!\n";
  
  #foreach (keys %hshprocesses)

print "prcs launched.\n";

my $irunningcount = 0;

do

  $irunningcount = 0;

  foreach (keys %hshprocesses)
  
    $irunningcount++ if(Check($hshprocesses$_));
    #foreach (keys %hshprocesses)

  print "prc rng cnt: '$irunningcount'\n";

while($irunningcount > 0);

foreach (keys %hshprocesses)

  print "process (" . $hshprocesses$_"pid" . ") '" . $hshprocesses$_"name" . "':\n";
  print "status [" . $hshprocesses$_"status" . "]\n";
  print "log:\n"
    . "'" . $hshprocesses$_"log" . "'\n";
  print "error:\n"
    . "'" . $hshprocesses$_"error" . "'\n";
  #foreach (keys %hshprocesses)

所以阅读logreadererrorreader 我想我能够捕获整个输出。

预期的报告会是这样的:

$ ./master.pl
prcs launching ...
prcs launched.
prc rng cnt: '4'
prc rng cnt: '3'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '0'
process (2742) 'hello4':
status [0]
log:
'pipe (3): reading report ...
cmd: './hello4.pl'
cmd rng ...
pipe (5): reading error ...
try read done. '2' pipes left.
pipe (5): reading error ...
pipe (3): reading report ...
hello4 script!!
hello waiting 3 sec ...
hello4 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
pipe (3): reading report ...
closing transmission ...
pipe (5): transmission done.
pipe (3): transmission done.
try read done. '0' pipes left.
try read done. '0' pipes left.
'
error:
'HELLO4 ERROR 0 !
HELLO4 ERROR 1 !
'
process (2743) 'hello1':
status [0]
log:
'pipe (4): reading report ...
cmd: 'sleep 3 ; ./hello.pl'
cmd rng ...
try read done. '2' pipes left.
pipe (7): reading error ...
try read done. '2' pipes left.
try read done. '2' pipes left.
pipe (7): reading error ...
pipe (4): reading report ...
hello script!!
hello waiting 3 sec ...
hello list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (7): transmission done.
pipe (4): transmission done.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
'
error:
'HELLO ERROR 0 !
HELLO ERROR 1 !
'
process (2745) 'hello3':
status [0]
log:
'pipe (6): reading report ...
cmd: './hello3.pl ; sleep 2'
cmd rng ...
pipe (9): reading error ...
try read done. '2' pipes left.
pipe (9): reading error ...
try read done. '2' pipes left.
pipe (6): reading report ...
hello3 script!!
hello waiting 3 sec ...
hello3 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (9): transmission done.
pipe (6): transmission done.
try read done. '0' pipes left.
'
error:
'HELLO3 ERROR 0 !
HELLO3 ERROR 1 !
'
process (2746) 'hello2':
status [0]
log:
'pipe (8): reading report ...
cmd: './hello2.pl'
cmd rng ...
pipe (11): reading error ...
try read done. '2' pipes left.
pipe (8): reading report ...
hello2 script!!
hello waiting 3 sec ...
hello2 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (11): reading error ...
pipe (8): transmission done.
pipe (11): transmission done.
try read done. '0' pipes left.
'
error:
'HELLO2 ERROR 0 !
HELLO2 ERROR 1 !
'

但这只有在只有 1 个工作脚本时才有效。 如果主脚本尝试读取多个 Worker 脚本,则此传输管道在执行 Worker 脚本hello.pl 之前由 Child 关闭。 最后读到的是:

        cmd: './hello.pl'
        cmd rng ...

主脚本使用 IO::Select 模块通过选择器对象从 logreadererrorreader 读取。 这样我在进度日志中找到:

        cmd: './hello.pl'
        cmd rng ...
        pipe (8): transmission done.

但是来自hello.pl 的所有东西都丢失了。

有时我仍然在错误报告中找到hello.pl STDERR 的第一行

HELLO ERROR 0 !

为什么管道有时会破裂,有时不会? 我怎样才能防止这种情况发生?

现在使用此测试脚本,我无法重现 Broken Pipe 的错误。 但即使在现实世界的应用程序中,它有时会发生,有时不会。 我在一个带有 perl v5.10 的系统上开发了这个应用程序,并且从 4 个进程 3 个运行和 1 个中断。尤其是当它调用sleep 时。

在什么情况下管道会破裂?


编辑 2018-05-05: 我构建了主脚本 master.pl 并使用 perl v5.24.1 运行它

【问题讨论】:

你必须在你 “阅读几个 Worker Scripts [sic]”的地方显示代码 我将所有logreadererrorreader 管道与IO::Select 对象一起存储在HASH 中,以便在它们上执行IO::Select::can_read() 你为什么到处使用全局变量?你不使用use strict; use warnings;吗? 我确实使用use strict; use warnings;。是的,管道对于主脚本master.pl 来说是全局的,以便回到他们那里进行阅读。 请提供一个最小的、可运行的问题演示。 (见minimal reproducible example。) 【参考方案1】:

由于没有人可以在 1 个月内给我任何有用的答案甚至提示,我将发布我对 断管现象 的发现,以帮助可能面临类似问题的其他人.


我终于可以追踪到断管的现象了。 我添加了足够长的睡眠时间,让我可以使用 strace 命令跟踪系统活动,它实际上记录了 Broken Pipe 错误:

$ strace -p 11729
Process 11729 attached
restart_syscall(<... resuming interrupted call ...>) = 0
write(1, "hello script!!\nhello waiting 20 "..., 67) = 67
pipe([3, 4])                            = 0
pipe([5, 6])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe670d5f9d0) = 13238
close(6)                                = 0
close(4)                                = 0
read(5, "", 4)                          = 0
close(5)                                = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff334b57b0) = -1 EINVAL (Invalid argument)
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(3, "total 228K\ndrwxr-xr-x.  4 usr15 "..., 4096) = 1868
read(3, "", 4096)                       = 0
close(3)                                = 0
rt_sigaction(SIGHUP, SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0, SIG_DFL, [], 0, 8) = 0
rt_sigaction(SIGINT, SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0, SIG_DFL, [], 0, 8) = 0
rt_sigaction(SIGQUIT, SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0, SIG_DFL, [], 0, 8) = 0
wait4(13238, [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 13238
--- SIGCHLD si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13238, si_status=0, si_utime=0, si_stime=0 ---
rt_sigaction(SIGHUP, SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0, NULL, 8) = 0
rt_sigaction(SIGINT, SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0, NULL, 8) = 0
rt_sigaction(SIGQUIT, SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0, NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, SIG_DFL, [], 0, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep(30, 0, 0x7fff334b5b90)      = 0
write(2, "HELLO ERROR 1 !\n", 16)       = -1 EPIPE (Broken pipe)
--- SIGPIPE si_signo=SIGPIPE, si_code=SI_USER, si_pid=11729, si_uid=501 ---
+++ killed by SIGPIPE +++

它记录了hello.pl 正确完成并结束写入错误消息write(2, "HELLO ERROR 1 !\n", 16) = -1 但它发现 Pipe 坏了,导致错误 EPIPE (Broken pipe) 导致它与 +++ killed by SIGPIPE +++ 崩溃

那么问题的答案在什么情况下管道会破裂?

如果分叉的子进程已经终止

但为什么在管道损坏之前我无法读取输出?

剪切输出现象

    cmd: './hello.pl'
    cmd rng ...
    pipe (8): transmission done.
print() 在分叉子进程中组合的结果 和父进程中的sysread()。 输出是 实际上在 Child 端缓冲并最终写入 Parent 一边。

为什么要在收到输出之前终止子进程?

这来自实际中与 Objecthood 的结合 实现Object::DESTROY 的真实世界应用程序 方法。 此方法包括一个kill() 清理逻辑,用于仍然 运行理论上不应触发的子进程。 但在现实世界的应用程序中,进程不是即时的 阅读并收获了导致他们被杀死的原因 Object::DESTROY Logic。但我在执行日志中找不到这方面的内容,因为它的 Object 已被 Perl 的垃圾收集器销毁。

因此,一些进程被杀死,而另一些则没有。有时他们会被杀死,但有时不会。

【讨论】:

如果您不喜欢我的回答,我很抱歉,但事实就是如此。现在我显然知道了答案,我可以更改我的问题以适应答案或提供另一个测试脚本来演示现象。但是因为我现在已经知道答案了,所以我不需要再问这个问题了。是的,现实世界的问题就是这样。【参考方案2】:

我认为你必须使用wait 而不是waitpidwait 等待任何孩子,而 waitpid 只等待一个特定的孩子。

【讨论】:

我观察到的现象是sysread() 返回0。因此master.pl 脚本在hello.pl 脚本开始执行系统命令时写入pipe (8): transmission done.。预期的行为是 IO::Select::can_read() 将在 hello.pl 脚本处于休眠状态时超时。 根据测试脚本,当my $ifinishpid = waitpid($rprocess-&gt;"pid", WNOHANG);$ifinishpid 中返回另一个值时,您的Check() 子例程将如何显示 另外官方 Linux 系统文档根据man7.org/linux/man-pages/man2/waitpid.2.html 解释了“wait() 和 waitpid():wait() 系统调用暂停调用线程的执行,直到一个其子级终止。调用wait(&amp;wstatus) 等效于waitpid(-1, &amp;wstatus, 0);“系统行为没有区别

以上是关于为啥子进程中的管道有时会中断,有时不会?的主要内容,如果未能解决你的问题,请参考以下文章

linux vfork的子程序与父进程共享内存,那为啥子进程执行exec就不会覆盖父进程呢?

linux vfork的子程序与父进程共享内存,那为啥子进程执行exec就不会覆盖父进程呢?

为啥线程在进程间通信期间会破坏命名管道?

在多个进程写入时读取命名管道

需要与子进程通信来处理文件

进程在写入管道之前读取数据