Possible race condition with piped output from multiple tee recipients arriving out-of-sequence on a named pipe in a BASH script

Posted 2023-02-16

Asked: 2012-03-09 23:52:20

Question:

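UPDATE: While it doesn't actually solve the original question about my piping efforts, I've solved my problem by simplifying it greatly and ditching the named pipe altogether. Here's a proof-of-concept script that generates CRC32, MD5, SHA1, SHA224, SHA256, SHA384 and SHA512 checksums in parallel while reading the file from disk only once, and returns them as a JSON object (I'll be consuming the output from PHP and Ruby). It's crude and has no error checking, but it works: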

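#!/bin/bash

# Read "$1" once. Each >( ... ) computes one digest and prints it as a
# "name":"value" pair. The command substitution captures those pairs and
# returns only after every digest process has closed its stdout.
checksums="$(tee <"$1" \
        >( cfv -C -q -t sfv -f - - | tail -n 1 | sed -e 's/^.* \([a-fA-F0-9]\{8\}\)$/"crc32":"\1"/' ) \
        >( md5sum - | sed -e 's/^\([a-fA-F0-9]\{32\}\) .*$/"md5":"\1"/' ) \
        >( sha1sum - | sed -e 's/^\([a-fA-F0-9]\{40\}\) .*$/"sha1":"\1"/' ) \
        >( sha224sum - | sed -e 's/^\([a-fA-F0-9]\{56\}\) .*$/"sha224":"\1"/' ) \
        >( sha256sum - | sed -e 's/^\([a-fA-F0-9]\{64\}\) .*$/"sha256":"\1"/' ) \
        >( sha384sum - | sed -e 's/^\([a-fA-F0-9]\{96\}\) .*$/"sha384":"\1"/' ) \
        >( sha512sum - | sed -e 's/^\([a-fA-F0-9]\{128\}\) .*$/"sha512":"\1"/' ) \
        >/dev/null)"

json=""

# The pairs contain no whitespace, so word splitting yields one pair per word.
for checksum in $checksums; do json="$json$checksum,"; done

# Strip the trailing comma and wrap the pairs in braces to form a JSON object.
echo "{${json%,}}"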

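ORIGINAL QUESTION:

I'm a bit hesitant to ask this, since my search phrases got plenty of hits, but after applying what I learned from "Using named pipes with bash - Problem with data loss" and reading through another 20 pages, I'm still stuck.

So, to continue: I'm writing a simple script that lets me create CRC32, MD5 and SHA1 checksums of a file concurrently while reading it from disk only once. I'm using cfv for that purpose.

Originally, I just hacked together a simple script that tee'd the file to three cfv commands writing to three separate files under /tmp/, and then tried to cat those files to stdout afterwards, but I ended up with empty output unless I made the script sleep for a second before reading the files. That seemed odd, and I assumed I was being a moron in my scripting, so I tried a different approach and had the cfv workers write to a named pipe instead. This is my script so far, after applying the techniques from the link above: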

#!/bin/bash

# Bail out if argument isn't a file:
[ ! -f "$1" ] && echo "'$1' is not a file!" && exit 1

# Choose a name for a pipe to stuff with CFV output:
pipe="/tmp/pipe.chksms"

# Don't leave an orphaned pipe on exiting or being terminated:
trap "rm -f $pipe; exit" EXIT TERM

# Create the pipe (except if it already exists (e.g. SIGKILL'ed b4)):
[ -p "$pipe" ] || mkfifo $pipe

# Start a background process that reads from the pipe and echoes what it
# receives to stdout (notice the pipe is attached last, at done):
while true; do
        while read line; do
                [ "$line" = "EOP" ] && echo "quitting now" && exit 0
                echo "$line"
        done
done <$pipe 3>$pipe & # This 3> business is to make sure there's always
                      # at least one producer attached to the pipe (the
                      # consumer loop itself) until we're done.

# This sort of works without "hacks", but tail errors out when the pipe is
# killed, naturally, and script seems to "hang" until I press enter after,
# which I believe is actually EOF to tail, so it's no solution anyway:
#tail -f $pipe &

tee <"$1" >( cfv -C -t sfv -f - - >$pipe ) >( cfv -C -t sha1 -f - - >$pipe ) >( cfv -C -t md5 -f - - >$pipe ) >/dev/null

#sleep 1s
echo "EOP" >$pipe
exit

So, executed as is, I get this output:

daniel@lnxsrv:~/tisso$ ./multisfv file
 :  :  : quitting now
- : Broken pipe (CF)
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
- : Broken pipe (CF)
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
- : Broken pipe (CF)
daniel@lnxsrv:~/tisso$ close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr

However, with the sleep 1s line uncommented (that is, enabled), I get the expected output:

daniel@lnxsrv:~/tisso$ ./multisfv file
3bc1b5ff125e03fb35491e7d67014a3e *
-: 1 files, 1 OK.  0.013 seconds, 79311.7K/s
5e3bb0e3ec410a8d8e14fef1a6daababfc48c7ce *
-: 1 files, 1 OK.  0.016 seconds, 62455.0K/s
; Generated by cfv v1.18.3 on 2012-03-09 at 23:45.23
;
2a0feb38
-: 1 files, 1 OK.  0.051 seconds, 20012.9K/s
quitting now

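This puzzles me, because I assumed that tee would not exit until every cfv recipient it feeds had exited. If that were so, the echo "EOP" statement would not run until all the cfv substreams had finished, meaning they would have written their output to my named pipe first, and only then would the echo statement execute.

Since the behavior is the same without pipes, using only temporary output files, I think this must be some race condition related to the way tee pushes data to its recipients. I tried a simple "wait" command, but of course it waits for my bash child process - the while loop - to finish, so I just end up with a hanging process.

Any ideas?

TIA, Daniel :)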

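Comments:

I'd expect source code to be available for these checksums. How about combining them into one program that writes the 3 values it computes to the appropriate checksum files? I have to believe Perl has modules for this, and likewise you could do it all in 1 pass over the file. (Just thinking outside the box, YRMV.) Good luck!

Would this help? parallel --group 'cfv -C -t sfv -f - ;cfv -C -t sha1 -f - ;cfv -C -t md5 -f - ;' ::: file

@shelter - I guess writing my own routines is always my fallback, but I'd prefer to use already available tools as far as possible.

@potong - The parallel command that Ubuntu gives me - from the moreutils package - doesn't accept that command. Its man page doesn't mention any --group argument. Also, reading its man page, it doesn't seem designed to solve my problem. Its purpose seems to be distributing commands across subshells for rough CPU balancing, and even if I used it, each subshell would read from the source file, which is what I want to avoid.

I was referring to GNU parallel. But if it doesn't help... it doesn't help.

Answer 1: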

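tee exits once it has written the last bit of input to the last output pipe and closed it (that is, the unnamed pipes created by bash, not your fifo, a.k.a. "named pipe"). It has no need to wait for the processes reading the pipes to finish; in fact, it doesn't even know that it is writing to pipes. Since pipes have buffers, it is quite likely that tee finishes writing before the processes on the other end have finished reading. The script then writes 'EOP' to the fifo, which causes the read loop to terminate. That closes the only reader of the fifo, and every cfv process gets SIGPIPE the next time it tries to write to stdout.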
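One way to avoid the race described above is to drop the 'EOP' sentinel and let the reader stop at end-of-file, which a fifo only delivers once every writer has closed it. The following is a minimal sketch under stated assumptions, not the asker's script: md5sum and sha1sum stand in for the cfv calls, and the names workdir, input, pipe, out and reader are made up for illustration. Each digester writes a single short line, so the lines should not interleave.

```bash
#!/bin/bash
# Sketch only: md5sum/sha1sum stand in for the cfv invocations in the question.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

input="$workdir/input"      # hypothetical sample file
pipe="$workdir/fifo"
out="$workdir/collected"
printf 'hello world\n' >"$input"
mkfifo "$pipe"

# Reader: started BEFORE fd 3 is opened, so it holds no write end itself
# and sees EOF as soon as every writer has closed the fifo.
cat <"$pipe" >"$out" &
reader=$!

# Hold the fifo open for writing on fd 3; the process substitutions inherit it.
exec 3>"$pipe"

tee <"$input" \
    >( md5sum  | sed 's/ .*//; s/^/md5 /'  >&3 ) \
    >( sha1sum | sed 's/ .*//; s/^/sha1 /' >&3 ) \
    >/dev/null

# tee may already be gone while the digesters are still running. Closing our
# copy of fd 3 is safe: each digester keeps its own copy until it exits.
exec 3>&-

# Returns once the reader hits EOF, i.e. after all digesters have finished.
wait "$reader"
cat "$out"
```

The order of the two output lines is not guaranteed; only their completeness is.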

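An obvious question to ask here is why you don't simply run three (or N) independent processes that each read the file and compute a different digest. If the "file" were actually being generated on the fly, downloaded from some remote site, or produced by some other slow process, it might make sense to do things the way you are trying to, but if the file is on local disk, it is quite likely that only one disk access will actually happen; the lagging digesters will read the file from the buffer cache. If that's all you need, GNU parallel should work fine, or you can start the processes in bash (with &) and then wait for them. YMMV, but I would think either of those solutions is less resource-intensive than setting up all those pipes and simulating the buffer cache in userspace with tee.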
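The "start the processes with & and then wait" variant might look like the sketch below. It is an illustration only: the sample file and the workdir layout are assumptions, and coreutils digest tools are used in place of cfv.

```bash
#!/bin/bash
# Sketch only: independent digest processes, each reading the file itself.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

file="$workdir/input"       # hypothetical sample file
printf 'hello world\n' >"$file"

# One background pipeline per digest. After the first read, the remaining
# readers are most likely served from the buffer cache rather than the disk.
for tool in md5sum sha1sum sha256sum; do
    "$tool" <"$file" | cut -d' ' -f1 >"$workdir/$tool.out" &
done

# This script has no other background jobs, so a bare `wait` blocks until
# all digests are done.
wait

# Every result file is complete at this point; print them in a fixed order.
for tool in md5sum sha1sum sha256sum; do
    printf '%s %s\n' "${tool%sum}" "$(cat "$workdir/$tool.out")"
done
```

Unlike the fifo script in the question, nothing else runs in the background here, which is why a plain wait does not hang.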

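By the way, if you want to serialize the output of multiple processes, you can use the flock utility. Using a fifo alone is not enough; there is no guarantee that the processes writing to the fifo will write whole lines atomically, and if you knew that they did, you wouldn't need the fifo.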
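As a rough illustration of the flock suggestion, the sketch below lets two background jobs append multi-line reports to one file without interleaving. It assumes the flock utility from util-linux is installed; the emit helper, the lock file and the report layout are all hypothetical.

```bash
#!/bin/bash
# Sketch only: serialize multi-line output from concurrent jobs with flock.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

file="$workdir/input"       # hypothetical sample file
lock="$workdir/lock"
out="$workdir/out"
printf 'hello world\n' >"$file"

# emit LABEL TOOL: hypothetical helper. It computes one digest outside the
# lock, then appends a three-line report to $out while holding the lock.
emit() {
    local label=$1 tool=$2 sum
    sum=$("$tool" <"$file" | cut -d' ' -f1)
    (
        flock 9                          # block until we own the lock on fd 9
        printf 'begin %s\n' "$label" >>"$out"
        printf '%s\n' "$sum"         >>"$out"
        printf 'end %s\n' "$label"   >>"$out"
    ) 9>"$lock"                          # lock is released when fd 9 closes
}

emit md5 md5sum &
emit sha1 sha1sum &
wait
cat "$out"
```

Which report comes first is still up to the scheduler; the lock only keeps each report contiguous.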
