Reading a file line by line with Bash

Posted 2023-03-06

[Title]: Reading in File line by line w/ Bash [Posted]: 2018-11-19 19:35:36 [Question]:

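I am creating a bash script that reads a file line by line; the data will later be formatted and organized by name and date. I cannot see why this code does not work at this point. No errors appear, even though I have tried the input and output filename variables on their own, using a directory lookup with find and the export command.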

export inputfilename="employees.txt"
export outputfilename="branch.txt"
directoryinput=$(find -name $inputfilename)
directoryoutput=$(find -name $outputfilename)
n=1

if [[ -f "$directoryinput" ]]; then
     while read line; do
         echo "$line"
         n=$((n+1))
     done < "$directoryoutput"
 else
    echo "Input file does not exist. Please create a employees.txt file"
 fi

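Any help is greatly appreciated, thank you! Note: as people have pointed out, I forgot the $ sign on the file redirection, but that happened only when copying my code here. My actual script does have the $ sign and still produces no result.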

[Comments]:

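done < "directoryoutput" should be done < "$directoryoutput" (it is missing the dollar sign).

You can paste shell scripts like this into shellcheck.net, which is very good at finding this kind of problem. In this case it highlights line 4 and points out that the variable is assigned but never used.

If you must locate the file this way, find . -name "$inputfilename" -print -quit would be safer: it guarantees that you find only one file instead of ending up with several names concatenated in the variable.

By the way, the original indentation here was very confusing. Why was do moved back outside the while loop? Why were the loop contents not indented? I took the liberty of changing it to follow the code flow; feel free to roll back if that was too heavy-handed, but please do avoid letting tokens that belong to a loop stick out to an indentation level before that loop.

(Also, the . in find . -name suggested above is there for portability: being able to leave it out is a GNUism and does not work in versions of find that stay closer to the baseline POSIX specification.)

[Answer 1]: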

Reading in File line by line w/ Bash

The best and idiomatic way to read a file line by line is:

while IFS= read -r line; do
  # parse line
  printf '%s\n' "$line"
done < "file"

More on this topic can be found in the BashFAQ.
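As a small sketch of the loop above in use (the function name print_lines and the sample data are made up for this illustration), a commonly added `|| [ -n "$line" ]` test makes sure a final line without a trailing newline is still processed:

```shell
#!/usr/bin/env bash
# print_lines is a hypothetical helper, not part of the original answer.
print_lines() {
  local line
  # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
  # The || [ -n "$line" ] part also handles a last line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "$line"
  done < "$1"
}

# Sample file: indented line, a line with a backslash, and an unterminated last line.
tmp=$(mktemp)
{ printf '%s\n' '  two leading spaces' 'a\tb'; printf '%s' 'no trailing newline'; } > "$tmp"
print_lines "$tmp"
rm -f "$tmp"
```

Without the extra test, the loop would stop before printing the unterminated last line.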

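However, do not read files line by line in bash. You can (well, almost) always avoid reading a stream line by line in bash. Doing so is extremely slow and should be avoided. For simple cases, the standard Unix tools can be used, with help from xargs or parallel; for more complex ones, use awk and datamash.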
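To illustrate the point about preferring dedicated tools, here is a rough sketch that numbers and counts lines with awk instead of a shell loop. The function name number_lines and the sample records are invented for the example.

```shell
#!/usr/bin/env bash
# number_lines is a hypothetical helper for illustration.
number_lines() {
  # awk reads the file itself; NR is the current line number,
  # and in the END block it holds the total number of lines read.
  awk '{ printf "%d: %s\n", NR, $0 } END { printf "total: %d\n", NR }' "$1"
}

tmp=$(mktemp)
printf '%s\n' 'Alice 2018-11-19' 'Bob 2018-11-20' > "$tmp"
number_lines "$tmp"
rm -f "$tmp"
```

The whole file is handled by a single awk process, which is usually much faster than one read call per line in the shell.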

done < "directoryoutput"

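The code does not work because you are passing the contents of a file literally named directoryoutput to the while read loop as its standard input. Since no such file exists, your script fails.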

directoryoutput=$(find -name $outputfilename)

You can feed the variable's value, with a trailing newline appended, to the while read loop by using a here-string:

done <<< "$directoryoutput"
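It may help to see what each form actually feeds to the loop. In the sketch below (the helper names and the temporary file are assumptions for illustration), a here-string supplies the text stored in the variable, which here is a path, while plain redirection supplies the contents of the file at that path:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: each counts the lines its loop receives.
count_from_string() {
  local n=0 line
  while IFS= read -r line; do n=$((n + 1)); done <<< "$1"   # reads the string itself
  printf '%s\n' "$n"
}
count_from_file() {
  local n=0 line
  while IFS= read -r line; do n=$((n + 1)); done < "$1"     # reads the file's contents
  printf '%s\n' "$n"
}

tmp=$(mktemp)
printf '%s\n' 'first record' 'second record' 'third record' > "$tmp"
count_from_string "$tmp"   # the loop sees one line: the path
count_from_file "$tmp"     # the loop sees the three records
rm -f "$tmp"
```

So the here-string is the right choice for iterating over the names that find returned, and `< "$directoryoutput"` is the one to use for reading the file itself.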

directoryinput=$(find -name $inputfilename)
if [[ -f "$directoryinput" ]]

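This is fine as long as there is only one file named $inputfilename under your directory. If there are more, find returns a newline-separated list of names, and the -f test fails on that list. Also, finding a file and then checking that it exists is pointless. I think a small check such as if [ "$(printf '%s\n' "$directoryinput" | wc -l)" -eq 1 ], or using find . -name "$inputfilename" | head -n1, would be better.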
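A minimal sketch of the head -n1 variant follows, assuming file names without embedded newlines. The function name first_match and the directory layout are hypothetical, and which of several matches comes first depends on the order in which find walks the tree.

```shell
#!/usr/bin/env bash
# first_match is a hypothetical helper: print at most one path matching a name.
first_match() {
  # $1 = directory to search, $2 = file name to look for
  find "$1" -name "$2" | head -n 1
}

# Build a throwaway tree that contains two files with the same name.
dir=$(mktemp -d)
mkdir -p "$dir/a" "$dir/b"
printf '%s\n' 'Alice' > "$dir/a/employees.txt"
printf '%s\n' 'Bob' > "$dir/b/employees.txt"

match=$(first_match "$dir" employees.txt)
printf 'using: %s\n' "$match"
rm -rf "$dir"
```

An empty result means nothing was found, which can be tested with [ -z "$match" ].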

while read line;
   do
      echo "$line"
      n=$((n+1))
  done < "directoryoutput"

The intent here is clear. It amounts to just:

 n=$(<directoryoutput wc -l)
 cat "directoryoutput"

Except that while read line strips leading and trailing whitespace and depends on IFS.
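The difference can be seen with a small experiment. The two helper functions below are made up for this illustration; each reads one line from its argument and prints the result in brackets so that surrounding spaces are visible.

```shell
#!/usr/bin/env bash
# Hypothetical helpers comparing plain read with IFS= read -r.
read_default() {
  # Plain read: trims leading/trailing IFS whitespace and treats backslash as an escape.
  printf '%s\n' "$1" | { read line; printf '[%s]\n' "$line"; }
}
read_raw() {
  # IFS= read -r: keeps the line exactly as it appears in the input.
  printf '%s\n' "$1" | { IFS= read -r line; printf '[%s]\n' "$line"; }
}

read_default '  a\tb  '   # prints [atb]
read_raw     '  a\tb  '   # prints [  a\tb  ]
```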

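Unless you have a reason not to, always remember to quote your variables.

Have a look at shellcheck, which can find the most common mistakes in scripts.

I would do it more like this: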

inputfilename="employees.txt"
outputfilename="branch.txt"

# Locate the input file; find prints one path per line.
directoryinput=$(find . -name "$inputfilename")
directoryinput_cnt=$(printf "%s\n" "$directoryinput" | wc -l)
# An empty result still counts as one line above, so test for emptiness directly.
if [ -z "$directoryinput" ]; then
    echo "Input file does not exist. Please create a '$inputfilename' file" >&2
    exit 1
elif [ "$directoryinput_cnt" -gt 1 ]; then
    echo "Multiple files named '$inputfilename' exist in the current path" >&2
    exit 1
fi

# Same checks for the output file.
directoryoutput=$(find . -name "$outputfilename")
directoryoutput_cnt=$(printf "%s\n" "$directoryoutput" | wc -l)
if [ -z "$directoryoutput" ]; then
    echo "Output file does not exist. Please create a '$outputfilename' file" >&2
    exit 1
elif [ "$directoryoutput_cnt" -gt 1 ]; then
    echo "Multiple files named '$outputfilename' exist in the current path" >&2
    exit 1
fi

# Print the file and count its lines without a shell loop.
cat "$directoryoutput"
n=$(<"$directoryoutput" wc -l)

[Comments]:

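printf "$directoryinput" would be better written as printf '%s\n' "$directoryinput", so that you are not treating the file name (which can have arbitrary content) as a format string. (Also, the trailing newline is needed to terminate a line properly on UNIX; POSIX explicitly directs wc -l to count the number of newlines, so content without a trailing newline is ignored.)

Although the file exists, the code does not believe it and returns "Input file does not exist. Please create a '$inputfilename' file". I also tried putting a dollar sign on the initializations of the inputfilename and outputfilename variables. The rest looks good, thanks!

@SamThompson You are right, I made a typo. I was missing a trailing newline when counting the number of results, so we need to do as Charles suggested: printf "%s\n" "$...." | wc -l. Without the \n it counts zero even when one file exists :/ I forgot that command substitution $(...) removes trailing newlines. I have edited the answer. Alternatively, you can use bash's $(<<<"$..." wc -l), which gives the same result.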
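The counting behaviour discussed in these comments can be reproduced with a short sketch. The helper names and the sample path are assumptions made for the illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count the newline characters wc -l sees in each form.
count_without_newline() { printf '%s' "$1" | wc -l; }
count_with_newline()    { printf '%s\n' "$1" | wc -l; }

# Command substitution drops the trailing newline, as it would for find output.
found=$(printf '%s\n' './data/employees.txt')

count_without_newline "$found"   # no newline character reaches wc
count_with_newline "$found"      # one newline, so one line is counted
count_with_newline ""            # an empty value still yields one newline
```

Because an empty value also counts as one line in the second form, a separate emptiness test such as [ -z "$found" ] is a reasonable companion to the line count.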
