Find and replace pattern of fileA in fileC by fileB pattern

Posted 2023-03-24

[English title]: Find and replace pattern of fileA in fileC by fileB pattern
[Asked]: 2013-10-22 23:08:49
[Question]:

I have two files. fileA has a list of names:

AAAAA 
BBBBB
CCCCC
DDDDD

and another file, fileB, with another list:

111
222
333
444

And a third file, fileC, with some text:

Hello AAAAA toto BBBBB dear "AAAAA" trird BBBBBB tuizf AAAAA dfdsf CCCCC

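So I need to find every pattern from fileA in fileC and replace it with the corresponding pattern from fileB. It works! But I realized that fileC contains words like "AAAAA" (in quotes), and these are not replaced by "111".

This is what I am doing, but it doesn't seem to work.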

#!/bin/bash
while IFS= read -r lineA && IFS= read -r lineB <&3; do
    sed -i -e "s/$lineA/$lineB/g" fileC
done <fileA 3<fileB

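[Comments]:

So you mean you need to replace AAAAA with 111?

"it doesn't seem to work." What is the output?

I tested your solution and it works for me: Hello 111 toto 222 dear 111 trird 222B tuizf 111 dfdsf 333. Maybe you just didn't look at your fileC (-i edits it in place).

@PeterDev AAAAA in fileC is not replaced because fileA contains "AAAAA " rather than "AAAAA" (note the trailing space).

[Answer 1]: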

This is a good job for GNU awk:

$ cat replace.awk
FILENAME=="filea" {
    a[FNR]=$0
    next
}

FILENAME=="fileb" {
    b[a[FNR]]=$0
    next
}

{
    for (i=1;i<=NF;i++)
        printf "%s%s",(b[$i]?b[$i]:$i),(i==NF?RS:FS)
}

Demo:

$ awk -f replace.awk filea fileb filec
Hello 111 toto 222 dear 111 trird BBBBBB tuizf 111 dfdsf 333

A version for sehe, with comments:

FILENAME==ARGV[1] {              # While reading the first file passed in
    find[FNR]=$0                 # Store each search word, keyed by line number
    next                         # Skip to the next input line
}

FILENAME==ARGV[2] {              # While reading the second file passed in
    replace[find[FNR]]=$0        # Map each search word to its replacement
    next                         # Skip to the next input line
}

{                                # For any other file passed in (i.e. the third)
    for (i=1;i<=NF;i++)          # Loop over all fields and replace where needed
        printf "%s%s",(replace[$i]?replace[$i]:$i),(i==NF?RS:FS)
}
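One caveat with the ternary `replace[$i]?replace[$i]:$i`: it tests whether the replacement is "true", so a replacement of `0` or an empty string would be skipped, and the lookup itself creates an empty array entry for every word it sees. A minimal sketch that uses awk's `in` operator instead is shown here. The temporary directory and the sample data (including the `0` replacement) are made up for illustration and are not from the question.

```shell
#!/bin/sh
# Sketch: word-wise replacement using "in" instead of a truthiness test.
# The temp directory and sample data are illustrative assumptions.
tmp=$(mktemp -d)
printf 'AAAAA\nBBBBB\nCCCCC\n' > "$tmp/fileA"
printf '111\n0\n333\n' > "$tmp/fileB"          # "0" would be skipped by the ternary
printf 'Hello AAAAA toto BBBBB tuizf CCCCC\n' > "$tmp/fileC"

result=$(awk '
FILENAME==ARGV[1] { find[FNR]=$0; next }            # search words, by line number
FILENAME==ARGV[2] { replace[find[FNR]]=$0; next }   # search word -> replacement
{
    for (i=1;i<=NF;i++)                             # "in" only tests membership
        printf "%s%s", (($i in replace) ? replace[$i] : $i), (i==NF ? RS : FS)
}' "$tmp/fileA" "$tmp/fileB" "$tmp/fileC")

echo "$result"
rm -rf "$tmp"
```

With this sample data the script prints `Hello 111 toto 0 tuizf 333`.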

For replacement that ignores word boundaries:

$ cat replace.awk
FILENAME==ARGV[1] {
    find[FNR]=$0
    next
}

FILENAME==ARGV[2] {
    replace[find[FNR]]=$0
    next
}

{
    for (word in find)
        gsub(find[word],replace[find[word]])
    print
}

Demo:

$ awk -f replace.awk filea fileb filec
Hello 111 toto 222 dear "111" trird 222B tuizf 111 dfdsf 333
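A commenter on the question points out that the first entry in fileA carries a trailing space, in which case the pattern `AAAAA ` would not match the quoted `"AAAAA"`. The following is a hedged sketch, not the answerer's script: it trims trailing blanks from each search word before calling `gsub`. The temporary files stand in for the question's data, and the search words are still treated as regular expressions.

```shell
#!/bin/sh
# Sketch: same idea as the gsub version, with trailing blanks trimmed
# from the search words. Temp files are illustrative stand-ins.
tmp=$(mktemp -d)
printf 'AAAAA \nBBBBB\nCCCCC\nDDDDD\n' > "$tmp/fileA"   # note the trailing space
printf '111\n222\n333\n444\n' > "$tmp/fileB"
printf 'Hello AAAAA toto BBBBB dear "AAAAA" trird BBBBBB tuizf AAAAA dfdsf CCCCC\n' > "$tmp/fileC"

result=$(awk '
FILENAME==ARGV[1] {
    sub(/[ \t]+$/, "")          # trim trailing spaces and tabs
    find[FNR]=$0
    next
}
FILENAME==ARGV[2] {
    replace[find[FNR]]=$0
    next
}
{
    for (n in find)
        if (find[n] != "")      # an empty pattern would match everywhere
            gsub(find[n], replace[find[n]])
    print
}' "$tmp/fileA" "$tmp/fileB" "$tmp/fileC")

echo "$result"
rm -rf "$tmp"
```

The iteration order of `for (n in find)` is unspecified, so if one search word can match inside another word's replacement, the result may depend on that order. The sample data here has no such overlap.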

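[Discussion]:

I'm always amazed that, after years of exposure, awk has managed to leave zero lasting impression on my brain. I mean, it always looks like the right tool for the job, but I can't really make heads or tails of it (FNR? NF, RS, FS?). Also, filea and fileb are hard-coded in the script and still have to be given on the command line? It's just very alien to me.

It comes naturally to me. You have probably already described data in terms of records and fields, so FS for field separator, RS for record separator and NF for number of fields are quite sane. You could match the files positionally using ARGV, but using names is more readable IMO, and of course you still need to pass in a handle to each file.

My script works! But I realized that fileC contains words like "AAAAA", which are not replaced by "111". Any idea?

@PeterDev I added a script that does the replacement regardless of word boundaries.

Thanks, nice work, but it doesn't seem to work with "":
~/test# awk -f replace.awk fileA fileB fileC
111 toto 222 dear "AAAAA" trird 222B tuizf 111 dfdsf 333

[Answer 2]: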

sed 's/.*/s/' fileA | paste -d/ - fileA fileB | sed 's/$/\//' | sed -f - fileC

The correct and faster version would be

paste -d/ fileA fileB | sed 's/^/s\//;s/$/\/g/' | sed -f - fileC
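Both pipelines paste the raw lines into `s/…/…/` commands, so an entry containing `/`, `.`, `*`, `&` or a backslash would change the meaning of the generated script. The sketch below assumes the entries are meant to be matched literally and escapes them first. The helper names `esc_pat` and `esc_rep`, the temporary files and the sample data are hypothetical, not part of the answer.

```shell
#!/bin/sh
# Sketch: escape special characters before generating the sed script.
# Helper names and sample data are illustrative assumptions.
tmp=$(mktemp -d)
printf 'a.b\n/usr/bin\n' > "$tmp/fileA"
printf 'x&y\n/opt\n' > "$tmp/fileB"
printf 'a.b aXb /usr/bin\n' > "$tmp/fileC"

# Escape BRE metacharacters, "\" and the "/" delimiter in search words.
esc_pat() { sed 's|[][\/.*^$]|\\&|g' "$1"; }
# Escape "\", "&" and "/" in replacement text.
esc_rep() { sed 's|[\/&]|\\&|g' "$1"; }

esc_pat "$tmp/fileA" > "$tmp/pat"
esc_rep "$tmp/fileB" > "$tmp/rep"

# Join the escaped columns with "/" and wrap each line as s/pattern/replacement/g.
paste -d/ "$tmp/pat" "$tmp/rep" | sed 's|^|s/|; s|$|/g|' > "$tmp/script.sed"

result=$(sed -f "$tmp/script.sed" "$tmp/fileC")
echo "$result"
rm -rf "$tmp"
```

With this sample data the output is `x&y aXb /opt`: the `.` in `a.b` no longer matches the `X` in `aXb`. The generated script is written to a file here because reading it with `sed -f -` may not be supported by every sed implementation.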

[Discussion]:

[Answer 3]:

Two-stage rocket:

sed -e "$(paste file[AB] | sed 's/\(.*\)\t\(.*\)/s\/\1\/\2\/g;/')" fileC

What this does is use paste file[AB] | sed 's/\(.*\)\t\(.*\)/s\/\1\/\2\/g;/' to create a temporary sed script:

s/AAAAA/111/g;
s/BBBBB/222/g;
s/CCCCC/333/g;
s/DDDDD/444/g;

and then run it with fileC as input.

[Discussion]:

@hipe I hadn't noticed that. In any case, my version is limited too (fileA/fileB cannot contain tabs).
