How to make sed avoid replacement after a specific symbol

Posted 2023-03-15


【Asked】: 2021-04-30 10:36:25

【Question】:

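I am writing a script for formatting Fortran source code. The formatting is simple, such as making all keywords uppercase or lowercase. This is the main command: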

sed -i -e "/^\!/! s/$small\s/$cap /gI" $filein

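It replaces every keyword $small (followed by a whitespace character) with the keyword $cap. The replacement happens only if the line does not start with "!". It does what it should. The question:

How can I avoid the replacement after a "!" that occurs in the middle of a line? Or, more generally, how can I replace a pattern everywhere except after a specific symbol, which may be at the start of the line or anywhere else?

Example: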

Program test  ! It should not change the next program to caps
! Hi, is anything changing here? like program ?
This line does not have any key words
This line has Program and not exclamation mark.

"Program" is a keyword. After running the script, the result is:

PROGRAM test  ! It should not change the next PROGRAM to caps
! Hi, is anything changing here? like program ?
This line does not have any key words
This line has PROGRAM and not exclamation mark.

What I want:

PROGRAM test  ! It should not change the next program to caps
! Hi, is anything changing here? like program ?
This line does not have any key words
This line has PROGRAM and not exclamation mark.

So far I have not found a good solution, and I hope the problem can be solved with a sed command.

【Comments】:

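This sounds like an XY Problem. I assume the bigger picture involves many, many changes of this type, in which case I wonder whether it would make sense to spend some time on a top-level design, e.g. a grammar (re)parser, or a script (awk? perl?) with a library to handle these changes, rather than what currently looks like 100 separate sed calls?

How about GNU awk? Try awk -i inplace -v small="$small" -v cap="$cap" 'BEGIN{IGNORECASE=1;FS=OFS="!"} {gsub(small, cap, $1)}1' $filein, see ideone.com/ugd5JE

So, does ideone.com/lM4fZC work for you? Note: Program and program are not the same, so does the $small variable represent a regular expression?

【Answer 1】: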

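The typical way in sed is:

- Split the string into two parts and save one part in the hold space.
- Operate on the pattern space.
- Get the hold space back and shuffle the output.

Something along these lines: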

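sed '/!/!b;h;s/.*!//;x;s/!.*//;s/program/PROGRAM/gI;G;s/\n/!/'

/!/!b; - if the line has no !, print it and start the next cycle.
h;s/.*!//;x;s/!.*// - put the part after ! in the hold space and the part before ! in the pattern space.
s/program/PROGRAM/gI; - do the substitution on that part of the string.
G;s/\n/!/ - grab the part from the hold space and shuffle the output, which is easy here.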
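The split-and-rejoin steps can be sketched as a small filter that also substitutes on lines without ! and splits at the first ! rather than the last. This is only a sketch under assumptions: caps_before_comment is a hypothetical name, the keyword program is hard-coded, and the I flag requires GNU sed.

```shell
# caps_before_comment: hypothetical helper; reads stdin, writes stdout (GNU sed assumed).
# First expression:  lines without "!" get a plain case-insensitive substitution.
# Second expression: lines with "!" are split at the first "!"; the tail is parked
#                    in the hold space, the head is substituted, then both are rejoined.
caps_before_comment() {
  sed -e '/!/!s/program/PROGRAM/gI' \
      -e '/!/{h;s/^[^!]*!//;x;s/!.*//;s/program/PROGRAM/gI;G;s/\n/!/;}'
}

# Usage on three of the sample lines from the question.
out=$(printf '%s\n' \
  'Program test  ! It should not change the next program to caps' \
  '! Hi, is anything changing here? like program ?' \
  'This line has Program and not exclamation mark.' | caps_before_comment)
printf '%s\n' "$out"
```

Because the second expression only runs on lines that contain !, no stray ! is appended to lines without a comment.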

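【Comments】:

It seems to me the OP only needs to replace before the first !, in which case sed "h;s/^[^\!]*\!//;x;s/\!.*//;s/$small/$cap/gI;G;s/\n/\!/" should work. By the way, the explanation is really helpful.

Yes, I have seen similar approaches elsewhere that split the string into two parts, so maybe this is the only way to solve my problem. Unfortunately, this particular command does not work for me. It just freezes and does not write to the output file. It does not show any error message either. I guess something small is missing, such as a space.

@MikhailModestov See ideone.com/HT1QiM, it seems to work.

OK, I tried both approaches. Indeed, it works for these two particular lines. But if a line has no "!", it adds a "!" and duplicates the text of that line. It seems I forgot to mention this in the question. So, if a line has no keywords, it must stay unchanged. Sorry for the confusion and for not describing it in more detail.

【Answer 2】: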

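Assumptions:

- The OP needs to convert multiple keywords to uppercase.
- Keywords to be capitalized do not contain whitespace (e.g., program name needs to be handled as two separate strings, program and name).
- The input delimiter is whitespace.
- Keywords with "attached" non-alphanumerics are ignored (e.g., Program, is ignored because the , is picked up as part of the string) unless the OP explicitly includes the non-alphanumeric as part of the keyword definition (e.g., the keywords include Program,).
- All keywords are converted to uppercase (i.e., no need to worry about flags for switching between lowercase, uppercase, camelcase, etc.).

Sample input data: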

$ cat source.txt
Program test  ! It should not change the next program to caps        # change first 'Program'
! Hi, is anything changing here? like program or MarK?               # change nothing
This line does not have any key words ! except here - pRoGraM Mark   # change nothing
This line has Program and not exclamation mARk plus MarKer.          # change 'Program' and 'mARk' but not MarKer
Hi, hi, hI                                                           # change 'Hi,' and 'hi,' but not 'hI'

The keyword list is provided in a separate file (whitespace-delimited):

$ cat keywords.dat
program
mark hi,                  # 2 separate keywords: 'mark' and 'hi,' (comma included)

One awk idea:

awk -v comment="!" '                                # define character after which conversions are to be ignored
FNR==NR { for ( i=1; i<=NF; i++)                    # first file contains keywords; process each field as a separate keyword
              keywords[toupper($i)]                 # convert to uppercase and use as index in associative array keywords[]
          next
        }
        { for ( i=1; i<=NF; i++ ) {                 # second file, process each field separately
              if ( $i == comment )                  # if field is our comment character then stop processing rest of line else ...
                  break
              if ( toupper($i) in keywords )        # if current field is a keyword then convert to uppercase
                  $i=toupper($i)
          }
          print                                     # print the current line
        }
' keywords.dat source.txt

This generates:

PROGRAM test ! It should not change the next program to caps
! Hi, is anything changing here? like program or MarK?
This line does not have any key words ! except here - pRoGraM Mark
This line has PROGRAM and not exclamation MARK plus MarKer.
HI, HI, hI

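Notes:

- While GNU awk can be told to overwrite the input file (e.g., awk -i inplace, comparable to sed -i), this would require a different approach to processing the keywords.dat file (to keep it from being overwritten).
- (Quite a bit of) additional logic could be added to support uppercase, lowercase, camelcase and anything else; to ignore or include non-alphanumerics in comparisons; to use multiple/different "comment" characters; to standardize other parts of the (Fortran) code (e.g., indentation); etc.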
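One possible way to keep keywords.dat out of the list of input files, so that an in-place mode would only ever touch the source file, is to load the keywords in a BEGIN block. The sketch below is an assumption-laden illustration rather than part of the answer: kwfile is a hypothetical variable name, the sample files are shortened stand-ins created in a temporary directory, and the output goes to stdout instead of being written in place.

```shell
# Hypothetical sample files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'program' 'mark hi,' > "$tmp/keywords.dat"
printf '%s\n' 'Program test  ! keep program here' 'no keywords here' > "$tmp/source.txt"

result=$(awk -v comment='!' -v kwfile="$tmp/keywords.dat" '
BEGIN { while ( (getline line < kwfile) > 0 ) {     # read the keyword file before any input is processed
            n = split(line, parts, " ")             # split each line on whitespace
            for ( i=1; i<=n; i++ )
                keywords[toupper(parts[i])]         # store each keyword in uppercase as an array index
        }
        close(kwfile)
      }
      { for ( i=1; i<=NF; i++ ) {                   # only the source file is a regular input now
            if ( $i == comment )                    # stop at the comment character
                break
            if ( toupper($i) in keywords )          # uppercase fields that are keywords
                $i = toupper($i)
        }
        print
      }
' "$tmp/source.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

As in the answer, assigning to a field makes awk rebuild the line with single spaces between fields, so runs of whitespace on changed lines are collapsed.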

【Comments】:

【Answer 3】:

This might work for you (GNU sed):

small='Program ' caps='PROGRAM '
sed -E ':a;s/^([^!]*)('"$small"')/\1\n/;ta;s/\n/'"$caps"'/g' file

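Replace each occurrence of the variable $small that appears before the symbol ! with a newline, then replace all newlines with the variable $caps.

Note that the newline is chosen because it normally cannot exist in any line presented by sed, since it is the delimiter sed uses to present lines in the pattern space. Secondly, words matching $small are iteratively replaced by newlines, and then all newlines are globally replaced by $caps. This allows the replacement to be a superset of the match. Without this order of operations, the iterative process could become an endless loop.

If $small is meant to match case-insensitively, add the i flag to the first substitution.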
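As a rough illustration of the placeholder loop with the case-insensitive flag added, the sketch below wraps the command in a function and runs it on two sample lines. The function name caps_via_placeholder is hypothetical, and GNU sed is assumed for -E, the I flag, and \n in the replacement.

```shell
small='program ' caps='PROGRAM '

# caps_via_placeholder: hypothetical wrapper around the command above, with the I flag
# added to the first substitution. Reads stdin, writes stdout.
#   :a ... ta    loop while a keyword is still found before the first "!"
#   \1\n         keep the text before the match, swap the match for a newline placeholder
#   s/\n/.../g   finally turn every placeholder into the uppercase form
caps_via_placeholder() {
  sed -E ':a;s/^([^!]*)('"$small"')/\1\n/I;ta;s/\n/'"$caps"'/g'
}

out=$(printf '%s\n' \
  'Program test  ! It should not change the next program to caps' \
  'This line has Program and program twice.' | caps_via_placeholder)
printf '%s\n' "$out"
```

With the I flag, the uppercase replacement would itself match $small, which is exactly the case where substituting directly inside the loop would never terminate; the newline placeholder avoids that.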

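【Comments】:

【Answer 4】:

I tried the suggested options, but none of them worked as expected on the whole file. I ended up with multiple sed commands; I am sure this is not the best solution, but it works for me and does what I need. My main problem was avoiding replacement after a "!" that occurs somewhere in the middle of a line. So I swapped that problem for one I could handle.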

sed -i -e "/^\!/! s/!/!c7u!!c7u!/" "$filein"             # 1. If a line does NOT start with !, find the next "!" and replace it with "!c7u!!c7u!"
sed -i "s/!c7u!/\n/" "$filein"                           # 2. Move that comment to a new line
for ((i=0; i<$nwords; i++ )); do                         # Loop through all keywords
  word=${words[$i]}                                      # Take a keyword from the list
  small=${word,,}                                        # Write it in lowercase
  cap=${word^^}                                          # Write it in uppercase
  sed -i -e "/^\!/! s/$small\b/$cap/gI" "$filein"        # 3. Actual replacement in lines not starting with "!"
done

sed -i -e :a -e '$!N;s/\n!c7u//;ta' -e 'P;D' "$filein"   # 4. Undo steps 1-2, moving inline comments back

【Comments】:
