sed进阶

Posted 2022-01-05 tianmu
tags:
篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了sed进阶相关的知识，希望对你有一定的参考价值。
#多行命令
    需要对跨多行的数据执行特定操作。如一条短语分布在两行中。
    三个可用来处理多行文本的特殊命令：
        N ：将数据流中的下一行加进来创建一个多行组（multiline group）来处理。
        D ：删除多行组中的一行。
        P ：打印多行组中的一行。
        next 命令
            1. 单行的 next 命令： n 命令会让sed编辑器移动到文本的下一行。
                $ cat data1.txt
                This is the header line.

                This is a data line.

                This is the last line.
                $                                
                $ sed ‘/header/n ; d‘ data1.txt    -- 找到header会删除模式匹配成功的下一行
                This is the header line.
                This is a data line.
                
                This is the last line.
                $
            2. 合并文本行：多行版本的 next 命令（用大写N）会将下一文本行添加到模式空间中已有的文本后，作为一行处理。
                $ cat data4.txt
                On Tuesday, the Linux System
                Administrator‘s group meeting will be held.
                All System Administrators should attend.
                $
                $ sed ‘
                > s/System Administrator/Desktop User/    -- 将单行命令放前面可以解决最后一行找不到下一行合并的情况
                > N
                > s/System\nAdministrator/Desktop\nUser/
                > ‘ data4.txt
                On Tuesday, the Linux Desktop
                User‘s group meeting will be held.
                All Desktop Users should attend.
                $
        多行删除命令
            sed编辑器提供了多行删除命令 D ，它只删除模式空间中的第一行。该命令会删除到换行符（含换行符）为止的所有字符
                $ sed ‘N ; /System\nAdministrator/D‘ data4.txt
                Administrator‘s group meeting will be held.
                All System Administrators should attend.
                $ cat data5.txt

                This is the header line.
                This is a data line.

                This is the last line.
                $
                $ sed ‘/^$/N ; /header/D‘ data5.txt    -- 删除数据流中出现在第一行前的空白行
                This is the header line.
                This is a data line.

                This is the last line.
                $
        多行打印命令
            多行打印命令（ P ）沿用了同样的方法。它只打印多行模式空间中的第一行
                $ cat data3.txt
                On Tuesday, the Linux System
                Administrator‘s group meeting will be held.
                All System Administrators should attend.
                Thank you for your attendance.
                $
                $ sed -n ‘N ; /System\nAdministrator/P‘ data3.txt
                On Tuesday, the Linux System
#保持空间
    模式空间（pattern space）是一块活跃的缓冲区，sed编辑器有另一块称作保持空间（hold space）的缓冲区域。
    sed编辑器的保持空间命令
        命 令  描 述
        h      将模式空间复制到保持空间
        H      将模式空间附加到保持空间
        g      将保持空间复制到模式空间
        G      将保持空间附加到模式空间
        x      交换模式空间和保持空间的内容
        $ cat data2.txt
        This is the header line.
        This is the first data line.
        This is the second data line.
        This is the last line.
        $
        $ sed -n ‘/first/ h ; p ; n ; p ; g ; p ‘ data2.txt
        This is the first data line.
        This is the second data line.
        This is the first data line.
        $
        $ sed -n ‘/first/ h ; n ; p ; g ; p ‘ data2.txt    -- 合理使用，以相反方向输出这两行
        This is the second data line.
        This is the first data line.
        $
#排除命令
    感叹号命令（ ! ）用来排除（ negate ）命令
        $ sed -n ‘/header/!p‘ data2.txt    -- 除了包含单词header那一行外，文件中其他所有的行都被打印出来了
        This is the first data line.
        This is the second data line.
        This is the last line.
        $
        $ sed ‘$!N;    -- 最后一行，不执行N命令
        > s/System\nAdministrator/Desktop\nUser/
        > s/System Administrator/Desktop User/
        > ‘ data4.txt
        On Tuesday, the Linux Desktop
        User‘s group meeting will be held.
        All Desktop Users should attend.
        $
        $ cat data2.txt
        This is the header line.
        This is the first data line.
        This is the second data line.
        This is the last line.
        $
        $ sed -n ‘1!G ; h ; $p ‘ data2.txt    -- 倒序输出（未理解），可以使用tac达到相同效果cat data2.txt
        This is the last line.
        This is the second data line.
        This is the first data line.
        This is the header line.
        $
#改变流
    sed编辑器会从脚本的顶部开始，一直执行到脚本的结尾。sed编辑器提供了一个方法来改变命令脚本的执行流程
        分支（ branch ）命令 b的格式如下：
        [ address ]b [ label ]
        address 参数决定了哪些行的数据会触发分支命令。 label 参数定义了要跳转到的位置。如果没有加 label 参数，跳转命令会跳转到脚本的结尾。
            $ cat data2.txt
            This is the header line.
            This is the first data line.
            This is the second data line.
            This is the last line.
            $
            $ sed ‘2,3b ; s/This is/Is this/ ; s/line./test?/‘ data2.txt    -- 分支命令在数据流中的第2行和第3行处跳过了两个替换命令
            Is this the header test?
            This is the first data line.
            This is the second data line.
            Is this the last test?
            $    
            要是不想直接跳到脚本的结尾，可以为分支命令定义一个要跳转到的标签。标签以冒号开始，最多可以是7个字符长度。
            $ sed ‘/first/b jump1 ; s/This is the/No jump on/    -- 自定义标签。标签允许你跳过地址匹配处的命令，但仍然执行脚本中的其他命令
            > :jump1    
            > s/This is the/Jump here on/‘ data2.txt
            No jump on header line
            Jump here on first data line
            No jump on second data line
            No jump on last line
            $    
            $ echo "This, is, a, test, to, remove, commas." | sed -n ‘
            > :start    -- 也可以跳转到脚本中靠前面的标签上，这样就达到了循环的效果
            > s/,//1p
            > /,/b start    -- 设置了匹配模式，没有（，）循环就会终止
            > ‘
            This is, a, test, to, remove, commas.
            This is a, test, to, remove, commas.
            This is a test, to, remove, commas.
            This is a test to, remove, commas.
            This is a test to remove, commas.
            This is a test to remove commas.
            $    
        测试
            类似于分支命令，测试（ test ）命令（ t ）也可以用来改变sed编辑器脚本的执行流程
            格式：[ address ]t [ label ]    
                $ sed ‘
                > s/first/matched/
                > t    -- first改行不再执行
                > s/This is the/No match on/
                > ‘ data2.txt
                No match on header line
                This is the matched data line
                No match on second data line
                No match on last line
                $    
                $ echo "This, is, a, test, to, remove, commas. " | sed -n ‘
                > :start
                > s/,//1p
                > t start    -- 当无需替换时，测试命令不会跳转而是继续执行剩下的脚本
                > ‘
                This is, a, test, to, remove, commas.
                This is a, test, to, remove, commas.
                This is a test, to, remove, commas.
                This is a test to, remove, commas.
                This is a test to remove, commas.
                This is a test to remove commas.
                $
#模式替代
    & 符号
        不管模式匹配的是什么样的文本，你都可以在替代模式中使用 & 符号来使用这段文本
        $ echo "The cat sleeps in his hat." | sed ‘s/.at/"&"/g‘
        The "cat" sleeps in his "hat".
    替代单独的单词
        sed编辑器用圆括号来定义替换模式中的子模式。替代字符由反斜线和数字组成，第一个子模式分配字符 \1 ，给第二个子模式分配字符 \2 ，依此类推。
            $ echo "The System Administrator manual" | sed ‘
            > s/\(System\) Administrator/\1 User/‘    -- \1 来提取第一个匹配的子模式
            The System User manual            
            $ echo "1234567" | sed ‘    -- 大数字中插入逗号
            > :start
            > s/\(.*[0-9]\)\([0-9]\3\\)/\1,\2/
            > t start
            > ‘
            1,234,567
#脚本中使用sed
    使用包装脚本
        $ cat sw_cs.sh
        sed -n ‘ 1!G ; h ; $p ‘ $1
        $
        $ sh sw_cs.sh data2.txt
        This is the last line.
        This is the second data line.
        This is the first data line.
        This is the header line.
    重定向 sed 的输出
        $ cat fact.sh
        factorial=1
        counter=1
        number=$1
        #
        while [ $counter -le $number ]
        do
        factorial=$[ $factorial * $counter ]
        counter=$[ $counter + 1 ]
        done
        #
        result=$(echo $factorial | sed ‘
        :start
        s/\(.*[0-9]\)\([0-9]\3\\)/\1,\2/
        t start
        ‘)
        #
        echo "The result is $result"
        #
        $
        $ sh fact.sh 20
        The result is 2,432,902,008,176,640,000
        $
#创建 sed 实用工具
    1.加倍行间距
        $ sed ‘G‘ data2.txt
        This is the header line.
        
        This is the first data line.
        
        This is the second data line.
        
        This is the last line.
        
        $
        $ sed ‘$!G‘ data2.txt
        This is the header line.
        
        This is the first data line.
        
        This is the second data line.
        
        This is the last line.    -- 最后一行跳过G
        $
    2.对可能含有空白行的文件加倍行间距
        $ cat data6.txt
        This is line one.
        This is line two.

        This is line three.
        This is line four.
        $    
        $ sed ‘/^$/d ; $!G‘ data6.txt
        This is line one.

        This is line two.

        This is line three.

        This is line four.
        $    
    3.给文件中的行编号
        $ sed ‘=‘ data2.txt | sed ‘N; s/\n/ /‘
        1 This is the header line.
        2 This is the first data line.
        3 This is the second data line.
        4 This is the last line.
    4.打印末尾行(打印末尾10行)
        $ cat data7.txt
        This is line 1.
        This is line 2.
        This is line 3.
        This is line 4.
        This is line 5.
        This is line 6.
        This is line 7.
        This is line 8.
        This is line 9.
        This is line 10.
        This is line 11.
        This is line 12.
        This is line 13.
        This is line 14.
        This is line 15.
        $
        $ sed ‘
        > :start
        > $q ; N ; 11,$D
        > b start
        > ‘ data7.txt
        This is line 6.
        This is line 7.
        This is line 8.
        This is line 9.
        This is line 10.
        This is line 11.
        This is line 12.
        This is line 13.
        This is line 14.
        This is line 15.
        $
    5.删除行
        删除连续的空白行：$ sed ‘/./,/^$/!d‘ data8.txt
        删除开头的空白行：$ sed ‘/./,$!d‘ data9.txt
        删除结尾的空白行:
            $ sed ‘
            :start
            / ^\n*$/$d; N; b start 
            ‘
    6.删除 html 标签
        $ sed ‘s/<[^>]*>//g ; /^$/d‘ data11.txt
以上是关于sed进阶的主要内容，如果未能解决你的问题，请参考以下文章