Advanced Linux Text Processing: sed
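sed (Stream Editor) is a non-interactive, stream-oriented text editor. It can process multiple files and multiple lines in a single run. By default it leaves the original file untouched and writes the whole processed file to the screen, or, when asked, only the lines that match a pattern. It can also modify the original file in place, in which case nothing is printed to the screen.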
Basic Concepts
I. The syntax of the sed command is as follows:
sed [options] script filename
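sed command options:
-n : suppress the automatic printing of each line; only lines printed explicitly (for example with the p command) are output
-e : add a script; repeat it for multi-point editing, for example -e script1 -e script2 -e script3
-f : read the sed commands from a file; -f filename runs the commands stored in filename
-r : use extended regular expressions (GNU sed; also spelled -E)
-i : edit the file in place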
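Like most Linux commands, sed reads its input from stdin and writes its output to stdout. When a filename is given, input is read from that file instead. The output can be redirected to a file, but that file must never be the same as the input file.
options are sed's command-line options; there are not many of them and they are not the focus here.
script is one or more instructions to apply to the input. sed reads each line of the input file into a buffer in turn and applies the instructions in script to it, so the resulting changes do not affect the original file (note: with the -i option, the original file is modified). If there are many instructions, they can be put in a file to keep the command readable and passed with the -f option: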
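As a small illustration of this input/output behaviour, the sketch below feeds sed once from a pipe and once from a file, and redirects the result to a different file. The names in.txt and out.txt and the temporary directory are made up for this example.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
printf 'Boston MA\nTulsa OK\n' > "$workdir/in.txt"

# No filename given: sed reads from stdin and writes to stdout.
from_stdin=$(printf 'Boston MA\n' | sed 's/MA/Massachusetts/')

# Filename given: sed reads the file; output goes to a DIFFERENT file.
sed 's/MA/Massachusetts/' "$workdir/in.txt" > "$workdir/out.txt"

echo "$from_stdin"
cat "$workdir/out.txt"
```

Redirecting to the input file itself (sed ... in.txt > in.txt) would let the shell truncate in.txt before sed reads it, which is why the output file has to be a different one.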
sed -f scriptfile filename
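A minimal sketch of the -f form, assuming a hypothetical script file named states.sed and a two-line sample input created just for the demonstration:

```shell
workdir=$(mktemp -d)
printf 'Plymouth MA\nBeaver Falls PA\n' > "$workdir/list"

# The script file holds one editing command per line.
printf '%s\n' 's/ MA/, Massachusetts/' 's/ PA/, Pennsylvania/' > "$workdir/states.sed"

# Apply every command in the script file to each input line.
script_output=$(sed -f "$workdir/states.sed" "$workdir/list")
echo "$script_output"
```

Inside a script file no shell quoting is needed, because the shell never sees the commands.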
Note:
It is best to wrap instructions given on the command line in single quotes, which keeps the shell from interpreting special characters.
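The effect of the quoting style can be seen in a short sketch. The variable name and the sample text are invented for the illustration; the only difference between the two commands is the kind of quote.

```shell
name=World

# Single quotes: sed receives s/$name/there/ unchanged and matches the literal text $name.
single=$(echo 'Hello $name' | sed 's/$name/there/')

# Double quotes: the shell expands $name first, so sed receives s/World/there/,
# which does not occur in the input.
double=$(echo 'Hello $name' | sed "s/$name/there/")

echo "$single"
echo "$double"
```

Double quotes are still useful when a shell variable is meant to be expanded into the script on purpose.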
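II. How sed works
1. Read a new line into the buffer;
2. Take the first instruction from the script and check whether its pattern matches;
3. If it does not match, skip that instruction's editing commands and go back to step 2 to take the next instruction;
4. If it matches, run the editing commands on the buffered line, then go back to step 2 to take the next instruction;
5. Once all instructions have been applied, print the contents of the buffer and go back to step 1 to read the next line;
6. When all lines have been processed, stop.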
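The cycle above can be imitated with a plain shell loop. This is only a rough sketch for a single instruction, /MA/s/MA/Massachusetts/, and not how sed is actually implemented; the function name emulate_cycle and the sample input are made up.

```shell
# Rough emulation of: sed '/MA/s/MA/Massachusetts/'
emulate_cycle() {
    while IFS= read -r pattern_space; do      # step 1: read a line into the buffer
        case $pattern_space in
            *MA*)                             # steps 2-4: the pattern matches, so edit the buffer
                # Replace the first MA: text before it + replacement + text after it.
                pattern_space="${pattern_space%%MA*}Massachusetts${pattern_space#*MA}"
                ;;
        esac                                  # step 3: no match, the buffer is left alone
        printf '%s\n' "$pattern_space"        # step 5: print the buffer, then read the next line
    done                                      # step 6: stop at end of input
}

input='Plymouth MA
Tulsa OK'
emulated=$(printf '%s\n' "$input" | emulate_cycle)
real=$(printf '%s\n' "$input" | sed '/MA/s/MA/Massachusetts/')
echo "$emulated"
```

Both variables should hold the same two lines, which is the point of the comparison.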
[Figure: sed workflow diagram]
III. Simple examples
Example 1: replace MA with Massachusetts
    # cat list
    John Daggett, 341 King Road, Plymouth MA
    Alice Ford, 22 East Broadway, Richmond VA
    Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
    Terry Kalkas, 402 Lans Road, Beaver Falls PA
    Eric Adams, 20 Post Road, Sudbury MA
    Hubert Sims, 328A Brook Road, Roanoke VA
    Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
    Sal Carpenter, 73 6th Street, Boston MA
    # sed -e 's@MA@Massachusetts@' list
    John Daggett, 341 King Road, Plymouth Massachusetts
    Alice Ford, 22 East Broadway, Richmond VA
    Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
    Terry Kalkas, 402 Lans Road, Beaver Falls PA
    Eric Adams, 20 Post Road, Sudbury Massachusetts
    Hubert Sims, 328A Brook Road, Roanoke VA
    Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
    Sal Carpenter, 73 6th Street, Boston Massachusetts
Example 2: the -e option used above is optional; it only comes into play when several instructions are given on the command line at once
    # sed -e 's/ MA/, Massachusetts/' -e 's/ PA/, Pennsylvania/' list
    John Daggett, 341 King Road, Plymouth, Massachusetts
    Alice Ford, 22 East Broadway, Richmond VA
    Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
    Terry Kalkas, 402 Lans Road, Beaver Falls, Pennsylvania
    Eric Adams, 20 Post Road, Sudbury, Massachusetts
    Hubert Sims, 328A Brook Road, Roanoke VA
    Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
    Sal Carpenter, 73 6th Street, Boston, Massachusetts
Even with several instructions, -e is not required, and I usually leave it out. The example above can be rewritten as:
    # sed 's/ MA/, Massachusetts/;s/ PA/, Pennsylvania/' list
Note: instructions can be separated by semicolons, just as shell commands can be.
Example 3: print only the modified lines
    # sed -n 's@MA@Massachusetts@p' list
    John Daggett, 341 King Road, Plymouth Massachusetts
    Eric Adams, 20 Post Road, Sudbury Massachusetts
    Sal Carpenter, 73 6th Street, Boston Massachusetts
Note: the -n option suppresses sed's default output, and the p flag on the substitution prints only the lines where a replacement was made.
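How -n interacts with the p flag of the s command is easiest to see by trying the three combinations. The two-line sample input here is made up for the sketch.

```shell
sample='Boston MA
Tulsa OK'

# p without -n: the changed line appears twice (once from p, once from the default output).
with_p_only=$(printf '%s\n' "$sample" | sed 's/MA/Massachusetts/p')

# -n without p: the default output is suppressed, so nothing is printed at all.
with_n_only=$(printf '%s\n' "$sample" | sed -n 's/MA/Massachusetts/')

# -n with p: only the lines where a substitution happened are printed.
with_both=$(printf '%s\n' "$sample" | sed -n 's/MA/Massachusetts/p')

echo "$with_both"
```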
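Pattern Space and Address Matching
I. Pattern space transformations
sed keeps only one line at a time in the pattern space. The advantage is that sed can process large files without trouble, unlike editors that load a large chunk of the file into memory at once and may run out of memory. Here is a simple example of how the pattern space is transformed:
Suppose we want to normalize both Unix System and UNIX System in a piece of text to UNIX Operating System. Two substitution commands do the job: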
s/Unix /UNIX /
s/UNIX System/UNIX Operating System/
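The process is as follows:
1. The line The Unix System is read into the pattern space;
2. The first substitution replaces Unix with UNIX;
3. The pattern space now contains The UNIX System;
4. The second substitution replaces UNIX System with UNIX Operating System;
5. The pattern space now contains The UNIX Operating System;
6. All editing commands have run, and the line in the pattern space is printed by default.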
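Because each command works on whatever the previous one left in the pattern space, the order of the two substitutions matters. The following sketch runs them in both orders on a two-line sample input chosen for the illustration.

```shell
text='The Unix System
The UNIX System'

# Original order: Unix is normalized first, so both lines reach the second command as UNIX System.
in_order=$(printf '%s\n' "$text" | sed 's/Unix /UNIX /;s/UNIX System/UNIX Operating System/')

# Reversed order: the first line still says Unix when the UNIX System command runs,
# so it is only normalized to UNIX and never gains the word Operating.
reversed=$(printf '%s\n' "$text" | sed 's/UNIX System/UNIX Operating System/;s/Unix /UNIX /')

echo "$in_order"
echo "$reversed"
```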
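II. Address matching
By default sed applies the given editing commands to every input line. sed reads each line in turn, each line becomes the current line and is processed, so s/CA/California/g replaces CA with California on all input lines. This differs from vi/vim, whose substitute command works on the current line only unless you write %s to cover the whole file.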
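Two different things are at play here: every line is processed regardless of flags, while the g flag decides whether every match within a line is replaced. A small sketch with made-up input shows the difference.

```shell
text='CA CA
CA'

# Without g: every line is processed, but only the first CA on each line is replaced.
first_only=$(printf '%s\n' "$text" | sed 's/CA/California/')

# With g: every CA on every line is replaced.
every_match=$(printf '%s\n' "$text" | sed 's/CA/California/g')

echo "$first_only"
echo "$every_match"
```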
Example 1: in the lines of list that contain Sal, replace MA with Massachusetts
    # sed -e /Sal/'s@MA@Massachusetts@' list
    John Daggett, 341 King Road, Plymouth MA
    Alice Ford, 22 East Broadway, Richmond VA
    Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
    Terry Kalkas, 402 Lans Road, Beaver Falls PA
    Eric Adams, 20 Post Road, Sudbury MA
    Hubert Sims, 328A Brook Road, Roanoke VA
    Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
    Sal Carpenter, 73 6th Street, Boston Massachusetts
Note: /Sal/ is a regular expression that matches lines containing Sal, so lines without Sal, such as "Eric Adams, 20 Post Road, Sudbury MA", are not changed.
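A sed command can have zero, one, or two addresses (an address pair). An address can be a regular expression (such as /Sal/), a line number, or a special line symbol (such as $ for the last line):
● With no address, the editing command is applied to every line;
● With one address, the editing command is applied only to lines matching that address;
● With an address pair (addr1,addr2), the editing command is applied to every line in that range, including the first and last;
● With an exclamation mark (!) after the address, the editing command is applied to every line that does not match the address.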
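The four cases in the list can be tried side by side with the print command (p) and -n, so that each command shows exactly which lines its address selects. The five-line input is invented for this sketch.

```shell
lines='one
two
three
four
five'

no_addr=$(printf '%s\n' "$lines" | sed -n 'p')                  # no address: every line
one_addr=$(printf '%s\n' "$lines" | sed -n '2p')                # line number: line 2 only
last_line=$(printf '%s\n' "$lines" | sed -n '$p')               # $: the last line
pair=$(printf '%s\n' "$lines" | sed -n '2,4p')                  # address pair: lines 2 through 4
regex_pair=$(printf '%s\n' "$lines" | sed -n '/two/,/four/p')   # pair of regular expressions
negated=$(printf '%s\n' "$lines" | sed -n '2,4!p')              # !: everything outside lines 2-4

echo "$negated"
```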
Example 2: to make this concrete, take the delete command (d); with no address it deletes every line
    # sed 'd' list
    #
Example 3: delete specified lines
    # cat -n list
         1  John Daggett, 341 King Road, Plymouth MA
         2  Alice Ford, 22 East Broadway, Richmond VA
         3  Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
         4  Terry Kalkas, 402 Lans Road, Beaver Falls PA
         5  Eric Adams, 20 Post Road, Sudbury MA
         6  Hubert Sims, 328A Brook Road, Roanoke VA
         7  Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
         8  Sal Carpenter, 73 6th Street, Boston MA
    # sed '1d' list        # delete the first line of list
    Alice Ford, 22 East Broadway, Richmond VA
    Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
    Terry Kalkas, 402 Lans Road, Beaver Falls PA
    Eric Adams, 20 Post Road, Sudbury MA
    Hubert Sims, 328A Brook Road, Roanoke VA
    Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
    Sal Carpenter, 73 6th Street, Boston MA
    # sed '$d' list        # delete the last line of list
    John Daggett, 341 King Road, Plymouth MA
    Alice Ford, 22 East Broadway, Richmond VA
    Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
    Terry Kalkas, 402 Lans Road, Beaver Falls PA
    Eric Adams, 20 Post Road, Sudbury MA
    Hubert Sims, 328A Brook Road, Roanoke VA
    Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
    # sed /MA/'d' list     # delete the lines containing MA
    Alice Ford, 22 East Broadway, Richmond VA
    Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
    Terry Kalkas, 402 Lans Road, Beaver Falls PA
    Hubert Sims, 328A Brook Road, Roanoke VA
    Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
    # sed '/MA/d' list     # same as above: delete the lines containing MA
    Alice Ford, 22 East Broadway, Richmond VA
    Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
    Terry Kalkas, 402 Lans Road, Beaver Falls PA
    Hubert Sims, 328A Brook Road, Roanoke VA
    Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Example 4: an address pair deletes every line in the range, for example from line 2 to the last line
    # cat list
    John Daggett, 341 King Road, Plymouth MA
    Alice Ford, 22 East Broadway, Richmond VA
    Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
    Terry Kalkas, 402 Lans Road, Beaver Falls PA
    Eric Adams, 20 Post Road, Sudbury MA
    Hubert Sims, 328A Brook Road, Roanoke VA
    Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
    Sal Carpenter, 73 6th Street, Boston MA
    # sed '2,$d' list
    John Daggett, 341 King Road, Plymouth MA
Example 5: using regular expressions, delete everything from the line containing Alice through the line containing Hubert
    # cat list
    John Daggett, 341 King Road, Plymouth MA
    Alice Ford, 22 East Broadway, Richmond VA
    Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
    Terry Kalkas, 402 Lans Road, Beaver Falls PA
    Eric Adams, 20 Post Road, Sudbury MA
    Hubert Sims, 328A Brook Road, Roanoke VA
    Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
    Sal Carpenter, 73 6th Street, Boston MA
    # sed '/Alice/,/Hubert/d' list
    John Daggett, 341 King Road, Plymouth MA
    Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
    Sal Carpenter, 73 6th Street, Boston MA
Example 6: line numbers and regular expressions can be mixed in an address pair
    # cat -n list
         1  John Daggett, 341 King Road, Plymouth MA
         2  Alice Ford, 22 East Broadway, Richmond VA
         3  Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
         4  Terry Kalkas, 402 Lans Road, Beaver Falls PA
         5  Eric Adams, 20 Post Road, Sudbury MA
         6  Hubert Sims, 328A Brook Road, Roanoke VA
         7  Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
         8  Sal Carpenter, 73 6th Street, Boston MA
    # sed '2,/Amy/d' list  # delete from line 2 through the line containing Amy
    John Daggett, 341 King Road, Plymouth MA
    Sal Carpenter, 73 6th Street, Boston MA
Example 7: an exclamation mark (!) after the address applies the command to the lines that do not match the address
    # sed '1,3!d' list     # delete every line outside lines 1 to 3
    John Daggett, 341 King Road, Plymouth MA
    Alice Ford, 22 East Broadway, Richmond VA
    Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Example 8: run several editing commands; sed groups commands with {}, much like a statement block in a programming language
    # cat -n list
         1  John Daggett, 341 King Road, Plymouth MA
         2  Alice Ford, 22 East Broadway, Richmond VA
         3  Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
         4  Terry Kalkas, 402 Lans Road, Beaver Falls PA
         5  Eric Adams, 20 Post Road, Sudbury MA
         6  Hubert Sims, 328A Brook Road, Roanoke VA
         7  Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
         8  Sal Carpenter, 73 6th Street, Boston MA
    # sed -n '1,4{s/ MA/, Massachusetts/;s/ PA/, Pennsylvania/;p}' list
    John Daggett, 341 King Road, Plymouth, Massachusetts
    Alice Ford, 22 East Broadway, Richmond VA
    Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
    Terry Kalkas, 402 Lans Road, Beaver Falls, Pennsylvania
Example 9: print the odd-numbered lines of list
    # sed -n '1~2p' list   # 1~2 means start at line 1 and step by 2 lines
    John Daggett, 341 King Road, Plymouth MA
    Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
    Eric Adams, 20 Post Road, Sudbury MA
    Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Example 10: print the even-numbered lines of list
    # cat -n list
         1  John Daggett, 341 King Road, Plymouth MA
         2  Alice Ford, 22 East Broadway, Richmond VA
         3  Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
         4  Terry Kalkas, 402 Lans Road, Beaver Falls PA
         5  Eric Adams, 20 Post Road, Sudbury MA
         6  Hubert Sims, 328A Brook Road, Roanoke VA
         7  Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
         8  Sal Carpenter, 73 6th Street, Boston MA
    # sed -n '2~2p' list
    Alice Ford, 22 East Broadway, Richmond VA
    Terry Kalkas, 402 Lans Road, Beaver Falls PA
    Hubert Sims, 328A Brook Road, Roanoke VA
    Sal Carpenter, 73 6th Street, Boston MA
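Example 11: print line 6 of list and up to three lines after it
    # sed -n '6,+3p' list
    Hubert Sims, 328A Brook Road, Roanoke VA
    Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
    Sal Carpenter, 73 6th Street, Boston MA
Note: addr,+N matches the line at addr and the N lines after it. Here 6,+3 covers lines 6 through 9, and since list has only 8 lines, three lines are printed.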
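The first~step and addr,+N address forms used in the last three examples are GNU sed extensions, so they may not work with other sed implementations. The sketch below, on a made-up eight-line input, places them next to equivalents that use only standard commands and addresses.

```shell
sample='l1
l2
l3
l4
l5
l6
l7
l8'

# GNU extension forms, as used in the examples above.
odd_gnu=$(printf '%s\n' "$sample" | sed -n '1~2p')
even_gnu=$(printf '%s\n' "$sample" | sed -n '2~2p')
tail_gnu=$(printf '%s\n' "$sample" | sed -n '6,+3p')

# Equivalents built from standard commands and addresses.
odd_portable=$(printf '%s\n' "$sample" | sed -n 'p;n')     # print a line, then skip the next one
even_portable=$(printf '%s\n' "$sample" | sed -n 'n;p')    # skip a line, then print the next one
tail_portable=$(printf '%s\n' "$sample" | sed -n '6,9p')   # spell out the end line instead of +3

echo "$odd_portable"
echo "$even_portable"
echo "$tail_portable"
```

With GNU sed each pair of variables should hold the same lines; the n command simply reads the next input line into the pattern space, which is what makes the skipping work.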
This article comes from the "追求不完美" blog. Please keep this source link: http://yolynn.blog.51cto.com/11575833/1889557