The Linux Text-Processing "Three Musketeers": grep

Posted 2020-11-09

Preface: this article was compiled by the editors of 小常识网 (cha138.com). It covers grep, one of the Linux text-processing "three musketeers", and we hope you find it useful.

grep

grep: Global search REgular expression and Print out the line

Purpose: a text search tool. It checks the target text line by line against a user-specified "pattern" and prints the lines that match.

Pattern: a filter condition written with regular-expression metacharacters and literal text characters.

1. Command syntax

grep [OPTIONS] PATTERN [FILE...]

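grep root /etc/passwd  # literal string

grep "$USER" /etc/passwd  # double quotes: the shell expands $USER before grep runs

grep '$USER' /etc/passwd  # single quotes: no expansion, so grep searches for the literal text $USER

grep `whoami` /etc/passwd  # command substitution: the output of whoami becomes the pattern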

# grep root /etc/passwd  # filter with a literal string

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

# grep "$USER" /etc/passwd  # filter with a variable (double-quoted)

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

# grep $USER /etc/passwd  # filter with a variable (unquoted; works here because the value contains no spaces or wildcard characters)

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

# grep `whoami` /etc/passwd  # filter with command substitution (backticks)

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin
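The quoting forms listed above differ only in what grep finally receives. Below is a minimal sketch of that difference. It assumes GNU grep and uses a throwaway sample file instead of /etc/passwd. The file contents and the forced USER value are illustrative.

```shell
#!/bin/sh
# Throwaway sample file standing in for /etc/passwd (illustrative contents).
sample=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' 'literal $USER text' > "$sample"

USER=root   # hypothetical: fixed so the result does not depend on the caller

# Double quotes: the shell expands $USER, so grep searches for "root".
dq=$(grep -n "$USER" "$sample" | cut -d: -f1)
# Single quotes: grep receives "$USER"; a "$" that is not at the end of a
# basic regex is an ordinary character, so this matches the literal text.
sq=$(grep -n '$USER' "$sample" | cut -d: -f1)

echo "double quotes matched line $dq, single quotes matched line $sq"
rm -f "$sample"
```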

2. grep options:

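Option	Description
--color=auto	Highlight the matched text
-v	Show lines that do NOT match the pattern (invert the match)
-i	Ignore case
-n	Show the line numbers of matching lines
-c	Count matching lines
-o	Print only the matched part of each line
-q	Quiet mode: print nothing; only the exit status tells whether a match was found
-A #	After: also show # lines after each match
-B #	Before: also show # lines before each match
-C #	Context: show # lines before and after each match
-e	Specify a pattern; repeat it to combine several patterns with logical OR, e.g. grep -e 'cat' -e 'dog' file
-w	Match whole words only
-E	Use extended regular expressions (ERE); same as egrep
-F	Treat the pattern as a fixed string instead of a regular expression; same as fgrep
-f FILE, --file=FILE	Obtain patterns from FILE, one per line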

Examples:

# grep -v "root" /etc/passwd  # -v: drop the lines that contain root

bin:x:1:1:bin:/bin:/sbin/nologin

daemon:x:2:2:daemon:/sbin:/sbin/nologin

# grep -i "root" /etc/passwd  # -i: ignore case

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

# grep -ni "root" /etc/passwd  # show line numbers and ignore case

1:root:x:0:0:root:/root:/bin/bash

10:operator:x:11:0:operator:/root:/sbin/nologin

# cat -n /etc/passwd|grep root  # similar to grep -n "root" /etc/passwd: cat -n numbers every line, then grep filters (case-sensitive here)

1 root:x:0:0:root:/root:/bin/bash

10 operator:x:11:0:operator:/root:/sbin/nologin

# ls|grep ks

anaconda-ks.cfg

initial-setup-ks.cfg

# ls *ks*  # for comparison: a shell glob on file names

anaconda-ks.cfg initial-setup-ks.cfg

# grep -ci "root" /etc/passwd  # count the matching lines

# grep -io "ROOT" /etc/passwd  # ignore case; print only the matched string

root

# grep -cio "ROOT" /etc/passwd  # ignore case; -c still counts matching lines, even together with -o

# grep -nA3 "root" /etc/passwd  # also show the 3 lines after each match

1:root:x:0:0:root:/root:/bin/bash

2-bin:x:1:1:bin:/bin:/sbin/nologin

3-daemon:x:2:2:daemon:/sbin:/sbin/nologin

4-adm:x:3:4:adm:/var/adm:/sbin/nologin

10:operator:x:11:0:operator:/root:/sbin/nologin

11-games:x:12:100:games:/usr/games:/sbin/nologin

12-ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin

13-nobody:x:99:99:Nobody:/:/sbin/nologin

# grep -nB3 "root" /etc/passwd  # also show the 3 lines before each match

1:root:x:0:0:root:/root:/bin/bash

7-shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

8-halt:x:7:0:halt:/sbin:/sbin/halt

9-mail:x:8:12:mail:/var/spool/mail:/sbin/nologin

10:operator:x:11:0:operator:/root:/sbin/nologin

# grep -nC3 "root" /etc/passwd  # also show the 3 lines before and after each match

1:root:x:0:0:root:/root:/bin/bash

2-bin:x:1:1:bin:/bin:/sbin/nologin

3-daemon:x:2:2:daemon:/sbin:/sbin/nologin

4-adm:x:3:4:adm:/var/adm:/sbin/nologin

7-shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

8-halt:x:7:0:halt:/sbin:/sbin/halt

9-mail:x:8:12:mail:/var/spool/mail:/sbin/nologin

10:operator:x:11:0:operator:/root:/sbin/nologin

11-games:x:12:100:games:/usr/games:/sbin/nologin

12-ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin

13-nobody:x:99:99:Nobody:/:/sbin/nologin
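The output above hides two details that are easy to miss. First, -c counts matching lines rather than individual matches, and it does so even when combined with -o. Second, in -n output, a ":" after the line number marks a matching line and a "-" marks a context line. The sketch below shows both. It assumes GNU grep and coreutils (seq, wc, tr), with two sample lines standing in for /etc/passwd:

```shell
#!/bin/sh
sample=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'operator:x:11:0:operator:/root:/sbin/nologin' > "$sample"

lines=$(grep -ci root "$sample")                      # matching lines: 2
hits=$(grep -oi root "$sample" | wc -l | tr -d ' ')   # individual matches: 4
both=$(grep -cio root "$sample")                      # -c wins: still 2
echo "lines=$lines hits=$hits both=$both"

# Number the lines 1..10, match the line that is exactly "5" (-x)
# and print one line of context on each side (-C1).
ctx=$(seq 10 | grep -n -x -C1 5)
printf '%s\n' "$ctx"    # 4-4, 5:5, 6-6 (":" = match, "-" = context)
rm -f "$sample"
```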

# grep root /etc/passwd|grep bash  # a pipe acts as logical AND

root:x:0:0:root:/root:/bin/bash

# grep -e root -e bash /etc/passwd  # several -e patterns act as logical OR

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

rick:x:1000:1000:rick:/home/rick:/bin/bash

mage:x:1001:1001::/home/mage:/bin/bash

wang:x:1002:1004::/home/wang:/bin/bash

tom:x:1003:1005::/home/tom:/bin/bash

# echo xabcy|grep abc  # matches: the line contains the string abc

xabcy

# echo xabcy|grep -w abc  # whole words only: no match, abc is inside xabcy

# echo "x abc y"|grep -w abc  # spaces around the word: matches

x abc y

# echo "x,abc,y"|grep -w abc  # commas around the word: matches

x,abc,y

# echo "x2abc3y"|grep -w abc  # digits around the word: no match

# echo "x_abc_y"|grep -w abc  # underscores around the word: no match

# echo "x-abc-y"|grep -w abc  # hyphens around the word: matches

x-abc-y

Summary:

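For grep -w, word-constituent characters are letters, digits and the underscore. Any other character (space, comma, hyphen and so on) acts as a word boundary.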
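The summary can be checked directly. The loop below runs grep -w abc over the six strings from the examples and counts the matches. It is a small sketch that assumes a POSIX shell and GNU grep:

```shell
#!/bin/sh
# Letters, digits and "_" are word characters; anything else is a boundary.
matched=0
for s in 'xabcy' 'x abc y' 'x,abc,y' 'x2abc3y' 'x_abc_y' 'x-abc-y'; do
    if printf '%s\n' "$s" | grep -qw abc; then
        echo "match:    $s"
        matched=$((matched + 1))
    else
        echo "no match: $s"
    fi
done
echo "$matched of 6 strings matched"   # space, comma and hyphen cases
```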

# cat >p.txt  # create a pattern file, one pattern per line (finish input with Ctrl+D)

root

bash

# cat p.txt

root

bash

# grep -f p.txt /etc/passwd  # -f: match lines containing any pattern listed in p.txt (logical OR)

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

rick:x:1000:1000:rick:/home/rick:/bin/bash

mage:x:1001:1001::/home/mage:/bin/bash

wang:x:1002:1004::/home/wang:/bin/bash

tom:x:1003:1005::/home/tom:/bin/bash
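By default, patterns read with -f are still regular expressions. Adding -F makes them fixed strings. The sketch below contrasts -e, -f and -F. It assumes GNU grep, and the data and pattern files are throwaway samples:

```shell
#!/bin/sh
data=$(mktemp)
pats=$(mktemp)
printf '%s\n' abc 'a.c' cat dog > "$data"
printf '%s\n' 'a.c' > "$pats"                  # one pattern per line

or_count=$(grep -c -e cat -e dog "$data")      # cat OR dog -> 2
regex_count=$(grep -c -f "$pats" "$data")      # "." is a wildcard: abc, a.c -> 2
fixed_count=$(grep -cF -f "$pats" "$data")     # -F: "." is a literal dot: a.c -> 1
echo "or=$or_count regex=$regex_count fixed=$fixed_count"
rm -f "$data" "$pats"
```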

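3. Regular expressions

REGEXP: a pattern written with a class of special characters plus literal text characters. Some of these characters, the metacharacters, do not stand for themselves; they provide control or wildcard functions.

3.1 Programs that support regular expressions: grep, sed, awk, vim, less, nginx, varnish, etc.

3.2 Two flavors:

Basic regular expressions: BRE

Extended regular expressions: ERE

ERE is selected with grep -E or egrep

Regular expression engine:

A software module that checks and processes regular expressions; different engines use different algorithms

PCRE (Perl Compatible Regular Expressions), available in GNU grep through -P

3.3 Metacharacter categories: character matching, repetition counts, position anchors, grouping

3.4 Reference: man 7 regex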

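3.2.1 Basic regular expression metacharacters

Wildcards (shell globbing) such as *, ? and [[:digit:]] match file names

Regular-expression metacharacters match strings in the file content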
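The sketch below expresses the same idea ("names ending in .txt") twice: once as a shell glob the shell expands, and once as a regex that grep applies to text. It assumes a POSIX shell and GNU grep, and the file names are hypothetical:

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.txt b.txt notes.txt.bak c.log       # hypothetical file names

glob=$(echo *.txt)                          # the shell expands the wildcard
regex=$(ls | grep '\.txt$' | xargs)         # grep filters the listing; "\." is a literal dot, "$" anchors the end
echo "glob:  $glob"
echo "regex: $regex"
cd / && rm -rf "$dir"
```

Both print a.txt b.txt. Without the "$" anchor, the regex would also accept notes.txt.bak.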

3.2.2 egrep and extended regular expressions

egrep = grep -E

egrep [OPTIONS] PATTERN [FILE...]

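1. ERE metacharacters

1) Character matching:

. any single character

[] any single character in the specified set

[^] any single character not in the specified set

2. ERE metacharacters (continued)

1) Repetition:

*: the preceding character any number of times

?: 0 or 1 time

+: 1 or more times

{m}: exactly m times

{m,n}: at least m, at most n times

2) Position anchors:

^: start of line

$: end of line

\<, \b: start of word

\>, \b: end of word

3) Grouping:

()

Back-references: \1, \2, ...

4) Alternation:

a|b: a or b

C|cat: C or cat

(C|c)at: Cat or cat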
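The ERE forms above are the BRE metacharacters without their backslashes. The sketch below runs each pair against the same sample words and compares the counts. It assumes GNU grep, where \+, \? and \| are accepted in BRE as extensions; the sample words are made up:

```shell
#!/bin/sh
sample=$(mktemp)
printf '%s\n' google gogle ggle Cat cat bat > "$sample"

# BRE needs backslashes before { } + ( ) |; ERE (-E) does not.
bre1=$(grep -c 'go\{2\}gle' "$sample");    ere1=$(grep -Ec 'go{2}gle' "$sample")
bre2=$(grep -c 'go\+gle' "$sample");       ere2=$(grep -Ec 'go+gle' "$sample")
bre3=$(grep -c '^\(C\|c\)at$' "$sample");  ere3=$(grep -Ec '^(C|c)at$' "$sample")
echo "{2}: $bre1/$ere1  +: $bre2/$ere2  (C|c): $bre3/$ere3"   # 1/1  2/2  2/2
rm -f "$sample"
```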

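3.3.1 Character matching

Metacharacters:

Symbol	Description
.	Any single character (inside brackets [], "." stands for a literal dot)
[]	Any single character in the specified set
[^]	Any single character outside the specified set
[:alnum:]	Letters and digits
[:alpha:]	Letters of either case, i.e. A-Z, a-z
[:lower:]	Lowercase letters
[:upper:]	Uppercase letters
[:blank:]	Blank characters (space and tab)
[:space:]	Horizontal and vertical whitespace (a wider set than [:blank:])
[:cntrl:]	Non-printable control characters (backspace, delete, bell...)
[:digit:]	Decimal digits
[:xdigit:]	Hexadecimal digits
[:graph:]	Printable non-whitespace characters
[:print:]	Printable characters
[:punct:]	Punctuation

The class names are used inside a bracket expression, e.g. [[:digit:]] or [^[:space:]].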
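A quick way to see several of these classes at work is to count matching lines in a tiny sample. The sketch assumes GNU grep; the four sample lines are made up. Note the double brackets: GNU grep rejects a bare [:digit:] with an error.

```shell
#!/bin/sh
sample=$(mktemp)
printf 'abc\nA1\n \t\nx_y\n' > "$sample"   # 4 lines; line 3 is a space and a tab

digit=$(grep -c '[[:digit:]]' "$sample")      # A1 -> 1
upper=$(grep -c '[[:upper:]]' "$sample")      # A1 -> 1
blank=$(grep -c '^[[:blank:]]*$' "$sample")   # the space+tab line -> 1
punct=$(grep -c '[[:punct:]]' "$sample")      # x_y ("_" is punctuation) -> 1
alnum=$(grep -c '^[[:alnum:]]*$' "$sample")   # abc, A1 -> 2
echo "digit=$digit upper=$upper blank=$blank punct=$punct alnum=$alnum"
rm -f "$sample"
```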

Examples:

# echo abbc|grep a.c  # no output: "." matches exactly one character

# echo abbc|grep a..c

abbc

# echo ab我c|grep a..c

ab我c

Tip: in a multibyte locale such as UTF-8, a Chinese character counts as one character, not as its individual bytes.

# echo axcdef|grep "a[xyz]c"  # the middle character can be any one of x, y, z

axcdef

# echo azcdef|grep "a[xyz]c"  # the middle character can be any one of x, y, z

azcdef

# echo abcdef|grep "a[^xyz]c"  # the middle character can be any single character except x, y, z

abcdef

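3.3.2 Repetition counts

Placed after a character to specify how many times that preceding character must occur

Symbol	Description
*	The preceding character any number of times, including 0 (greedy: matches as much as possible)
.*	Any number of any characters
\?	The preceding character 0 or 1 time
\+	The preceding character at least once
\{n\}	The preceding character exactly n times
\{m,n\}	The preceding character at least m and at most n times
\{,n\}	The preceding character at most n times
\{n,\}	The preceding character at least n times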

Examples:

1) Matching any character, and whole-word matching

# touch abc

# touch xyz

# ls|grep ...  # names containing at least three characters

abc

anaconda-ks.cfg

Desktop

Documents

Downloads

initial-setup-ks.cfg

# ls|grep -w "..."  # three characters that form a whole word

abc

abc.log

all.log

anaconda-ks.cfg

b.txt

df.log

initial-setup-ks.cfg

2) The preceding character any number of times: * (including 0 times)

# echo axb|grep ax*b  # x repeated any number of times, including 0

axb

# echo ab|grep "ax*b"  # x occurs 0 times, which still matches

ab

# echo abb|grep "ax*b"

abb

# echo aab|grep "ax*b"

aab

# echo axxxxb|grep "ax*b"

axxxxb

# ls |grep ".*\.txt"  # names containing ".txt" after any characters

b.txt

linux.txt

mail.txt

# ls |grep -o "\.txt"  # print only the matched string

.txt

# cat google.txt

google

gooooooooooooooooooooooooooogle

gogle

ggle

goooooooooooooo00000000ooooogle

# grep google google.txt  # literal string match

google

# grep "go*gle" google.txt  # the character before * (o) any number of times

google

gooooooooooooooooooooooooooogle

gogle

ggle

# grep "go\?gle" google.txt  # the character before \? (o) 0 or 1 time

gogle

ggle

# grep "go.*gle" google.txt  # "go", then any characters (possibly none), then "gle"; ggle has no "go", so it is not matched

google

gooooooooooooooooooooooooooogle

gogle

goooooooooooooo00000000ooooogle

# grep "go\+gle" google.txt  # the character before \+ (o) at least once

google

gooooooooooooooooooooooooooogle

gogle

# grep "go\{27\}gle" google.txt  # the preceding character exactly n (27) times

gooooooooooooooooooooooooooogle

# grep "go\{20,\}gle" google.txt  # the preceding character at least n (20) times

gooooooooooooooooooooooooooogle

# grep "go\{20,30\}gle" google.txt  # the preceding character at least m (20) and at most n (30) times

gooooooooooooooooooooooooooogle

# grep "go\{0,\}gle" google.txt  # the preceding character at least 0 times, same as o*

google

gooooooooooooooooooooooooooogle

gogle

ggle

# grep "go*gle" google.txt  # the preceding character any number of times

google

gooooooooooooooooooooooooooogle

gogle

ggle

# grep "go\{,30\}gle" google.txt  # the preceding character at most n (30) times

google

gooooooooooooooooooooooooooogle

gogle

ggle
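The runs above hint that the shorthand operators are special cases of interval expressions: * is \{0,\}, \? is \{0,1\} and \+ is \{1,\}. The sketch below checks this by recreating google.txt from the listing above in a temporary file and comparing the selected lines. It assumes GNU grep:

```shell
#!/bin/sh
f=$(mktemp)
printf '%s\n' google gooooooooooooooooooooooooooogle gogle ggle \
    goooooooooooooo00000000ooooogle > "$f"

s=$(grep 'go*gle' "$f");    s_iv=$(grep 'go\{0,\}gle' "$f")    # 0 or more
q=$(grep 'go\?gle' "$f");   q_iv=$(grep 'go\{0,1\}gle' "$f")   # 0 or 1
p=$(grep 'go\+gle' "$f");   p_iv=$(grep 'go\{1,\}gle' "$f")    # 1 or more

if [ "$s" = "$s_iv" ] && [ "$q" = "$q_iv" ] && [ "$p" = "$p_iv" ]; then
    echo "each shorthand selects exactly the same lines as its interval form"
fi
rm -f "$f"
```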

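3.3.3 Position anchors: where a match may occur

Metacharacters:

Symbol	Description
^	Start-of-line anchor, placed at the far left of the pattern
$	End-of-line anchor, placed at the far right of the pattern
^PATTERN$	The pattern must match the whole line
^$	Empty line
^[[:space:]]*$	Blank line (empty, or whitespace only)
\< or \b	Start-of-word anchor, placed to the left of the word pattern
\> or \b	End-of-word anchor, placed to the right of the word pattern
\<PATTERN\>	Match a whole word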

Examples:

1) Matching by what a line starts or ends with

# grep root /etc/passwd

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

# grep "^root" /etc/passwd  # lines starting with root

root:x:0:0:root:/root:/bin/bash

# grep "bash$" /etc/passwd  # lines ending with bash

root:x:0:0:root:/root:/bin/bash

rick:x:1000:1000:rick:/home/rick:/bin/bash

mage:x:1001:1001::/home/mage:/bin/bash

wang:x:1002:1004::/home/wang:/bin/bash

tom:x:1003:1005::/home/tom:/bin/bash

2) Filter out blank lines, showing only non-blank lines

# cat -A f1

^I$

aa $

bb$

# grep -v "^$" f1|grep -v "^[[:space:]]"  # drop empty lines, then drop lines starting with a space or tab (this also drops indented lines that have content)

# grep -v "^[[:space:]]*$" f1  # one step: drop lines that consist only of zero or more whitespace characters
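The two approaches differ only when a line starts with whitespace but also has content. The sketch below recreates f1 in a temporary file, with a hypothetical extra indented line "  cc" added. It assumes GNU grep:

```shell
#!/bin/sh
f1=$(mktemp)
printf '\t\naa \n\nbb\n  cc\n' > "$f1"   # tab-only, "aa ", empty, "bb", indented "  cc"

two_step=$(grep -v '^$' "$f1" | grep -vc '^[[:space:]]')   # 2: "  cc" is lost too
one_step=$(grep -vc '^[[:space:]]*$' "$f1")                # 3: "aa ", "bb" and "  cc" kept
echo "two-step keeps $two_step lines, one-step keeps $one_step lines"
rm -f "$f1"
```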

3) Locate where a string occurs within a word

# grep "\<root" /etc/passwd  # root at the start of a word

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

rooter:x:1004:1006::/home/rooter:/bin/bash

# useradd admroot  # add the user admroot

# grep "root\>" /etc/passwd  # root at the end of a word

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

admroot:x:1005:1007::/home/admroot:/bin/bash

4) Match a whole word

# grep "\<root\>" /etc/passwd  # root as a whole word

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

# grep "\broot\b" /etc/passwd  # root as a whole word

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin

# grep -w "root" /etc/passwd  # -w: root as a whole word

root:x:0:0:root:/root:/bin/bash

operator:x:11:0:operator:/root:/sbin/nologin
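The sketch below checks the anchors against the three kinds of user names used above. \<root and root\> each accept one partial word, while the three whole-word forms agree. It assumes GNU grep; the sample lines are modelled on the passwd entries above:

```shell
#!/bin/sh
f=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
    'rooter:x:1004:1006::/home/rooter:/bin/bash' \
    'admroot:x:1005:1007::/home/admroot:/bin/bash' > "$f"

start=$(grep -c '\<root' "$f")      # word starts with root: root, rooter -> 2
end=$(grep -c 'root\>' "$f")        # word ends with root: root, admroot -> 2
word1=$(grep -c '\<root\>' "$f")    # whole word only -> 1
word2=$(grep -c '\broot\b' "$f")    # same with \b -> 1
word3=$(grep -cw root "$f")         # same with -w -> 1
echo "start=$start end=$end whole=$word1/$word2/$word3"
rm -f "$f"
```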

3.3.4 Grouping

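1. Grouping: bind one or more characters together and treat them as a single unit, e.g. \(root\)\+

The text matched by the pattern inside each pair of grouping parentheses is recorded by the regex engine in internal variables, named

\1, \2, \3, ...

\1 refers to the characters matched by the pattern between the first left parenthesis (counting from the left) and its matching right parenthesis.

Example: \(string1\+\(string2\)*\)

\1: string1\+\(string2\)*

\2: string2

2. Back-reference: refers to the characters actually matched by an earlier grouping pattern, not to the pattern itself

3. Alternation: \|

Examples: a\|b: a or b;  C\|cat: C or cat;  \(C\|c\)at: Cat or cat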
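Group numbering follows the order of the left parentheses, so an outer group gets a lower number than the groups nested inside it. The sketch below checks this on two made-up strings. It assumes GNU grep:

```shell
#!/bin/sh
# In \(a\(b\)\), \1 is the outer group ("ab") and \2 the inner one ("b").
hit=$(echo 'abab-b' | grep -c '\(a\(b\)\)\1-\2')    # ab, then \1=ab, "-", \2=b -> 1
miss=$(echo 'abab-a' | grep -c '\(a\(b\)\)\1-\2')   # \2 must be b, so no match -> 0
echo "hit=$hit miss=$miss"
```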

Examples:

1) Binding one or more characters into a single unit:

# echo wangwangwang|grep "wang\{3\}"  # no match: \{3\} applies only to the g, so the pattern means wanggg

# echo wanggg|grep "wang\{3\}"  # g is repeated 3 times, not wang

wanggg

# echo wangwangwang|grep "\(wang\)\{3\}"  # the parentheses make wang one unit, repeated 3 times

wangwangwang

2) Back-references

# echo wangwangwangxxxxwangwangwang|grep "\(wang\)\{3\}.*\(wang\)\{3\}"  # two groups, each matching wang 3 times

wangwangwangxxxxwangwangwang

# echo wangwangwangxxxxwangwangwang|grep "\(wang\)\{3\}.*\1"  # the group captures text and \1 reuses it (\1 holds wang)

wangwangwangxxxxwangwangwang

# echo wangwangwangxxxxmagemagewangwangwangmagemagemage|grep "\(wang\)\{3\}.*\(mage\)\+\1.*\2"  # two groups, each referenced again later

wangwangwangxxxxmagemagewangwangwangmagemagemage

#\1 holds the text matched by \(wang\)

#\2 holds the text matched by \(mage\)

# echo rootxxrbbt|grep "\(r..t\).*\1"  # no match: \1 must repeat the captured text root exactly, and rbbt differs

# echo rootxxroot|grep "\(r..t\).*\1"  # matches: \1 repeats the captured text root

rootxxroot

# echo rootxxrbbt|grep "\(r..t\).*\(r..t\)"  # two independent groups: each r..t pattern matches on its own, so root and rbbt both fit

rootxxrbbt

Exercise:

Print the lines of /etc/passwd whose first word (the user name) is the same as the last word

# useradd -s /bin/ahaha haha

# grep "^\(.*\):.*\1$" /etc/passwd  # loose: the line only has to END with the first field, not as a separate word

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

bash:x:1006:1008::/home/bash:/bin/bash

haha:x:1007:1009::/home/haha:/bin/ahaha  #note: the first and last words differ (ahaha merely ends with haha), yet the line is matched

# grep "^\(.*\):.*/\1$" /etc/passwd  # strict: the last word must directly follow a "/" (no backslash is needed before /)

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

bash:x:1006:1008::/home/bash:/bin/bash
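Running both patterns against a few of the lines above makes the difference concrete. In this sketch the loose pattern accepts haha and the strict one rejects it. It assumes GNU grep, and the three sample lines are copied from the outputs shown in this article:

```shell
#!/bin/sh
f=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
    'sync:x:5:0:sync:/sbin:/bin/sync' \
    'haha:x:1007:1009::/home/haha:/bin/ahaha' > "$f"

# Keep only the user names (first field) of the matching lines.
loose=$(grep '^\(.*\):.*\1$' "$f" | cut -d: -f1 | xargs)    # sync haha
strict=$(grep '^\(.*\):.*/\1$' "$f" | cut -d: -f1 | xargs)  # sync
echo "loose:  $loose"
echo "strict: $strict"
rm -f "$f"
```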

3) Alternation

Examples:

a\|b: a or b

C\|cat: C or cat

\(C\|c\)at: Cat or cat

Examples:

# grep "^a\|^b.*" /etc/passwd  # lines starting with a, or lines starting with b followed by any characters

bin:x:1:1:bin:/bin:/sbin/nologin

adm:x:3:4:adm:/var/adm:/sbin/nologin

abrt:x:173:173::/etc/abrt:/sbin/nologin

avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin

admroot:x:1005:1007::/home/admroot:/bin/bash

bash:x:1006:1008::/home/bash:/bin/bash

# grep "^\(a\|b\).*" /etc/passwd  # lines starting with a or b, followed by any characters

bin:x:1:1:bin:/bin:/sbin/nologin

adm:x:3:4:adm:/var/adm:/sbin/nologin

abrt:x:173:173::/etc/abrt:/sbin/nologin

avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin

admroot:x:1005:1007::/home/admroot:/bin/bash

bash:x:1006:1008::/home/bash:/bin/bash

# echo abc|grep "^a"

abc

# echo aaaa|grep "a\|b"

aaaa

# echo bbb|grep "a\|b"  # a or b

bbb

# echo axy|grep "a\|bxy"  # a, or bxy

axy

# echo axy|grep "\(a\|b\)xy"  # axy or bxy

axy
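The last two examples differ in precedence. \| binds more loosely than concatenation, so a\|bxy means "a" OR "bxy", while \(a\|b\)xy means "axy" OR "bxy". The sketch below uses made-up sample lines and assumes GNU grep:

```shell
#!/bin/sh
f=$(mktemp)
printf '%s\n' ay bxy axy > "$f"

wide=$(grep -c 'a\|bxy' "$f")         # "a" OR "bxy": ay, bxy, axy -> 3
narrow=$(grep -c '\(a\|b\)xy' "$f")   # "axy" OR "bxy": bxy, axy -> 2
ere_wide=$(grep -Ec 'a|bxy' "$f")     # ERE spelling of the first pattern -> 3
echo "wide=$wide narrow=$narrow ere_wide=$ere_wide"
rm -f "$f"
```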

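Exercises (use grep only):

1. Extract the network interface's IP address

2. Extract the highest partition usage

3. Determine the OS major version (with a command that works across distributions)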
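Here is one possible approach to exercises 1 and 3. It runs against saved sample text so that it works anywhere. The interface name ens33, the addresses and the os-release contents are hypothetical. Reading VERSION_ID from /etc/os-release is an assumption about the target systems; that file follows the systemd os-release convention. The sketch assumes GNU grep:

```shell
#!/bin/sh
# Hypothetical sample of `ip addr` output (addresses and ens33 are made up).
ipout='    inet 127.0.0.1/8 scope host lo
    inet 10.0.0.7/24 brd 10.0.0.255 scope global ens33'
# Hypothetical sample of /etc/os-release.
osrel='NAME="CentOS Linux"
VERSION="7 (Core)"
VERSION_ID="7"'

# 1) Keep "inet a.b.c.d", reduce it to the address, drop loopback 127.x.
ip=$(printf '%s\n' "$ipout" \
    | grep -Eo 'inet ([0-9]{1,3}\.){3}[0-9]{1,3}' \
    | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' \
    | grep -v '^127\.')

# 3) Take the leading digits of VERSION_ID (the quotes are optional in the file).
major=$(printf '%s\n' "$osrel" | grep -Eo '^VERSION_ID="?[0-9]+' | grep -Eo '[0-9]+')

echo "ip=$ip major=$major"
```

On a real host, feed the first pipeline from `ip addr` and the second from /etc/os-release instead of the sample variables.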
