The Three Musketeers of Linux Text Processing: grep

Posted 2020-06-22


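I. The three musketeers of Linux text processing

The main text-processing tools on Linux are the grep family (grep, egrep, fgrep), awk, and sed.

grep: a text search tool; it searches text according to a given pattern

sed: a stream editor that works line by line; it can substitute, delete, insert, and select lines

awk: a powerful text analysis tool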

II. The text search tool grep

grep:(Globally search a Regular Expression and Print);

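Purpose: a text search tool that matches the target text line by line against the user-specified pattern (the filter condition) and prints the lines that match. grep supports regular expressions, and regular-expression quantifiers tend toward the longest possible match, the so-called greedy mode.

Regular expressions:

A regular expression is a way of describing strings. It is applied one line at a time and lets the user easily find, delete, or replace specific strings.

Categories:

Basic regular expressions: BRE, Basic Regular Expression

Extended regular expressions: ERE, Extended Regular Expression

The grep family:

grep: uses basic regular expressions

egrep: uses extended regular expressions

fgrep: does not use regular expressions; the pattern is treated as a fixed string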
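
The difference between the three variants can be sketched by running one pattern through each of them. This is a minimal illustration that assumes GNU grep; the two sample lines and the variable names are made up for the example. It uses grep -E and grep -F, the option forms of egrep and fgrep.

```shell
# Hypothetical sample input: one line with repeated letters, one with a literal plus sign.
sample='aab
a+b'

# BRE: an unescaped + is an ordinary character, so only the literal "a+b" matches.
bre=$(printf '%s\n' "$sample" | grep 'a+b')

# ERE: + means "one or more of the preceding character", so "aab" matches.
ere=$(printf '%s\n' "$sample" | grep -E 'a+b')

# Fixed string: no character is special, so only the literal "a+b" matches.
fixed=$(printf '%s\n' "$sample" | grep -F 'a+b')

printf 'BRE: %s\nERE: %s\nfixed: %s\n' "$bre" "$ere" "$fixed"
```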

Main options of the grep command:

grep [OPTIONS] PATTERN [FILE...]

PATTERN is the filter condition, that is, the content we want to find.

Common options:

--color=auto: highlight the matched text in color

[root@localhost ~]# alias
alias cp='cp -i'
alias egrep='egrep --color=auto'
alias fgrep='fgrep --color=auto'
alias grep='grep --color=auto'

# alias lists the command aliases defined in the shell; if an alias is not wanted, remove it with unalias NAME.

-i: ignore case

[root@localhost ~]# grep "oot" /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
setroubleshoot:x:994:991::/var/lib/setroubleshoot:/sbin/nologin

# Every line that contains the pattern oot anywhere is printed in full. This run is case-sensitive; with -i, lines containing OOT or Oot would match as well.
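
A small sketch of what -i changes, using made-up sample lines rather than /etc/passwd; the variable names are only for illustration.

```shell
# Hypothetical sample input with mixed case.
sample='Root
root
boot'

# Without -i only the exact lowercase spelling matches.
case_sensitive=$(printf '%s\n' "$sample" | grep 'root')

# With -i the pattern matches regardless of letter case.
case_insensitive=$(printf '%s\n' "$sample" | grep -i 'root')

printf 'case-sensitive:\n%s\n' "$case_sensitive"
printf 'case-insensitive:\n%s\n' "$case_insensitive"
```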

-o: print only the matched text itself

[root@localhost ~]# grep -o "oot" /etc/passwd
oot
oot
oot
oot
oot
oot

# -o prints every piece of text that matches the pattern, not the whole line!

-v, --invert-match: invert the match

[root@localhost ~]# cat /etc/default/useradd 
# useradd defaults file
GROUP=100
HOME=/home
INACTIVE=-1
EXPIRE=
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes

[root@localhost ~]# grep -v "E" /etc/default/useradd 
# useradd defaults file
GROUP=100

-E: use extended regular expressions

# grep -E is equivalent to egrep

-q, --quiet, --silent: quiet mode, prints nothing

[root@localhost ~]# grep -q "O" /etc/default/useradd 
[root@localhost ~]# echo $?
0
[root@localhost ~]# grep -q "OOO" /etc/default/useradd 
[root@localhost ~]# echo $?
1

# In quiet mode grep prints nothing to the screen, but its exit status still tells us whether anything matched. echo "$?" shows the exit status of the previous command: 0 means grep found a match, 1 means it found none.
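
Because -q communicates only through the exit status, it is typically used as the condition of an if statement. A minimal sketch, with a made-up two-line configuration standing in for /etc/default/useradd:

```shell
# Hypothetical sample configuration.
config='GROUP=100
SHELL=/bin/bash'

# grep -q prints nothing; the if branch is chosen by its exit status alone.
if printf '%s\n' "$config" | grep -q '^SHELL='; then
    result='SHELL is set'
else
    result='SHELL is missing'
fi

echo "$result"
```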

-P: use Perl regular expressions

-e PATTERN: match multiple patterns

[root@localhost ~]# cat /etc/default/useradd 
# useradd defaults file
GROUP=100
HOME=/home
INACTIVE=-1
EXPIRE=
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes

[root@localhost ~]# grep -e "HO" -e "RO" /etc/default/useradd 
GROUP=100
HOME=/home

Matching with context:

-A NUM: also print the NUM lines after each match; NUM is a non-negative integer

[root@localhost ~]# cat /etc/default/useradd 
# useradd defaults file
GROUP=100
HOME=/home
INACTIVE=-1
EXPIRE=
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes

[root@localhost ~]# grep -A 2 "IN" /etc/default/useradd 
INACTIVE=-1
EXPIRE=
SHELL=/bin/bash

-B NUM: also print the NUM lines before each match; NUM is a non-negative integer

[root@localhost ~]# grep -B 2 "IN" /etc/default/useradd 
GROUP=100
HOME=/home
INACTIVE=-1

-C NUM: also print the NUM lines before and after each match; NUM is a non-negative integer

[root@localhost ~]# grep -C 2 "IN" /etc/default/useradd 
GROUP=100
HOME=/home
INACTIVE=-1
EXPIRE=
SHELL=/bin/bash

III. Basic regular expressions:

① Character matching:

.: matches any single character

[root@localhost ~]# cat /etc/default/useradd 
# useradd defaults file
GROUP=100
HOME=/home
INACTIVE=-1
EXPIRE=
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes

[root@localhost ~]# grep "M" /etc/default/useradd 
HOME=/home
CREATE_MAIL_SPOOL=yes

[root@localhost ~]# grep "M..L" /etc/default/useradd 
CREATE_MAIL_SPOOL=yes

[ ]: matches any single character in the set

[root@localhost ~]# grep "[MS]" /etc/default/useradd 
HOME=/home
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes

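[^ ]: matches any single character outside the set

[root@localhost ~]# grep "^[MS]" /etc/default/useradd 
SHELL=/bin/bash
SKEL=/etc/skel

# Note: in this command the ^ sits outside the brackets, so it anchors the match to the start of the line and only lines beginning with M or S are printed. To negate the set, put the ^ inside the brackets, as in [^MS].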
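
The two meanings of ^ are easy to confuse, so here is a short side-by-side sketch. The three sample lines and the variable names are invented for the illustration.

```shell
# Hypothetical sample input.
sample='MS
Ma
SSM'

# ^ inside the brackets negates the set: the line must contain
# at least one character that is neither M nor S.
negated=$(printf '%s\n' "$sample" | grep '[^MS]')

# ^ outside the brackets is an anchor: the line must start with M or S.
anchored=$(printf '%s\n' "$sample" | grep '^[MS]')

printf 'negated set:\n%s\n' "$negated"
printf 'anchored set:\n%s\n' "$anchored"
```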

[:digit:]: all digits

[root@localhost ~]# grep "[[:digit:]]" /etc/default/useradd 
GROUP=100
INACTIVE=-1

[:lower:]: all lowercase letters

[root@localhost ~]# grep "[[:lower:]]" /etc/default/useradd 
# useradd defaults file
HOME=/home
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes

[:upper:]: all uppercase letters

[root@localhost ~]# grep "[[:upper:]]" /etc/default/useradd 
GROUP=100
HOME=/home
INACTIVE=-1
EXPIRE=
SHELL=/bin/bash
SKEL=/etc/skel
CREATE_MAIL_SPOOL=yes

[:alpha:]: all letters

[root@localhost mnt]# cat test
innode innode table innode table 
superblock  ext2 exr3
83 82 5400 7200

[root@localhost mnt]# grep "[[:alpha:]]" /mnt/test 
innode innode table innode table 
superblock  ext2 exr3

[:alnum:]: all letters and digits

[root@localhost mnt]# grep "[[:alnum:]]" /mnt/test 
innode innode table innode table 
superblock  ext2 exr3
83 82 5400 7200

[:space:]: whitespace characters

[:blank:]: space and TAB

[:punct:]: all punctuation characters
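
The last three classes have no example above, so here is a minimal sketch with invented sample lines. As with the other classes, the class name goes inside an outer pair of brackets.

```shell
# Hypothetical sample input: plain letters, a line with a space, a line with a comma.
sample='abc
a b
a,b'

# [[:blank:]] matches a space or a TAB.
with_blank=$(printf '%s\n' "$sample" | grep '[[:blank:]]')

# [[:punct:]] matches punctuation such as the comma.
with_punct=$(printf '%s\n' "$sample" | grep '[[:punct:]]')

printf 'blank: %s\npunct: %s\n' "$with_blank" "$with_punct"
```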

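② Repetition counts (non-negative integers):

These limit how many times the preceding character must appear. Quantifiers are greedy by default, meaning they match as many repetitions as possible; grep still prints the whole line that contains a match.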
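
Greedy matching is easiest to see with -o, which prints only the matched text. The sketch below uses an invented sample string; the second command shows one common way to keep the match short, by excluding the delimiter from the repeated set.

```shell
# Hypothetical sample input with two X delimiters.
sample='aXbXc'

# .* is greedy: it runs to the last X on the line.
greedy=$(printf '%s\n' "$sample" | grep -o 'a.*X')

# [^X]* cannot cross an X, so the match stops at the first one.
limited=$(printf '%s\n' "$sample" | grep -o 'a[^X]*X')

printf 'greedy: %s\nlimited: %s\n' "$greedy" "$limited"
```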

*: matches the preceding character any number of times (zero or more)

[root@localhost mnt]# cat test3
c
abcd
acccd
bccca

[root@localhost mnt]# grep "c*" /mnt/test3
c
abcd
acccd
bccca

# The preceding character may appear 0 times, 1 time, ... up to N times

\+: matches the preceding character at least once

[root@localhost mnt]# grep "d\+" /mnt/test3
abcd
acccd

\?: matches the preceding character zero times or once

[root@localhost mnt]# grep "d\?" /mnt/test3
c
abcd
acccd
bccca

\{m\}: the preceding character appears m times

[root@localhost mnt]# grep "c\{3\}" /mnt/test3
acccd
bccca

\{m,n\}: the preceding character appears between m and n times, i.e. within [m,n]

[root@localhost mnt]# grep "c\{1,3\}" /mnt/test3
c
abcd
acccd
bccca

\{0,n\}: appears 0 to n times (the example below uses \{0\}, exactly zero times, so the pattern reduces to d)

[root@localhost mnt]# grep "c\{0\}d" /mnt/test3
abcd
acccd

\{m,\}: appears at least m times

[root@localhost mnt]# grep "c\{3,\}d" /mnt/test3
acccd

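③ Position anchoring

Anchors restrict where in the target text the matched text may appear.

^: start-of-line anchor; placed at the far left of the pattern, ^PATTERN

[root@localhost mnt]# tail -6  /etc/passwd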
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
tom:x:1000:1000:mageedu:/home/mageedu:/bin/bash

[root@localhost mnt]# tail -6  /etc/passwd | grep "^n"
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin

$: end-of-line anchor; placed at the far right of the pattern, PATTERN$

[root@localhost mnt]# tail -6  /etc/passwd | grep "n$"
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin

^PATTERN$: PATTERN must match the entire line

[root@localhost mnt]# cat  test4
root 
root 123
root1234

[root@localhost mnt]# grep "^root1234$" test4
root1234

^$: an empty line

[root@localhost mnt]# cat /etc/issue 
\S
Kernel \r on an \m 

Welcome

[root@localhost mnt]# grep -v "^$" /etc/issue 
\S
Kernel \r on an \m 
Welcome

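# -v selects the lines that do not match the pattern, so this command prints the non-empty lines; dropping -v and piping grep "^$" to wc -l counts the empty lines

^[[:space:]]*$: matches blank lines that contain only whitespace, which ^$ cannot find

[root@localhost mnt]# cat  test4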
root
 
root 123
                                     
root1234

[root@localhost mnt]# grep "^$"  /mnt/test4 |wc -l
0

[root@localhost mnt]# grep "^[[:space:]]*$"  /mnt/test4 |wc -l
2

\< or \b: word-start anchor, restricts the match to the left edge of a word, \<PATTERN, \bPATTERN

[root@localhost mnt]# grep "\<c" /etc/passwd
colord:x:996:994:User for colord:/var/lib/colord:/sbin/nologin
chrony:x:993:990::/var/lib/chrony:/sbin/nologin

\> or \b: word-end anchor, restricts the match to the right edge of a word, PATTERN\>, PATTERN\b

[root@localhost mnt]# grep "S\>" /etc/passwd
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin

\<PATTERN\>: whole-word anchor, matches the entire word exactly

[root@localhost mnt]# grep "\<root\>" /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

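④ Grouping and back-references:

\(PATTERN\): treats the characters matched by PATTERN as a single unit; the text matched by the pattern inside the group is recorded automatically by the regular-expression engine in internal variables

You can think of each pair of parentheses as one variable, named \1, \2, \3, ... and numbered by the position of the opening parenthesis, counting from the left

pat1\(pat2\)pat3\(pat4\(pat5\)pat6\)

\1: the text matched by \(pat2\)

\2: the text matched by \(pat4\(pat5\)pat6\)

\3: the text matched by \(pat5\)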

[root@localhost mnt]# grep "^\([[:alnum:]]\{1,\}\)\>.*\1$" /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
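
The numbering of nested groups can be checked with a small sketch. The sample lines below are invented; the pattern has an outer group (\1) that contains two inner groups (\2 and \3), and the line only matches when each back-reference repeats what its group captured.

```shell
# Hypothetical sample input: only the first line repeats the groups in \1, \2, \3 order.
sample='ab-ab-a-b
ab-a-b-ab'

# \1 is the outer group "ab", \2 is "a", \3 is "b",
# numbered by the position of each opening parenthesis.
matched=$(printf '%s\n' "$sample" | grep '^\(\(a\)\(b\)\)-\1-\2-\3$')

echo "$matched"
```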

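IV. The text search tool egrep:

egrep is the grep command with extended regular expressions, equivalent to grep -E. It is used in much the same way as grep: the backslashes in front of the quantifiers and grouping parentheses can be dropped, but those in the word-start and word-end anchors and in back-references must stay.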

egrep [OPTIONS] PATTERN [FILE...]

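① Character matching:

.: matches any single character

[ ]: matches any single character in the set

[^ ]: matches any single character outside the set

[:digit:]: all digits

[:lower:]: all lowercase letters

[:upper:]: all uppercase letters

[:alpha:]: all letters

[:alnum:]: all letters and digits

[:space:]: whitespace characters

[:blank:]: space and TAB

[:punct:]: all punctuation characters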

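② Repetition counts:

These limit how many times the preceding character must appear. Quantifiers are greedy by default, meaning they match as many repetitions as possible; grep still prints the whole line that contains a match.

*: matches the preceding character any number of times, including zero

+: matches the preceding character at least once

?: matches the preceding character zero times or once

{m}: the preceding character appears exactly m times; m is a non-negative integer

{m,n}: the preceding character appears at least m and at most n times, i.e. within [m,n]; {m,} means at least m times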
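
A short sketch of the same quantifiers without backslashes, reusing the contents of the test3 file from the basic-regex section as inline sample data. It uses grep -E, the option form of egrep; the variable names are only for illustration.

```shell
# Sample input copied from the test3 file shown earlier.
sample='c
abcd
acccd
bccca'

# {3,}: at least three c characters directly before a d.
at_least_three=$(printf '%s\n' "$sample" | grep -E 'c{3,}d')

# ?: an optional c between b and d.
optional_c=$(printf '%s\n' "$sample" | grep -E 'bc?d')

# +: one or more c characters between a and d.
one_or_more=$(printf '%s\n' "$sample" | grep -E 'ac+d')

printf '%s\n%s\n%s\n' "$at_least_three" "$optional_c" "$one_or_more"
```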

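③ Position anchoring

Anchors restrict where in the target text the text matched by the pattern may appear

^: start-of-line anchor; placed at the far left of the pattern, ^PATTERN

$: end-of-line anchor; placed at the far right of the pattern, PATTERN$

^PATTERN$: PATTERN must match the entire line

^$: an empty line

^[[:space:]]*$: a line that is empty or contains only whitespace

\< or \b: word-start anchor, restricts the match to the left edge of a word, \<PATTERN, \bPATTERN

\> or \b: word-end anchor, restricts the match to the right edge of a word, PATTERN\>, PATTERN\b

\<PATTERN\>: whole-word anchor, matches the entire word exactly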

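④ Grouping and back-references:

(PATTERN): treats the characters matched by PATTERN as a single unit; the text matched by the pattern inside the group is recorded automatically by the regular-expression engine in internal variables

You can think of each pair of parentheses as one variable, named \1, \2, \3, ...

a|b: a or b

C|cat: C or cat

(C|c)at: Cat or cat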
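
How far the | operator reaches depends on the parentheses, which the following sketch illustrates with invented sample lines. The ^ and $ anchors and the extra outer group in the second pattern are additions for the example, so that each pattern has to match the whole line.

```shell
# Hypothetical sample input.
sample='Cat
cat
C
bat'

# The group limits the alternation to the first letter: Cat or cat.
grouped=$(printf '%s\n' "$sample" | grep -E '^(C|c)at$')

# Here the alternation spans the whole alternatives: C or cat.
ungrouped=$(printf '%s\n' "$sample" | grep -E '^(C|cat)$')

printf 'grouped:\n%s\n' "$grouped"
printf 'ungrouped:\n%s\n' "$ungrouped"
```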

The earlier grep examples are reused here for comparison

1. The backslashes around groups can be dropped

[root@localhost mnt]# egrep "^([[:alnum:]]{1,})\>.*\1$" /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt

2. The backslashes in the word-start and word-end anchors cannot be dropped

[root@localhost mnt]# egrep "\<root\>" /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

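3. Find accounts with a UID of 1000 or more, by excluding every line that contains a standalone number of one to three digits

[root@localhost mnt]# egrep -v "\<([0-9]|[1-9][0-9]|[1-9][0-9][0-9])\>" /etc/passwd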
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
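
The command above drops a line if a small number appears anywhere in it, not only in the UID field. A possible alternative, sketched here as an assumption rather than as the article's method, anchors the match to the third colon-separated field. The sample lines are modeled on the passwd entries shown earlier and are passed inline instead of reading /etc/passwd.

```shell
# Hypothetical passwd-style sample input.
sample='root:x:0:0:root:/root:/bin/bash
ntp:x:38:38::/etc/ntp:/sbin/nologin
tom:x:1000:1000:mageedu:/home/mageedu:/bin/bash
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin'

# Skip the first two fields, then require a UID of four or more digits
# that does not start with 0, i.e. a value of 1000 or more.
high_uid_users=$(printf '%s\n' "$sample" \
    | grep -E '^[^:]*:[^:]*:[1-9][0-9]{3,}:' \
    | cut -d: -f1)

echo "$high_uid_users"
```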
