The Three Musketeers: grep/sed/awk

Posted 2020-09-06

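6.1 Regular expressions
A regular expression is a pattern written with a class of characters. Metacharacters do not stand for themselves; they describe how the match should behave.
1) Basic regular expressions (BRE)
Metacharacters: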

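.: matches any single character

[char]: matches any single character in the listed set or range

[^char]: matches any single character not in the listed set or range

Character classes: [:digit:], [:lower:], [:upper:], [:punct:], [:space:], [:alpha:], [:alnum:]. They are only valid inside a bracket expression, e.g. [[:digit:]].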

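Repetition (greedy):

*: matches the preceding character any number of times, including zero

\+: matches the preceding character at least once (a GNU extension to BRE)

.*: any characters, of any length

\?: matches the preceding character zero or one time (a GNU extension to BRE)

\{m,n\}: matches the preceding character at least m and at most n times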

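Anchors: constrain where the match may occur

^: start of line; the pattern that follows must appear at the beginning of the line
$: end of line; the pattern before it must appear at the end of the line
^$: an empty line
\< or \b: start of word; what follows must begin a word
\> or \b: end of word; what precedes must end a word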

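Grouping:

 \(\): for example \(ab\)*, which matches ab as a single unit
\n: back-reference; refers to the text matched by the pattern between the n-th left parenthesis and its matching right parenthesis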
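
The BRE metacharacters above can be tried out with a minimal sketch like the following. The sample lines and the variable names (tmp, grouped, backref, empty) are assumptions made up for the illustration; only POSIX BRE syntax is used.

```shell
# Hypothetical sample data, written to a temporary file for the demonstration.
tmp=$(mktemp)
printf '%s\n' 'abab' 'ab' 'He love his lover.' 'He like his lover.' '' > "$tmp"

# \(ab\)\{2,\} : the group "ab" repeated at least twice.
grouped=$(grep '\(ab\)\{2,\}' "$tmp")

# \(l..e\).*\1r : a four-character l..e string that appears again later, followed by r.
backref=$(grep '\(l..e\).*\1r' "$tmp")

# ^$ : count the empty lines.
empty=$(grep -c '^$' "$tmp")

printf 'grouped: %s\nbackref: %s\nempty lines: %s\n' "$grouped" "$backref" "$empty"
rm -f "$tmp"
```

Only the line whose captured word really recurs (love ... lover) satisfies the back-reference; "like ... lover" does not.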

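2) Extended regular expressions (ERE)

Where ERE differs from BRE:

+: matches the preceding character at least once

?: matches the preceding character zero or one time

{m,n}: matches the preceding character at least m and at most n times
(): grouping
a|b: a or b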
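
As a rough illustration of the difference, the sketch below runs the same repetition once in BRE and once in ERE, then uses alternation. The input string and variable names are assumptions invented for the example; -o and -E are the grep options described in the next section.

```shell
# Hypothetical input string.
text='aa ab abb abbb cat dog'

# BRE: the braces must be escaped.
bre=$(printf '%s\n' "$text" | grep -o 'ab\{2,3\}')

# ERE: the same repetition without backslashes.
ere=$(printf '%s\n' "$text" | grep -E -o 'ab{2,3}')

# ERE alternation: either cat or dog.
alt=$(printf '%s\n' "$text" | grep -E -o 'cat|dog')

printf '%s\n' "$bre" "$ere" "$alt"
```

Both repetition forms select the same two strings, abb and abbb.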

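6.2 Text search tools: grep/egrep/fgrep

Purpose: a text search tool that checks the target text line by line against the pattern supplied by the user and prints the lines that match.

Pattern: a filter condition built from regular-expression metacharacters and literal characters.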

grep [OPTIONS] PATTERN [FILE...]

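-i: ignore case

--color=auto: highlight the matched text

-v: print the lines that the pattern does not match

-o: print only the part of each line that the pattern matches

-E: use extended regular expressions

-q: quiet mode; print nothing and report the result through the exit status

-A #: after; also print the # lines after each match

-B #: before; also print the # lines before each match

-C #: context; also print the # lines before and after each match

egrep is equivalent to grep -E

fgrep: equivalent to grep -F; the pattern is a fixed string, with no regular-expression support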
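
A small sketch of several of these options follows. The file contents and variable names are assumptions for the demonstration, and grep -F is used in place of fgrep since the two are equivalent.

```shell
# Hypothetical sample file.
tmp=$(mktemp)
printf '%s\n' 'alpha' 'Beta' 'gamma' 'beta-2' 'delta' > "$tmp"

ci=$(grep -i -c 'beta' "$tmp")           # -i: count matches regardless of case
inverted=$(grep -v -i -c 'beta' "$tmp")  # -v: count the lines that do not match
after=$(grep -A 1 'Beta' "$tmp")         # -A 1: the match plus one following line

# -q prints nothing; only the exit status says whether a match was found.
if grep -q 'delta' "$tmp"; then found=yes; else found=no; fi

# A dot is a metacharacter for plain grep but a literal character for grep -F.
regex_count=$(printf '%s\n' 'a.c' 'abc' | grep -c 'a.c')
fixed=$(printf '%s\n' 'a.c' 'abc' | grep -F 'a.c')

printf '%s\n' "$ci" "$inverted" "$after" "$found" "$regex_count" "$fixed"
rm -f "$tmp"
```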

Regular expression exercises:

1. Show the lines in /proc/meminfo that begin with an upper- or lower-case S.
[root@host ~]# grep '^[sS]' /proc/meminfo
[root@host ~]# grep -i '^s' /proc/meminfo
SwapCached:            0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Shmem:               272 kB
Slab:              87964 kB
SReclaimable:      67932 kB
SUnreclaim:        20032 kB
2. Show the users in /etc/passwd whose default shell is not /sbin/nologin.
[root@host ~]# grep -v '/sbin/nologin$' /etc/passwd
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mylinux:x:500:500::/home/mylinux:/bin/bash
mylinux1:x:501:501::/home/mylinux1:/bin/bash
3. Show the users in /etc/passwd whose default shell is /sbin/nologin.
[root@host ~]# grep --color=auto '/sbin/nologin$' /etc/passwd
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
gopher:x:13:30:gopher:/var/gopher:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
...
   Going further: show only the user with the largest ID among the results above.
[root@host ~]# grep --color=auto '/sbin/nologin$' /etc/passwd | sort -t: -k3 -n | tail -1 | cut -d: -f1
nfsnobody
4. Find the one- or two-digit numbers in /etc/passwd.
[root@host ~]# grep --color=auto '\<[0-9][0-9]\?\>' /etc/passwd
[root@host ~]# grep --color=auto '\<[0-9]\{1,2\}\>' /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
gopher:x:13:30:gopher:/var/gopher:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
...
5. Show the lines in /boot/grub/grub.conf that begin with at least one whitespace character.
[root@host ~]# grep --color=auto '^[[:space:]]\+' /boot/grub/grub.conf
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.32-642.11.1.el6.x86_64 ro root=/dev/vda1 console=ttyS0 console=tty0 printk.time=1 panic=5 rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=zh_CN.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_NO_LVM crashkernel=auto   rd_NO_DM
        initrd /boot/initramfs-2.6.32-642.11.1.el6.x86_64.img
...
6. Show the lines in /etc/rc.d/rc.sysinit that begin with #, followed by at least one whitespace character, followed by at least one non-whitespace character.
[root@host ~]# grep --color=auto '^#[[:space:]]\{1,\}[^[:space:]]\{1,\}' /etc/rc.d/rc.sysinit
# /etc/rc.d/rc.sysinit - run once at boot time
# Taken in part from Miquel van Smoorenburg's bcheckrc.
# Check SELinux status
# Print a text banner.
# Only read this once.
# Initialize hardware
# Set default affinity
# Load other user-defined modules
# Load modules (for backward compatibility with VARs)
# Configure kernel parameters
...
7. Find the lines ending in 'LISTEN' in the output of netstat -tan.
[root@host ~]# netstat -tan | grep --color=auto 'LISTEN[[:space:]]*$'
tcp        0      0 0.0.0.0:3306                0.0.0.0:*                   LISTEN      
tcp        0      0 0.0.0.0:80                  0.0.0.0:*                   LISTEN      
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN   
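8. Add the users bash, testbash, basher and nologin (the last with /sbin/nologin as its shell), then find the users on the system whose user name is the same as the name of their default shell.
[root@host ~]# tail /etc/passwd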
...
mylinux1:x:501:501::/home/mylinux1:/bin/bash
bash:x:502:502::/home/bash:/bin/bash
testbash:x:503:503::/home/testbash:/bin/bash
basher:x:504:504::/home/basher:/bin/bash
nologin:x:505:505::/home/nologin:/sbin/nologin
[root@host ~]# grep --color=auto '^\([[:alnum:]]\+\):.*\1$' /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
bash:x:502:502::/home/bash:/bin/bash
nologin:x:505:505::/home/nologin:/sbin/nologin
9. Create a text file with the following content:
He like his lover.
He love his lover.
He like his liker.
He love his liker.
Find the lines whose last word is an earlier word on the same line with r appended.
[root@host home]# cat a.txt
He like his lover.
He love his lover.
He like his liker.
He love his liker.
[root@host home]# grep --color=auto '\(l..e\).*\1r.' a.txt
He love his lover.
He like his liker.
10. Show the default shell of the users root, mylinux and bash on the current system.
[root@host home]# grep -E '^(root|mylinux|bash):' /etc/passwd | cut -d: -f7
/bin/bash
/bin/bash
/bin/bash
11. Find the places in /etc/rc.d/init.d/functions where a word is followed by a pair of parentheses "()".
[root@host home]# grep -E -o '\<[[:alnum:]]+\>\(\)' /etc/rc.d/init.d/functions
checkpid()
daemon()
killproc()
pidfileofproc()
pidofproc()
status()
success()
failure()
passed()
warning()
action()
strstr()
confirm()
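12. Print a path with echo, then use grep to extract its base name. The pattern must be anchored at the end of the line; anchoring at the start would return the first path component (/home/) instead.
[root@host chapter01]# pwd
/home/wswp-code/chapter01
[root@host chapter01]# echo $PWD | grep -o -E '[^/]+/?$'
chapter01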
13. Find the numbers between 1 and 255 in the output of ifconfig.
[root@host chapter01]# ifconfig | grep -o -E --color=auto '\<([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\>'
52
54
78
42
10
104
160
...

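6.3 The stream editor sed

sed reads the input one line at a time into the pattern space. By default it does not modify the original file; it only processes the data in the pattern space and, when processing of a line ends, prints the pattern space to the screen. sed is a line-oriented editor.

sed [options] 'Address Command' file ...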

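Options:

-n: quiet mode; do not print the pattern space by default
-i: edit the original file in place
-e SCRIPT: may be given several times to run multiple scripts or operations
-f /PATH/TO/SED_SCRIPT: read the sed script from a file, e.g. sed -f /path/to/scripts file
-r: use extended regular expressions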

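Address (which lines a command applies to):

1. StartLine,EndLine: e.g. 1,10; $ stands for the last line
2. /RegExp/: pattern match, e.g. /^root/ selects every line beginning with root
3. /pattern1/,/pattern2/: every line from the first line matched by pattern1 through the next line matched by pattern2
4. LineNumber: the given line only
5. StartLine,+N: StartLine and the N lines after it (GNU sed)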
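
The address forms can be compared side by side with a sketch such as this one. The five-line sample file and the variable names are assumptions for the example; the GNU-only StartLine,+N form is left out so that the sketch stays portable.

```shell
# Hypothetical five-line sample file.
tmp=$(mktemp)
printf '%s\n' 'one' 'two' 'three' 'four' 'five' > "$tmp"

by_range=$(sed -n '2,3p' "$tmp")          # StartLine,EndLine
last=$(sed -n '$p' "$tmp")                # $ is the last line
by_regex=$(sed -n '/^t/p' "$tmp")         # every line that begins with t
between=$(sed -n '/two/,/four/p' "$tmp")  # from the pattern1 match to the pattern2 match
single=$(sed -n '4p' "$tmp")              # one line number

printf '%s\n' "$by_range" "$last" "$by_regex" "$between" "$single"
rm -f "$tmp"
```

Each command combines -n with p so that only the addressed lines are printed.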

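Commands:

1. d: delete the lines that match;

2. p: print the lines that match;

3. a \string: append a new line containing string after the addressed line;

4. \n: can be used inside string to insert a line break

5. i \string: insert a new line containing string before the addressed line
6. r FILE: append the contents of FILE after the lines that match
7. w FILE: write the addressed lines to FILE;
8. =: print the line number of the lines that match
9. s/pattern/string/flags: search and replace; by default only the first match on each line is replaced. In string, & refers to the whole text matched by pattern. Flags: g replaces every match on the line, i ignores case.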
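
A minimal sketch of the d, = and s commands follows, assuming a three-line sample file made up for the illustration (the variable names are likewise hypothetical).

```shell
# Hypothetical sample file with one empty line in the middle.
tmp=$(mktemp)
printf '%s\n' 'foo bar foo' '' 'baz' > "$tmp"

first_only=$(sed -n '1s/foo/X/p' "$tmp")  # without g only the first match is replaced
global=$(sed -n '1s/foo/X/gp' "$tmp")     # g replaces every match on the line
wrapped=$(sed -n 's/baz/[&]/p' "$tmp")    # & stands for the whole matched text
no_blank=$(sed '/^$/d' "$tmp")            # d deletes the empty line
line_no=$(sed -n '/baz/=' "$tmp")         # = prints the line number of the match

printf '%s\n' "$first_only" "$global" "$wrapped" "$no_blank" "$line_no"
rm -f "$tmp"
```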

Exercises:

1. Replace the digit in the "id:3:initdefault:" line of /etc/inittab with 5.
[root@host ~]# tail /etc/inittab
# Default runlevel. The runlevels used are:
#   0 - halt (Do NOT set initdefault to this)
#   1 - Single user mode
#   2 - Multiuser, without NFS (The same as 3, if you do not have networking)
#   3 - Full multiuser mode
#   4 - unused
#   5 - X11
#   6 - reboot (Do NOT set initdefault to this)
# 
id:3:initdefault:
[root@host ~]# sed 's@\(id:\)[0-9]\(:initdefault\)@\15\2@g' /etc/inittab
...
# Default runlevel. The runlevels used are:
#   0 - halt (Do NOT set initdefault to this)
#   1 - Single user mode
#   2 - Multiuser, without NFS (The same as 3, if you do not have networking)
#   3 - Full multiuser mode
#   4 - unused
#   5 - X11
#   6 - reboot (Do NOT set initdefault to this)
# 
id:5:initdefault:
2. Delete the blank lines in /etc/init.d/functions.
[root@host ~]# sed '/^$/d' /etc/init.d/functions
3. Delete the # at the start of lines in /etc/inittab.
[root@host ~]# sed 's@^#@@g' /etc/inittab
 inittab is only used by upstart for the default runlevel.

 ADDING OTHER CONFIGURATION HERE WILL HAVE NO EFFECT ON YOUR SYSTEM.

 System initialization is started by /etc/init/rcS.conf

 Individual runlevels are started by /etc/init/rc.conf

 Ctrl-Alt-Delete is handled by /etc/init/control-alt-delete.conf

 Terminal gettys are handled by /etc/init/tty.conf and /etc/init/serial.conf,
 with configuration in /etc/sysconfig/init.
 ...
4. In /etc/rc.d/rc.sysinit, for lines that begin with # followed by at least one whitespace character, delete the leading # and the whitespace.
[root@host ~]# sed 's@^#[[:space:]]\+@@g' /etc/rc.d/rc.sysinit
5. Delete the leading whitespace of the lines in /boot/grub/grub.conf.
[root@host ~]# sed 's@^[[:space:]]\+@@g' /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You do not have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /, eg.
#          root (hd0,0)
#          kernel /boot/vmlinuz-version ro root=/dev/vda1 console=ttyS0 console=tty0 printk.time=1 panic=5
#          initrd /boot/initrd-[generic-]version.img
#boot=/dev/sda
...
6. Extract the directory part of a file path, similar to the dirname command; for /etc/sysconfig/network the directory is /etc/sysconfig.
[root@host network-scripts]# dirname /etc/sysconfig/network-scripts/ifcfg-eth0
/etc/sysconfig/network-scripts
[root@host network-scripts]# dirname /etc/sysconfig/network-scripts/ifcfg-eth0/
/etc/sysconfig/network-scripts
[root@host network-scripts]# echo /etc/sysconfig/network-scripts/ifcfg-eth0/ | sed 's@/[^/]\{1,\}/\?$@@g'
/etc/sysconfig/network-scripts
[root@host network-scripts]# echo /etc/sysconfig/network-scripts/ifcfg-eth0 | sed 's@/[^/]\{1,\}/\?$@@g'
/etc/sysconfig/network-scripts

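6.4 AWK, a text formatting tool

awk can format each field of each line of a file separately and then display the result.

Evolution of the awk command: awk --> new awk (nawk) --> GNU awk (gawk)

Supported features: variables (built-in and user-defined), loops, conditionals and arrays.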

Basic syntax:

awk [option]... 'script' FILE...
awk [option]... '/PATTERN/{action}' FILE...

-F CHAR: input field separator

-v var=value: define a variable

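PATTERN (selects which records the action applies to):

Range of lines: awk has no sed-style start_line,end_line address; compare NR instead, e.g. NR>=start_line && NR<=end_line

/pat1/,/pat2/: from a record matched by pat1 through the next record matched by pat2

Specific lines: /pattern/ processes only the lines that the pattern matches

Expressions: comparisons >, >=, ==, <, <=, !=

value ~ /pattern/: true when value matches pattern
value !~ /pattern/: true when value does not match pattern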
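
The pattern forms can be sketched against three passwd-style lines taken from the grep exercises earlier in this article. The variable names are assumptions for the example.

```shell
# Three passwd-style records used as sample input.
passwd_sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
mylinux:x:500:500::/home/mylinux:/bin/bash'

# Comparison expression: UID of at least 500.
regular=$(printf '%s\n' "$passwd_sample" | awk -F: '$3 >= 500 { print $1 }')

# ~ : the shell field matches the regular expression.
bash_users=$(printf '%s\n' "$passwd_sample" | awk -F: '$7 ~ /bash$/ { print $1 }')

# !~ : the shell field does not match.
not_bash=$(printf '%s\n' "$passwd_sample" | awk -F: '$7 !~ /bash$/ { print $1 }')

# Line range expressed through NR.
lines_2_3=$(printf '%s\n' "$passwd_sample" | awk -F: 'NR >= 2 && NR <= 3 { print $1 }')

printf '%s\n' "$regular" "$bash_users" "$not_bash" "$lines_2_3"
```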

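Built-in variables:

NF: number of fields in the current record

NR: number of records read so far, counted across all input files

FNR: number of records read so far from the current file

ARGV: array holding the command-line arguments

For example, in awk '{print $0}' a.txt b.txt, ARGV[0] holds awk and ARGV[1] holds a.txt; the program text is not stored in ARGV.

ARGC: number of elements in ARGV
FILENAME: name of the file awk is currently processing
ENVIRON: associative array of the current shell environment variables and their values
$0: the current record

$1-$n: the n-th field of the current record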
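
To see how NR, FNR, NF, FILENAME, ARGC and ARGV behave with two input files, a sketch along these lines may help. The file names a.txt and b.txt follow the example above; their contents and the variable names are assumptions.

```shell
# Two small hypothetical input files in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'a b c' 'd e' > "$dir/a.txt"
printf '%s\n' 'f' > "$dir/b.txt"

# NR keeps counting across files, FNR restarts for each file,
# NF is the field count and $NF is the last field of the record.
report=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, NF, $NF }' a.txt b.txt)

# ARGC counts awk itself plus the two file operands.
args=$(cd "$dir" && awk 'BEGIN { print ARGC, ARGV[1], ARGV[2] }' a.txt b.txt)

printf '%s\n' "$report" "$args"
rm -rf "$dir"
```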

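awk has four commonly used separators:

Field separators: FS (input; the default is whitespace; set with -F)

OFS (output; the default is a single space)

Record separators: RS (input; the default is a newline)

ORS (output; the default is a newline)

A variable's value is referenced without a leading $; names that start with $ refer to fields.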

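BEGIN pattern: runs once before any input is read; used for setup.

END pattern: runs once after all input has been processed; used for cleanup.

Boolean values: any non-zero number or non-empty string is true; zero and the empty string are false.

Conditional expression: selector ? if-true-exp : if-false-exp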
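
BEGIN, END, the FS/OFS separators and the conditional expression can be combined in one short sketch. The name:uid input lines, the 500 threshold and the variable names are assumptions chosen for the illustration.

```shell
# Hypothetical name:uid records.
summary=$(printf '%s\n' 'root:0' 'bin:1' 'mylinux:500' |
  awk 'BEGIN { FS = ":"; OFS = " -> "; print "user", "type" }
       { print $1, ($2 >= 500 ? "regular" : "system"); n++ }
       END { print n " records" }')

printf '%s\n' "$summary"
```

The BEGIN block sets the separators and prints a header before the first record is read; the END block prints the record count after the last one.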

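awk output: print item1, item2, ...

(1) Items are separated by commas in the program and by the output separator in the output;
(2) An item may be a string, a number, a field of the current record, a variable or an awk expression; numbers are implicitly converted to strings for output;
(3) If the items after print are omitted, it is equivalent to print $0; to print an empty line, use print ""

awk -F: '{ print $1, $3 }' /etc/passwd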
root 0
bin 1
daemon 2
adm 3
lp 4
sync 5
shutdown 6
halt 7
mail 8
uucp 10
operator 11
games 12
...

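awk printf command: printf format, item1, item2, ...

(1) A format must be given;

(2) No newline is added automatically; write \n where a line break is needed

(3) format gives the output format of each item that follows;
Format specifiers start with % followed by one character:

%c: a single character (a numeric argument is printed as the character with that code); %d, %i: decimal integer;

%f: floating-point number; %e, %E: number in scientific notation;

%%: a literal %; %u: unsigned integer

%s: string; %g, %G: number in scientific or floating-point notation, whichever is shorter

Modifiers:

N (a number): field width; -: left-justify

+: show the sign of a number; .N: precision

[root@host network-scripts]# awk -F: '{printf "%-15s %i\n",$1,$3}' /etc/passwd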
root            0
bin             1
daemon          2
adm             3
lp              4
sync            5
shutdown        6

Common actions

(1) Expressions

(2) Control statements

(3) Compound statements

(4) Input statements

(5) Output statements

Control statements:

if (condition) {then-body} [else {else-body}]
while (condition) {statement1; statement2; ...}
do {statement1; statement2; ...} while (condition)
for (variable assignment; condition; iteration process) {statement1; statement2; ...}
for (i in array) {statement1; statement2; ...} iterates over the array indices
switch (expression) {case VALUE or /REGEXP/: statement1; statement2; ... default: statement1; ...} (gawk only)
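
The portable control statements can be exercised with a sketch like the one below; switch is omitted because it is specific to gawk. The name:shell input lines and all variable names are assumptions for the example.

```shell
# Hypothetical name:shell records.
stats=$(printf '%s\n' 'root:/bin/bash' 'bin:/sbin/nologin' 'mylinux:/bin/bash' |
  awk -F: '
    {
      # if / else: classify each record by its shell
      if ($2 ~ /bash$/) { bash++ } else { other++ }
      count[$2]++
    }
    END {
      # for-in: visit every index of the array (order is unspecified)
      for (s in count) { kinds++ }
      # while: build a countdown string
      i = 3
      while (i > 0) { countdown = countdown i; i-- }
      # C-style for: sum 1 and 2
      for (j = 1; j <= 2; j++) { sum += j }
      print bash, other, kinds, countdown, sum
    }')

printf '%s\n' "$stats"
```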

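Built-in functions:

split(string, array [, fieldsep [, seps ] ])	splits string on fieldsep and stores the pieces in array, indexed from 1; returns the number of pieces (the seps argument is gawk-only)
length([string])	returns the number of characters in string
substr(string, start [, length])	returns a substring of string
system(command)	runs the system command and returns its exit status to awk
tolower(s)	converts all letters in s to lower case
toupper(s)	converts all letters in s to upper case
systime()	returns the current system time as seconds since the epoch (gawk)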
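
A short sketch of the portable functions in this list follows; systime() and the four-argument split() are left out because they are gawk extensions. The sample strings and the variable name funcs are assumptions.

```shell
funcs=$(awk 'BEGIN {
  # split returns the number of pieces; indices start at 1, and the
  # leading slash produces an empty first piece.
  n = split("/etc/sysconfig/network", parts, "/")
  print n, parts[2], parts[n]

  print length("sysconfig")
  print substr("sysconfig", 4, 6)
  print toupper("awk"), tolower("SED")

  # system returns the exit status of the command, not its output.
  status = system("exit 3")
  print "status", status
}')

printf '%s\n' "$funcs"
```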

This article comes from the “随风而飘” blog; please keep this attribution: http://yinsuifeng.blog.51cto.com/10173491/1910980
