The Shell's Three Swordsmen: awk

Posted by 喝茶等下班


1.

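An awk program is a list of pattern { action } rules. Commonly used options:

Option  Description

-v var=value  Assign a variable before the program starts

--posix  Run in strict POSIX mode, including POSIX regular-expression behavior

--dump-variables[=file]  Write awk's global variables to a file when the program ends; the default file is awkvars.out

--profile[=file]  Write a formatted copy of the awk program to a file; the default is awkprof.out


Commonly used patterns:

Pattern  Description

BEGIN  Sets up initial state; runs before any input is read

END  Cleanup work that runs after all input has been processed

/regular expression/  Matches each input record against the regular expression

pattern && pattern  Logical AND: both patterns must match

pattern || pattern  Logical OR: either pattern may match

!pattern  Logical NOT: the pattern must not match

pattern1,pattern2  Range pattern: matches from a record that matches pattern1 through the next record that matches pattern2

The actions are the print, flow-control, and I/O statements covered below.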
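The pattern forms listed above can be combined in a single program. The following is a minimal sketch on made-up inline input (the service names and ports are illustrative and not taken from /etc/services):

```shell
# Sample input is fed with printf so the example does not depend on any file.
printf '%s\n' 'ssh 22/tcp' 'dns 53/udp' 'http 80/tcp' 'ntp 123/udp' |
awk '
BEGIN          { print "name proto" }        # runs once, before any input
/tcp/ && !/^h/ { print $1, "tcp-not-h" }     # regex patterns joined with && and !
/^dns/,/^http/ { print $1, "in-range" }      # range: from the dns record through the http record
END            { print NR, "records" }       # runs once, after all input
'
```

Every rule is tested against every record, so one record can trigger more than one action.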

2.

Specify several separator characters

[root@study ~]# tail -3 /etc/services |awk -F'[/#]' '{print $3}'
iqobject
iqobject
Matahari Broker

Use one or more spaces or slashes as the separator

[root@study ~]# tail -3 /etc/services |awk -F'[ /]+' '{print $3}'
tcp
udp
tcp

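Print a shell variable

[root@study ~]# a=123
# Pass the shell variable to awk with -v
[root@study ~]# awk -v a=$a 'BEGIN{print a}'
123
# Splice the shell variable into the program text by closing and reopening the single quotes
[root@study ~]# awk 'BEGIN{print '$a'}'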
123
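Splicing a shell variable into the program text only works for simple values; a value containing spaces or awk syntax characters changes the program itself. A minimal sketch of the -v form with such a value (the variable names msg and s are made up for illustration):

```shell
msg='two words'
# -v hands the value to awk as data, so the space is preserved.
# Note that -v still interprets backslash escapes such as \t in the value.
awk -v s="$msg" 'BEGIN { print s }'
```

Quoting "$msg" on the shell side keeps the value as a single argument.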

Dump awk's global variables to a file

[root@study ~]# seq 5|awk --dump-variables '{print $0}'
1
2
3
4
5
[root@study ~]# cat awkvars.out
ARGC: 1
ARGIND: 0
ARGV: array, 1 elements
BINMODE: 0
CONVFMT: "%.6g"
ERRNO: ""
FIELDWIDTHS: ""
FILENAME: "-"
FNR: 5
FPAT: "[^[:space:]]+"
FS: " "
IGNORECASE: 0
LINT: 0
NF: 1
NR: 5
OFMT: "%.6g"
OFS: " "
ORS: "\n"
RLENGTH: 0
RS: "\n"
RSTART: 0
RT: "\n"
SUBSEP: "\034"
TEXTDOMAIN: "messages"

3.

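The BEGIN and END patterns

The BEGIN block runs before any input is processed. It is typically used to change built-in variables, assign variables, and print a header or title.

Print a header

[root@study ~]# tail /etc/services |awk 'BEGIN{print "Service\t\tPort\t\t\tDescription\n====="}{print $0}'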
Service Port Description
=====
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
matahari 49000/tcp # Matahari Broker

The END block runs only after all input has been processed.

Print a footer

[root@study ~]# tail /etc/services |awk '{print $0}END{print "========\nEND......"}'
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
matahari 49000/tcp # Matahari Broker
========
END......

Write the formatted awk program to a file

[root@study ~]# tail /etc/services |awk --profile 'BEGIN{print "Service\t\tPort\t\t\tDescription\n====="}{print $0}END{print "========\nEND......"}'
Service Port Description
=====
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
matahari 49000/tcp # Matahari Broker
========
END......
[root@study ~]# cat awkprof.out
# gawk profile, created Mon Feb 7 11:29:22 2022

# BEGIN block(s)

BEGIN {
    print "Service\t\tPort\t\t\tDescription\n====="
}

# Rule(s)

{
    print $0
}

# END block(s)

END {
    print "========\nEND......"
}

4.

Regular-expression matching

# Print column 1 of lines that contain tcp
[root@study ~]# tail /etc/services |awk '/tcp/{print $1}'
3gpp-cbsp
isnetserv
blp5
com-bardac-dw
iqobject
matahari
# Match lines that start with blp5
[root@study ~]# tail /etc/services |awk '/^blp5/{print $0}'
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
# Match lines whose first word is exactly 8 characters long; the space after {8} is required, otherwise longer words match too
[root@study ~]# tail /etc/services |awk '/^[0-9a-z]{8} /{print $0}'
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
matahari 49000/tcp # Matahari Broker
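Some older awk implementations do not enable interval expressions such as {8} by default. A sketch of an alternative that avoids the regex and tests the length of the first field instead (the input lines are made-up samples in the style of /etc/services):

```shell
printf '%s\n' 'blp5 48129/tcp' 'iqobject 48619/tcp' 'matahari 49000/tcp' |
awk 'length($1) == 8 { print $0 }'   # keep records whose first field has 8 characters
```

Unlike the regex version, this does not restrict which characters the first field may contain.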


5.

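Logical AND, OR, and NOT

# Match lines that contain both blp5 and tcp
[root@study ~]# tail /etc/services |awk '/blp5/ && /tcp/{print $0}'
blp5 48129/tcp # Bloomberg locator
# Skip lines that start with # and skip empty lines
# Method 1
[root@study ~]# awk '!/^#/ && !/^$/{print $1}' /etc/httpd/conf/httpd.conf
# Method 2
[root@study ~]# awk '!/^#|^$/{print $0}' /etc/httpd/conf/httpd.conf
# Method 3
[root@study ~]# awk '/^[^#]|"^$"/' /etc/httpd/conf/httpd.conf
# Method 3 looks off: the |"^$" alternative appears to do nothing (it asks for a literal double quote before the ^ anchor, which cannot match), and /^[^#]/ already skips empty lines. The command below gives the same result as method 3
[root@study ~]# awk '/^[^#]/' /etc/httpd/conf/httpd.conf
# Comparing the two outputs confirms that they are identical
[root@study ~]# awk '/^[^#]/' /etc/httpd/conf/httpd.conf > awk1.txt
[root@study ~]# awk '/^[^#]|"^$"/' /etc/httpd/conf/httpd.conf > awk2.txt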
[root@study ~]# diff awk1.txt awk2.txt
[root@study ~]#
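The methods above only catch comments that start in column 1. A minimal sketch that also tolerates leading spaces or tabs before the # and treats whitespace-only lines as empty, run on made-up config lines rather than the real httpd.conf:

```shell
printf '%s\n' '# comment' '' '   # indented comment' 'Listen 80' '  ServerName x' |
awk '!/^[ \t]*(#|$)/'   # drop lines that are blank or whose first non-blank character is #
```

The bracket expression [ \t] is used instead of a POSIX character class so that the pattern also works on awk builds without class support.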

6.

Range patterns

[root@study ~]# tail /etc/services
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
matahari 49000/tcp # Matahari Broker
[root@study ~]# tail /etc/services|awk '/^3/,/^com/'
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw

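Records selected by a range can be processed further, for example to print from the line after a keyword to the last line. The end pattern /^$/ never matches here, so the range runs to the end of input, and the conditional prints nothing for the line that matches 3.

[root@study ~]# seq 5|awk '/3/,/^$/{printf /3/?"":$0"\n"}'
4
5

The same result using a true/false flag

[root@study ~]# seq 5|awk '/3/{t=1;next}t'
4
5

Records 1 and 2 do not match /3/, so the first rule is skipped and the pattern t is evaluated. t has not been assigned yet, so it is empty, which is false in awk, and the line is not printed. Record 3 matches, so t=1 runs and next moves on to the next record without evaluating t. Record 4 does not match /3/, so t is evaluated; it still holds the 1 assigned earlier, which is true, so the line is printed, and so on. (Any non-zero number is true, so t can be set to any non-zero value.)

To print from the matching line to the last line, write it like this

[root@study ~]# seq 5|awk '/3/{t=1}t'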
3
4
5
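The flag technique also covers cases that a plain range pattern cannot express, such as printing only the lines strictly between two markers. A minimal sketch, where START and STOP are made-up marker lines:

```shell
printf '%s\n' 'x' 'START' 'a' 'b' 'STOP' 'y' |
awk '
/^STOP$/  { t = 0 }   # clear the flag before the print rule, so STOP itself is not printed
t                     # bare pattern: print the record while the flag is set
/^START$/ { t = 1 }   # set the flag after the print rule, so START itself is not printed
'
```

The order of the three rules is what excludes the markers; moving the bare t rule to the end would include the START line.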

7.

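Built-in variables

FS  Input field separator; the default is a space, which splits on runs of spaces and tabs

OFS  Output field separator; the default is a space

RS  Input record separator; the default is the newline \n

ORS  Output record separator; the default is the newline \n

NF  Number of fields in the current record

NR  Record number; it increases by 1 for every record processed

FNR  Like NR, except that the count restarts when the second file begins

ARGC  Number of command-line arguments

ARGV  Array of command-line arguments, indexed from 0; ARGV[0] is awk

ARGIND  Index of the file currently being processed (gawk): 1 for the first file, 2 for the second, and so on

ENVIRON  Array of the current environment variables

FILENAME  Name of the file currently being processed

IGNORECASE  Ignore case when matching (gawk)

SUBSEP  Separator between subscripts in array indexes; the default is "\034"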
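Several of these variables can be seen working together in one command. The following is a small sketch on made-up passwd-style input; assigning $1 to itself is a common idiom that makes awk rebuild $0 using OFS:

```shell
printf '%s\n' 'root:x:0' 'bin:x:1' |
awk 'BEGIN { FS = ":"; OFS = "-" }   # split input on colons, join output with dashes
     { $1 = $1; print NR, NF, $0 }   # rebuild $0 with OFS, then print record number and field count'
```

This prints 1-3-root-x-0 and 2-3-bin-x-1: OFS separates both the arguments of print and the fields of the rebuilt record.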


Reassign FS before the program starts to change the separator to a colon; this is equivalent to -F

[root@study ~]# awk 'BEGIN{FS=":"}{print $1,"\t",$2}' /etc/passwd |head -5
root x
bin x
daemon x
adm x
lp x

The variable can also be set with -v

[root@study ~]# awk -vFS=':' '{print $1,"\t",$2}' /etc/passwd |head -5
root x
bin x
daemon x
adm x
lp x

String concatenation can also supply the separator

[root@study ~]# awk -vFS=':' '{print $1"#"$2}' /etc/passwd |head -5
root#x
bin#x
daemon#x
adm#x
lp#x

Use a chosen character as the record separator

[root@study ~]# echo "www.baidu.com/user/test.html" | awk 'BEGIN{RS="/"}{print $0}'
www.baidu.com
user
test.html

RS also accepts a regular expression

[root@study ~]# seq -f "str%02g" 10|sed 'n;n;a----'|awk 'BEGIN{RS="-+"}{print $1}'
str01
str04
str07
str10

Replace a character by changing RS and ORS

[root@study ~]# tail -2 /etc/services |awk 'BEGIN{RS="/";ORS="#"}{print $0}'
iqobject 48619#udp # iqobject
matahari 49000#tcp # Matahari Broker

NF is the number of fields

[root@study ~]# echo a b c d e f|awk '{print NF}'
6
[root@study ~]# echo a b c d e f|awk '{print $NF}'
f
[root@study ~]# echo a b c d e f|awk '{print $(NF-1)}'
e
# Drop the last two columns
[root@study ~]# echo a b c d e f|awk '{$NF="";$(NF-1)="";print $0}'
a b c d
# Drop the first column
[root@study ~]# echo a b c d e f|awk '{$1="";print $0}'
b c d e f

Print line numbers

[root@study ~]# tail -5 /etc/services |awk '{print NR,$0}'
1 com-bardac-dw 48556/tcp # com-bardac-dw
2 com-bardac-dw 48556/udp # com-bardac-dw
3 iqobject 48619/tcp # iqobject
4 iqobject 48619/udp # iqobject
5 matahari 49000/tcp # Matahari Broker
# Print the total number of lines
[root@study ~]# tail -5 /etc/services |awk 'END{print NR}'
5
# Print the third line
[root@study ~]# tail -5 /etc/services |awk 'NR==3'
iqobject 48619/tcp # iqobject
# Print column 2 of the third line
[root@study ~]# tail -5 /etc/services |awk 'NR==3{print $2}'
48619/tcp
# Print the first three lines
[root@study ~]# tail -5 /etc/services |awk 'NR<=3{print NR,$0}'
1 com-bardac-dw 48556/tcp # com-bardac-dw
2 com-bardac-dw 48556/udp # com-bardac-dw
3 iqobject 48619/tcp # iqobject


8.

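ARGC and ARGV

ARGC is the number of command-line arguments

ARGV is the array that holds the command-line arguments; it has ARGC elements and its index starts at 0

[root@study ~]# awk 'BEGIN{print ARGC}' 1 2 3
4
[root@study ~]# awk 'BEGIN{print ARGV[0]}' 1 2 3
awk
[root@study ~]# awk 'BEGIN{print ARGV[1]}' 1 2 3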
1

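ARGIND is the index of the file currently being processed: 1 for the first file, 2 for the second, and so on. This makes it possible to tell which file a record came from.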

[root@study ~]# cat a b
a
b
c
c
d
e
# With no file argument awk reads standard input and ARGIND stays 0
[root@study ~]# awk '{print ARGIND,$0}'
a
0 a
^C
[root@study ~]# awk '{print ARGIND,$0}' a b
1 a
1 b
1 c
2 c
2 d
2 e
[root@study ~]# awk 'ARGIND==1{print "a->"$0};ARGIND==2{print "b->"$0}' a b
a->a
a->b
a->c
b->c
b->d
b->e
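ARGIND is a gawk extension. On other awk implementations a similar file index can be kept by hand, because FNR restarts at 1 for every file. A minimal sketch, assuming two throwaway files created with mktemp (the names f1 and f2 and the variable idx are made up):

```shell
dir=$(mktemp -d)
printf '%s\n' a b > "$dir/f1"
printf '%s\n' c d > "$dir/f2"
# FNR == 1 is true on the first record of each file, so idx counts files.
awk 'FNR == 1 { idx++ } { print idx, $0 }' "$dir/f1" "$dir/f2"
rm -r "$dir"
```

One difference from ARGIND: an empty file contributes no records, so it is not counted by this approach.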

9.

ENVIRON gives access to environment variables. A variable that was only set in the shell must first be exported before awk can see it.

[root@study ~]# awk 'BEGIN{print ENVIRON["HOME"],ENVIRON["USER"]}'
/root root
[root@study ~]# echo $a
123
[root@study ~]# awk 'BEGIN{print ENVIRON["a"]}'

[root@study ~]# export a
[root@study ~]# awk 'BEGIN{print ENVIRON["a"]}'
123
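Besides export, a variable can be placed in the environment of a single command by writing the assignment in front of it. A short sketch (the variable name level is made up for illustration):

```shell
# The assignment prefix puts "level" into the environment of this one awk process only.
level=debug awk 'BEGIN { print ENVIRON["level"] }'
```

The calling shell's own variables are left unchanged by this form.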

10.

FILENAME is the name of the file currently being processed

[root@study ~]# awk 'FNR==NR{print FILENAME".txt\t"$0}FNR!=NR{print FILENAME".txt\t"$0}' a b
a.txt a
a.txt b
a.txt c
b.txt c
b.txt d
b.txt e
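The FNR==NR test used above is also the basis of a common two-file idiom: load the first file into an array, then filter the second file against it. A minimal sketch, assuming two temporary files with the same contents as the a and b files shown earlier (the array name seen is made up):

```shell
dir=$(mktemp -d)
printf '%s\n' a b c > "$dir/a"
printf '%s\n' c d e > "$dir/b"
# While reading the first file NR equals FNR: remember each line and skip to the next record.
# For the second file, print only the lines that were seen in the first.
awk 'NR == FNR { seen[$0] = 1; next } $0 in seen' "$dir/a" "$dir/b"
rm -r "$dir"
```

This prints c, the only line common to both files. If the first file is empty, NR == FNR is also true for the second file, so the idiom needs an extra guard in that case.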

IGNORECASE: setting it to 1 makes matching case-insensitive

[root@study ~]# echo a A b BA |xargs -n1|awk 'BEGIN{IGNORECASE=1}/A/'
a
A
BA

11.

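Operators

(...)  Grouping

$  Field reference

++ --  Increment and decrement

+ - !  Unary plus, unary minus, logical NOT

* / %  Multiplication, division, modulo

+ -  Addition, subtraction

| |&  Pipes, used with getline, print, and printf

< > <= >= != ==  Relational operators

~ !~  Regular-expression match, negated regular-expression match

in  Array membership

&& ||  Logical AND, logical OR

?:  Conditional (ternary) expression

= += -= *= /= %= ^=  Assignment operators

In awk an expression is false in three cases: the number 0, the empty string, and an undefined value.

In a numeric context an unset variable starts out as 0. In a string context it starts out as the empty string.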
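The truth rules above can be checked directly in a BEGIN block. A small sketch, where unset is simply a variable name that is never assigned:

```shell
awk 'BEGIN {
    if (!0)     print "0 is false"
    if (!"")    print "empty string is false"
    if (!unset) print "unset variable is false"
    if ("a")    print "non-empty string is true"
    print unset + 1        # the unset variable acts as 0 in arithmetic
    print "[" unset "]"    # the unset variable acts as "" in string context
}'
```

The last two lines of output are 1 and [], showing the same variable behaving as 0 and as the empty string.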

Extract the leading integer

[root@study ~]# echo "123abc abc456 234abd546"|xargs -n1|awk '{print +$0}'
123
0
234
[root@study ~]# echo "123abc abc456 234abd546"|xargs -n1|awk '{print -$0}'
-123
0
-234

Print odd-numbered lines, then even-numbered lines

[root@study ~]# seq 6|awk 'i=!i'
1
3
5
[root@study ~]# seq 6|awk '!(i=!i)'
2
4
6


Using a pipe

[root@study ~]# echo 1 3 2 5 6 4|xargs -n1|awk '{print $0|"sort"}'
1
2
3
4
5
6

Regular-expression match operators

[root@study ~]# seq 5|awk '$0~3{print $0}'
3
[root@study ~]# seq 5|awk '$0!~3{print $0}'
1
2
4
5
[root@study ~]# seq 5|awk '$0!~/[34]/{print $0}'
1
2
5
[root@study ~]# seq 5|awk '$0~/[34]/{print $0}'
3
4
[root@study ~]# seq 5|awk '$0~/[^34]/{print $0}'
1
2
5

Test array membership

[root@study ~]# awk 'BEGIN{a["a"]=123}END{if("a" in a)print "yes"}' < /dev/null
yes

The ternary operator is an expression, so a print statement cannot be written inside it

[root@study ~]# awk 'BEGIN{print 1==1?"yes":"no"}'
yes
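Because the conditional is an expression, it goes inside the print statement rather than around it. When the two branches need different statements, if/else is the tool. A minimal sketch of both forms:

```shell
# The conditional expression chooses the value that print receives.
seq 4 | awk '{ print $0, ($0 % 2 ? "odd" : "even") }'
# Equivalent if/else form, for when each branch needs its own statement.
seq 4 | awk '{ if ($0 % 2) print $0, "odd"; else print $0, "even" }'
```

Both commands print the same four lines, from 1 odd to 4 even. The parentheses around the conditional are optional here but make the grouping explicit.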

Replace newlines with commas

[root@study ~]# seq 5|awk '{print n=n?n","$0:$0}'
1
1,2
1,2,3
1,2,3,4
1,2,3,4,5
[root@study ~]# seq 5|awk '{n=n?n","$0:$0}END{print n}'
1,2,3,4,5

Add a blank line after every third line

[root@study ~]# seq 10|awk '{print NR%3?$0:$0"\n"}'
1
2
3

4
5
6

7
8
9

10

Join every two lines into one

[root@study ~]# seq 6|awk '{printf NR%2!=0?$0" ":$0"\n"}'
1 2
3 4
5 6
[root@study ~]# seq 6|awk 'ORS=NR%2?" ":"\n"'
1 2
3 4
5 6
[root@study ~]# seq 6|awk '{if(NR%2)ORS=" ";else ORS="\n";print}'
1 2
3 4
5 6

Assignment operators

Counting records and summing a field
[root@study ~]# seq 5|awk '{sum+=1}END{print sum}'
5
[root@study ~]# seq 5|awk '{sum+=$0}END{print sum}'
15
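The same accumulate-then-report shape extends to other aggregates. A small sketch that computes the sum, the average, and the maximum in one pass (the variable names sum and max are illustrative):

```shell
seq 5 | awk '
NR == 1 || $1 > max { max = $1 }   # track the largest value seen so far
{ sum += $1 }                      # accumulate the total
END { if (NR > 0) print "sum=" sum, "avg=" (sum / NR), "max=" max }
'
```

For the input 1 to 5 this prints sum=15 avg=3 max=5. The NR > 0 guard avoids a division by zero on empty input.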

12.

Flow control

An if condition can also be a regular-expression match; this form is mostly used in more complex statements

[root@study ~]# echo "123abc#456cde 789aaa#aaabbb "|xargs -n1|awk -F# '{if($2~/[0-9]/)print $2}'
456cde
[root@study ~]# echo "123abc#456cde 789aaa#aaabbb "|xargs -n1|awk -F# '{if($2!~/[0-9]/)print $2}'
aaabbb
[root@study ~]# echo "123abc#456cde 789aaa#aaabbb "|xargs -n1|awk -F# '$2!~/[0-9]/{print $2}'
aaabbb

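Multiple branches

# Judging by the output of the following examples, file1 holds three lines: 1 2 3, 4 5 6 and 7 8 9
[root@study ~]# awk '{if($1==4){print "1"} else if($2==5){print "2"} else if($3==6){print "3"} else {print "no"}}' file1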
no
1
no

The while statement

[root@study ~]# awk '{i=1;while(i<=NF){print $i;i++}}' file1
1
2
3
4
5
6
7
8
9

Print the fields in reverse order

[root@study ~]# awk '{for(i=NF;i>=1;i--)print $i}' file1
3
2
1
6
5
4
9
8
7
# Every field ends up on its own line, which is not what we want. How can this be improved?
[root@study ~]# awk '{for(i=NF;i>=1;i--){printf $i" "};print ""}' file1
3 2 1
6 5 4
9 8 7

Drop the first column, then drop the last column:

[root@study ~]# awk '{for(i=2;i<=NF;i++){printf $i" "};print ""}' file1
2 3
5 6
8 9
[root@study ~]# awk '{for(i=1;i<=NF-1;i++){printf $i" "};print ""}' file1
1 2
4 5
7 8

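Wrap each IP address in single quotes

[root@study ~]# echo 10.10.10.1 10.10.10.2 10.10.10.3|awk '{for(i=1;i<=NF;i++)printf "\047"$i"\047"}'
'10.10.10.1''10.10.10.2''10.10.10.3'

\047 is the octal ASCII code of the single quote; key codes can be looked up with the showkey -a command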
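An alternative to the octal escape is to pass the quote character in as a variable, which some readers find easier to follow. A minimal sketch (the variable name q is made up); it also uses an explicit printf format so that the data is never interpreted as a format string:

```shell
echo 10.10.10.1 10.10.10.2 10.10.10.3 |
awk -v q="'" '{
    # Print each field wrapped in q; separate fields with a space and end the record with a newline.
    for (i = 1; i <= NF; i++) printf "%s%s%s%s", q, $i, q, (i < NF ? " " : "\n")
}'
```

The shell-level "'" is a single quote inside double quotes, so no escaping is needed in the awk program itself.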

Iterate over an array with a for loop

[root@study ~]# seq -f "str%.g" 5|awk '{a[NR]=$0}END{for(v in a)print v,a[v]}'
4 str4
5 str5
1 str1
2 str2
3 str3
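As the output above shows, for (v in a) does not visit the indexes in numeric order. When the indexes are known to be 1 through NR, a counting loop gives a predictable order. A minimal sketch:

```shell
# Indexes are the record numbers 1..NR, so a counting loop walks them in order.
seq -f "str%g" 5 | awk '{ a[NR] = $0 } END { for (i = 1; i <= NR; i++) print i, a[i] }'
```

This only works for dense numeric indexes; for arbitrary keys the output of for-in can be piped through sort instead.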

Delete an array or a single element

[root@study ~]# seq -f "str%.g" 5|awk '{a[NR]=$0}END{delete a;for(v in a)print v,a[v]}'
# No output: the array is empty
[root@study ~]# seq -f "str%.g" 5|awk '{a[NR]=$0}END{delete a[3];for(v in a)print v,a[v]}'
4 str4
5 str5
1 str1
2 str2

exit terminates the program, like exit in the shell. The exit status is a number between 0 and 255.

[root@study ~]# seq 5|awk '{if($0~/3/)exit 1}'
[root@study ~]# echo $?
1
[root@study ~]# seq 5|awk '{if($0~/3/)exit(123)}'
[root@study ~]# echo $?
123
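One detail worth knowing is that exit inside a normal rule stops reading input but still runs the END block. A sketch that records the intended status in a variable (rc is a made-up name) and returns it from END:

```shell
seq 5 | awk '
$0 == 3 { rc = 3; exit }                         # stop reading input; END still runs
END     { print "stopped at record", NR; exit rc }
'
echo "status: $?"
```

This prints stopped at record 3 followed by status: 3. Calling exit again inside END is what sets the final exit status.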

13.

Arrays

To be continued...


