Study Notes: Shell Text-Processing Tools

Posted 2022-12-19 Ghost_02

1. Comparing files: diff, comm, cmp

First create two files with the following contents.

[root@www Practice]# cat Example1
abc
def
[root@www Practice]# cat Example2
abc
def
ghi

1. diff

Shows how two files differ.

[root@www Practice]# diff Example[1,2]
2a3
> ghi

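The output 2a3 means: after line 2 of Example1, add line 3 of Example2 (the line marked with >).

2. cmp

Compares two files byte by byte and stops at the first difference. In the example below it reports that Example1 ended first.

[root@www Practice]# cmp  Example1 Example2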
cmp: EOF on Example1

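3. comm

Compares two sorted files line by line and prints three columns. Column 1: lines only in the first file. Column 2: lines only in the second file. Column 3: lines common to both files.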

[root@www Practice]# comm Example[1,2]
		abc
		def
	ghi
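
As a small sketch of how the three columns can be put to use: the options -1, -2 and -3 each hide one column, so comm -12 leaves only the common lines and comm -13 only the lines unique to the second file. The scratch directory and the variable names are made up for this illustration, and cmp -s is added to show that cmp can also answer through its exit status alone.

```shell
# Recreate the two example files in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
printf 'abc\ndef\n' > "$workdir/Example1"
printf 'abc\ndef\nghi\n' > "$workdir/Example2"

# -12 hides columns 1 and 2: only lines common to both files remain.
common=$(comm -12 "$workdir/Example1" "$workdir/Example2")
# -13 hides columns 1 and 3: only lines unique to Example2 remain.
only_in_2=$(comm -13 "$workdir/Example1" "$workdir/Example2")

# cmp -s prints nothing; its exit status is 0 only when the files are identical.
if cmp -s "$workdir/Example1" "$workdir/Example2"; then
    same=yes
else
    same=no
fi

echo "common lines:"
echo "$common"
echo "only in Example2: $only_in_2"
echo "identical: $same"
rm -rf "$workdir"
```

Both inputs are already in sorted order here, which comm relies on.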

2. Applying patches with diff (patch)

We reuse the same two files, Example1 and Example2.
First look at the output of diff -u:

[root@www Practice]# diff -u Example{1..2}
--- Example1	2016-11-07 15:10:21.677199505 +0800
+++ Example2	2016-11-07 15:10:42.158611035 +0800
@@ -1,2 +1,3 @@
 abc
 def
+ghi

patch adds to Example1 whatever Example2 has that Example1 lacks (in effect, it turns Example1 into Example2).

[root@www Practice]# diff -u Example{1..2} > Example.patch
[root@www Practice]# patch Example1 Example.patch 
patching file Example1
[root@www Practice]# cat Example1
abc
def
ghi

patch also has a -b option, which keeps a copy of the original file with the suffix .orig.

[root@www Practice]# diff -u Example{1..2} > Example.patch
[root@www Practice]# patch -b Example1 Example.patch 
patching file Example1
[root@www Practice]# ls
Example1  Example1.orig  Example2  Example.patch

3. cut: slicing a file vertically

First, build an example file.

[root@www Practice]# cp /etc/passwd .
[root@www Practice]# sed -i '5,$d' passwd
[root@www Practice]# cat passwd 
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin

Every line of this file consists of 7 fields separated by ":". What if I only want the first column?

[root@www Practice]# cut -d ":" -f 1 passwd 
root
bin
daemon
adm

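-d sets the delimiter and -f selects which fields to keep. cut is a tool made for extracting columns. Consider the output of the following command: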

[root@www Practice]# ifconfig 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.191  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::20c:29ff:fec6:3cee  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:c6:3c:ee  txqueuelen 1000  (Ethernet)
        RX packets 36992  bytes 18391256 (17.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 15380  bytes 2066035 (1.9 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

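Suppose I want only the IP address, the value right after inet, and nothing else. The plan: 1. select the second line; 2. use cut, the column tool, to pick out that column.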

[root@www Practice]# ifconfig |sed -n '2p'|cut -d " " -f 10
192.168.1.191

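Note that inet is preceded by a run of spaces. With a single space as the delimiter, each of the 8 leading spaces ends an empty field, so inet is field 9 and the IP address is field 10. (I found the number by trial and error; it is hard to see at a glance.)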
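
To make the field counting reproducible without a network interface, here is a minimal sketch that feeds cut a fixed copy of that line. The variable names are my own, and the second command shows that -f also takes a list of fields.

```shell
# One sample line in the same layout as the ifconfig output above:
# eight leading spaces, then "inet", then the address.
line='        inet 192.168.1.191  netmask 255.255.255.0  broadcast 192.168.1.255'

# With a single space as delimiter, fields 1-8 are empty, field 9 is "inet",
# and field 10 is the address.
ip=$(printf '%s\n' "$line" | cut -d ' ' -f 10)

# -f also accepts a list: here the user name (field 1) and the login shell (field 7).
name_shell=$(printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' | cut -d ':' -f 1,7)

echo "$ip"
echo "$name_shell"
```

When several fields are selected, cut joins them with the same delimiter it split on.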

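4. paste: pasting files together

paste takes lines and joins them side by side as columns (the example makes this clearer). Here is an address book whose layout is hard to read: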

[root@www Practice]# cat addressbook 
mage
www.magedu.com 
123-456-6789
鸟哥
www.vbird.org
245-355-8876

paste can turn the address book into a more readable layout:

[root@www Practice]# paste -s addressbook 
mage	www.magedu.com 	123-456-6789	鸟哥	www.vbird.org	245-355-8876

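-s joins all the lines into a single line.
Adding -d sets the delimiters placed between the joined lines. The characters in the list are used in turn, so ending the list with \n starts a new line after every third entry.

[root@www Practice]# paste -s -d"::\n" addressbook 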
mage:www.magedu.com :123-456-6789
鸟哥:www.vbird.org:245-355-8876

That looks much better.
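
Another way to get the same layout, shown here only as a sketch: each "-" argument makes paste read one more line from standard input per output row, so three of them group the input into rows of three. The sample data is typed inline, and I spelled the second name in ASCII to keep the snippet plain.

```shell
# The address book from above, three lines per entry (name, site, phone).
addressbook=$(printf '%s\n' mage www.magedu.com 123-456-6789 \
                            vbird www.vbird.org 245-355-8876)

# "- - -" glues every three consecutive input lines together, separated by ":".
rows=$(printf '%s\n' "$addressbook" | paste -d ':' - - -)

echo "$rows"
```

Compared with -s and a delimiter list, this form states the group size directly in the number of dashes.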

5. sort: sorting a file

First, a small list to work with:

[root@www Practice]# cat shortlist 
mage 2 video 
vbird 1 book
lige 4 westos
chaoge 3 imooc

Like cut, sort understands fields. I can list the file ordered by the number in the second column:

[root@www Practice]# sort -t " " -k 2 -n shortlist 
vbird 1 book
mage 2 video 
chaoge 3 imooc
lige 4 westos

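In the command above, -t sets the field separator, -k selects the key field, and -n sorts numerically.
sort can also check whether a file is already sorted. The second column of the source file is out of order, so let sort confirm it: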

[root@www Practice]# sort -t " " -k 2 -n -c  shortlist 
sort: shortlist:2: disorder: vbird 1 book

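Here -c checks whether the input is sorted. The output shows that the disorder starts at line 2.
Other useful options: -u removes duplicate lines, -f ignores case, -r reverses the order, and -o FILE writes the result to FILE. sort also accepts several input files and sorts them together.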
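
A short sketch that exercises those options on the same data. The variable names and the temporary output file are assumptions of mine; -k 2,2 is used so that the key stops at the end of field 2.

```shell
# Same data as shortlist above, held in a variable for the sketch.
shortlist=$(printf '%s\n' 'mage 2 video' 'vbird 1 book' 'lige 4 westos' 'chaoge 3 imooc')

# -k 2,2 limits the key to field 2; -n compares it numerically; -r reverses.
by_number_desc=$(printf '%s\n' "$shortlist" | sort -t ' ' -k 2,2 -n -r)

# -u drops duplicate lines after sorting.
unique=$(printf '%s\n' b a b c a | sort -u)

# -c prints nothing for sorted input and reports the result through the exit status.
if printf '%s\n' 1 2 3 | sort -n -c; then
    sorted=yes
else
    sorted=no
fi

# -o writes the sorted result to a file instead of standard output.
outfile=$(mktemp)
printf '%s\n' "$shortlist" | sort -o "$outfile"
first=$(head -n 1 "$outfile")
rm -f "$outfile"

echo "$by_number_desc"
echo "$unique"
echo "sorted: $sorted, first line: $first"
```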

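6. uniq: finding repeated and unique lines

uniq only compares adjacent lines, so sort the input first. The command itself is simple, with three commonly used options: -c (count occurrences), -u (print only lines that are not repeated), -d (print only lines that are repeated).
First, an example file: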

[root@www Practice]# cat example 
aaaa
bbbb
aaaa
cccc
dddd
eeee
dddd

Count how often each line occurs:

[root@www Practice]# sort example |uniq -c
      2 aaaa
      1 bbbb
      1 cccc
      2 dddd
      1 eeee

Show the repeated lines:

[root@www Practice]# sort example |uniq -d
aaaa
dddd

Show the lines that are not repeated:

[root@www Practice]# sort example |uniq -u
bbbb
cccc
eeee
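
To see why the sort step matters, this sketch counts how many lines survive uniq with and without sorting, and then ranks the lines by frequency. The variable names are made up, and the final sort keys are my own choice to make ties come out in a fixed order.

```shell
example=$(printf '%s\n' aaaa bbbb aaaa cccc dddd eeee dddd)

# uniq merges only adjacent duplicates; none are adjacent here, so all 7 lines survive.
unsorted_count=$(printf '%s\n' "$example" | uniq | wc -l | tr -d ' ')
# After sorting, duplicates sit next to each other and 5 distinct lines remain.
sorted_count=$(printf '%s\n' "$example" | sort | uniq | wc -l | tr -d ' ')

# Frequency ranking: count, order by count (descending) then by text, keep the top two.
top=$(printf '%s\n' "$example" | sort | uniq -c | sort -k1,1nr -k2,2 |
    head -n 2 | awk '{ print $2 }')

echo "without sort: $unsorted_count lines, with sort: $sorted_count lines"
echo "$top"
```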

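7. tr: translating characters

Note: tr reads only from standard input; it does not accept a file name.
Remember the address book from before? Suppose I now want to change its ":" separator to something else.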

[root@www Practice]# paste -s -d"::\n" addressbook 
mage:www.magedu.com :123-456-6789
鸟哥:www.vbird.org:245-355-8876

tr can replace each : with a space (it maps one character to one character).

[root@www Practice]# paste -s -d"::\n" addressbook |tr ':' ' '
mage www.magedu.com 123-456-6789
鸟哥 www.vbird.org 245-355-8876

Or convert the lowercase letters to uppercase:

[root@www Practice]# paste -s -d"::\n" addressbook |tr '[a-z]' '[A-Z]'
MAGE:WWW.MAGEDU.COM:123-456-6789
鸟哥:WWW.VBIRD.ORG:245-355-8876

tr can also delete characters with -d and squeeze runs of a repeated character with -s.
Here -d deletes the newlines:

[root@www Practice]# paste -s -d"::\n" addressbook |tr -d '\n'
mage:www.magedu.com:123-456-6789鸟哥:www.vbird.org:245-355-8876[root@www Practice]#

Remember that this output starts with a run of spaces, which is why cut needed -f 10 to select 192.168.1.191:

[root@www Practice]# ifconfig |sed -n '2p'
        inet 192.168.1.191  netmask 255.255.255.0  broadcast 192.168.1.255

Squeeze the spaces with -s:

[root@www Practice]# ifconfig |sed -n '2p'|tr -s ' '
 inet 192.168.1.191 netmask 255.255.255.0 broadcast 192.168.1.255

Now only single spaces remain, and the cut command becomes much tidier:

[root@www Practice]# ifconfig |sed -n '2p'|tr -s ' '|cut -d " " -f 3
192.168.1.191
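
The three uses of tr from this section can be tried on fixed strings, as in the sketch below. The sample strings and variable names are mine; the character classes [:lower:] and [:upper:] are shown as an alternative to spelling out the letter ranges.

```shell
line='        inet 192.168.1.191  netmask 255.255.255.0'

# -s squeezes each run of spaces to one. Field 1 is then the empty string before
# the single leading space, so the address is field 3.
ip=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 3)

# POSIX character classes change case without listing the letters.
upper=$(printf '%s\n' 'www.vbird.org' | tr '[:lower:]' '[:upper:]')

# -d deletes every character in the set, here the carriage return of a DOS line ending.
clean=$(printf 'abc\r\n' | tr -d '\r')

echo "$ip"
echo "$upper"
echo "$clean"
```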

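8. sed: the stream editor

Note: sed does not modify the source file; use -i to edit the file in place. In these notes it serves mainly as a line-selection tool. p prints a line, d deletes a line, and s substitutes (as in vim). One quirk: sed prints every input line by default, so p shows the selected line in addition to the full text. With -n, only the selected lines are shown.

The example file is passwd again (with line numbers added to make it easier to follow):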

[root@www Practice]# cat passwd 
1.root:x:0:0:root:/root:/bin/bash
2.bin:x:1:1:bin:/bin:/sbin/nologin
3.daemon:x:2:2:daemon:/sbin:/sbin/nologin
4.adm:x:3:4:adm:/var/adm:/sbin/nologin

Show only line 2 of the file:

[root@www Practice]# sed -n '2p' passwd 
2.bin:x:1:1:bin:/bin:/sbin/nologin
[root@www Practice]#

Delete line 2 and show the result:

[root@www Practice]# sed  '2d' passwd 
1.root:x:0:0:root:/root:/bin/bash
3.daemon:x:2:2:daemon:/sbin:/sbin/nologin
4.adm:x:3:4:adm:/var/adm:/sbin/nologin

Print lines 2 to 3 of the file:

[root@www Practice]# sed -n '2,3p' passwd 
2.bin:x:1:1:bin:/bin:/sbin/nologin
3.daemon:x:2:2:daemon:/sbin:/sbin/nologin

For example, to check whether /usr/bin contains any setuid commands:

[root@www Practice]# ll /usr/bin |sed -n '/^...s/p'
-rwsr-xr-x. 1 root root      52936 Nov 20  2015 at
-rwsr-xr-x. 1 root root      64200 Mar  6  2015 chage
-rws--x--x. 1 root root      23960 Aug  3 01:12 chfn
-rws--x--x. 1 root root      23856 Aug  3 01:12 chsh
-rwsr-xr-x. 1 root root      57552 Mar 31  2016 crontab
-rwsr-xr-x. 1 root root      32584 Nov 20  2015 fusermount
-rwsr-xr-x. 1 root root      78168 Mar  6  2015 gpasswd
-rwsr-xr-x. 1 root root      61304 Mar 31  2016 ksu
-rwsr-xr-x. 1 root root      44232 Aug  3 01:12 mount
-rwsr-xr-x. 1 root root      41752 Mar  6  2015 newgrp
-rwsr-xr-x. 1 root root      27832 Jun 10  2014 passwd
-rwsr-xr-x. 1 root root      27656 Jun 24 02:13 pkexec
---s--x---. 1 root stapusr  186792 Nov 22  2015 staprun
-rwsr-xr-x. 1 root root      32072 Aug  3 01:12 su
---s--x--x. 1 root root     130720 Apr  1  2016 sudo
-rwsr-xr-x. 1 root root      31960 Aug  3 01:12 umount
-rwsr-xr-x. 1 root root    2397160 Nov 20  2015 Xorg

For example, to see which structs a C source file defines:

[root@www Practice]# sed -n '/^struct.*{/,/}/p' arp_lay.c 
struct _ethhdr{
        unsigned char dsteth[6];
        unsigned char srceth[6];
        unsigned short type;
};
struct _arphdr{
        unsigned short hdtype;
        unsigned short protype;
        unsigned char hdaddrlength;
        unsigned char proaddrlength;
        unsigned short op;
        unsigned char srcaddr[6];
        unsigned char srcip[4];
        unsigned char dstaddr[6];
        unsigned char dstip[4];
};

Mask the phone numbers in the address book:

[root@www Practice]# sed 's/[0-9]/*/g' addressbook 
mage
www.magedu.com
***-***-****
鸟哥
www.vbird.org
***-***-****
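
The addressing forms used in this section can be tried on inline data, as in this sketch. The sample lines mirror the passwd excerpt from earlier, the variable names are mine, and the "|" delimiter in the last command is just one possible choice.

```shell
passwd_sample=$(printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'bin:x:1:1:bin:/bin:/sbin/nologin' \
    'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
    'adm:x:3:4:adm:/var/adm:/sbin/nologin')

# -n turns off automatic printing; "2,3p" prints only lines 2 through 3.
middle=$(printf '%s\n' "$passwd_sample" | sed -n '2,3p')

# An address can also be a regular expression: delete every line ending in nologin.
login_only=$(printf '%s\n' "$passwd_sample" | sed '/nologin$/d')

# s with "|" as its delimiter avoids escaping the slashes in paths.
# The leading 1 restricts the command to line 1; the trailing p prints the result.
swapped=$(printf '%s\n' "$passwd_sample" | sed -n '1s|/bin/bash|/bin/sh|p')

echo "$middle"
echo "$login_only"
echo "$swapped"
```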

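9. awk: a powerful tool for text processing and reports

The standard form of an awk command:
awk -F: '$3>200 {print $1,$3}' /etc/passwd
Its parts are the field separator (-F:), the condition ($3>200), and the action ({print $1,$3}).

1. Using awk instead of sed to select lines

Find the setuid commands:

[root@www Practice]# ll /usr/bin |awk '/^...s/ {print}'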
-rwsr-xr-x. 1 root root      52936 Nov 20  2015 at
-rwsr-xr-x. 1 root root      64200 Mar  6  2015 chage
-rws--x--x. 1 root root      23960 Aug  3 01:12 chfn
-rws--x--x. 1 root root      23856 Aug  3 01:12 chsh
-rwsr-xr-x. 1 root root      57552 Mar 31  2016 crontab
-rwsr-xr-x. 1 root root      32584 Nov 20  2015 fusermount
-rwsr-xr-x. 1 root root      78168 Mar  6  2015 gpasswd
-rwsr-xr-x. 1 root root      61304 Mar 31  2016 ksu
-rwsr-xr-x. 1 root root      44232 Aug  3 01:12 mount
-rwsr-xr-x. 1 root root      41752 Mar  6  2015 newgrp
-rwsr-xr-x. 1 root root      27832 Jun 10  2014 passwd
-rwsr-xr-x. 1 root root      27656 Jun 24 02:13 pkexec
---s--x---. 1 root stapusr  186792 Nov 22  2015 staprun
-rwsr-xr-x. 1 root root      32072 Aug  3 01:12 su
---s--x--x. 1 root root     130720 Apr  1  2016 sudo
-rwsr-xr-x. 1 root root      31960 Aug  3 01:12 umount
-rwsr-xr-x. 1 root root    2397160 Nov 20  2015 Xorg

Or add one more filter and show only the command names:

[root@www Practice]# ll /usr/bin |awk '/^...s/ {print $9}'
at
chage
chfn
chsh
crontab
fusermount
gpasswd
ksu
mount
newgrp
passwd
pkexec
staprun
su
sudo
umount
Xorg

2. Conditions can also compare numbers

List the users whose UID is less than 5:

[root@www Practice]# awk -F: '$3<5 {print}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

3. awk can do arithmetic

[root@www Practice]# echo 1 3|awk -F" " '{print $1/$2}'
0.333333

[root@www Practice]# echo 22 7|awk -F" " '{printf "%1.20f\n",$1/$2}'
3.14285714285714279370
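
Arithmetic becomes more useful across many lines. As a small sketch with made-up data, the following sums the second column and divides by NR, which in the END block holds the total number of lines read.

```shell
# Sum and average the numeric second column of a small, made-up table.
result=$(printf '%s\n' 'mage 2' 'vbird 1' 'lige 4' 'chaoge 3' |
    awk '{ sum += $2 } END { printf "%d %.2f\n", sum, sum / NR }')

echo "$result"
```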

4. String matching

[root@www Practice]# awk -F: '$7 ~ /\/bin\/bash/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
wjx:x:1000:1000:Xiao_admin:/home/wjx:/bin/bash
liu:x:1002:1003:partner:/home/liuy:/bin/bash

This shows the lines whose seventh field matches /bin/bash.

5. Logical operators

Logical AND (&&):

[root@www Practice]# awk -F: '$7 ~ /\/bin\/bash/ && $3 > 500' /etc/passwd
wjx:x:1000:1000:XiaoMi_admin:/home/wjx:/bin/bash
liuy:x:1002:1003:partner:/home/liuy:/bin/bash

Parentheses can be used as well:

[root@www Practice]# awk -F: '($3 < 10) && ($7 ~ /\/bin\/bash/) {print $1}'  /etc/passwd
root
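
One detail worth a sketch: the ~ operator matches a regular expression anywhere in the field, while == compares the whole string. The user "odd" and its shell path below are invented to make the difference visible.

```shell
sample=$(printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'bin:x:1:1:bin:/bin:/sbin/nologin' \
    'wjx:x:1000:1000:admin:/home/wjx:/bin/bash' \
    'odd:x:1001:1001:odd:/home/odd:/usr/local/bin/bash')

# Regex match: any shell path that contains /bin/bash qualifies.
by_regex=$(printf '%s\n' "$sample" | awk -F: '$7 ~ /\/bin\/bash/ && $3 >= 1000 { print $1 }')

# String comparison: the field must be exactly /bin/bash, and no escaping is needed.
by_equal=$(printf '%s\n' "$sample" | awk -F: '$7 == "/bin/bash" && $3 >= 1000 { print $1 }')

echo "regex match: $by_regex"
echo "exact match: $by_equal"
```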

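6. awk also has built-in variables and functions

NR: number of the current record (line). FS: input field separator. OFS: output field separator. NF: number of fields in the current record.
Some of the built-in functions:
1. length() returns the length of the whole line 2. system("cmd") runs a UNIX command 3. split() splits a string 4. substr() extracts a substring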
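
A minimal sketch that touches several of these at once, run on a single passwd-style line. The array name parts and the choice of " | " as OFS are mine.

```shell
out=$(printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' |
    awk -F: -v OFS=' | ' '{
        # split() fills the array and returns the number of pieces;
        # "/root" splits on "/" into an empty string and "root".
        n = split($6, parts, "/")
        # print joins its comma-separated arguments with OFS.
        print NR, NF, length($0), substr($1, 1, 2), parts[n]
    }')

echo "$out"
```

The five values printed are the line number, the field count, the line length, the first two characters of the user name, and the last component of the home directory.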

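There are also if and for statements, while loops, and associative arrays.
The following shows how to use awk to count which login shells the users in passwd have, and how many users have each:

[root@www Practice]# awk -F: 'BEGIN{print} {count[$7]++}
END {for (desig in count)
printf "%-20s %4d \n",desig,count[desig]}' passwd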

/bin/sync               1 
/bin/bash               3 
/sbin/nologin          41 
/sbin/halt              1 
/sbin/shutdown          1

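Pretty powerful, isn't it?
Next, a script that automatically adds the users listed in the file linuxuser and gives each new user the password from the file linuxpasswd. First create the linuxuser file.

Then create the linuxpasswd file.

Now the script: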

#!/bin/bash
#############################################################
# File Name:adduser.sh
# Author:Nicolas Cage
# mail:xxxxxxx
# Created Time: Mon 07 Nov 2016 07:19:44 PM CST
#====================================================
#

# File that holds one password per line
passwddir="/root/Practice/linuxpasswd"
# Number of passwords available
passwdcount=$(wc -l < "$passwddir")
count=0
# Read one user name per line from linuxuser
while read -r user;do
        count=$((count+1))
        # Stop once there are more users than passwords
        if [[ $count -gt $passwdcount ]];then
                echo -e "-Error: no adequate passwd to add user"
                exit 1
        fi  
        # Take line number $count of the password file
        passwd=$(sed -n "${count}p" "$passwddir")
        adduser "$user"
        echo "$passwd"|passwd --stdin "$user"
done < linuxuser

The result of running it:

[root@www Practice]# bash  adduser.sh 
adduser: user 'user1' already exists
Changing password for user user1.
passwd: all authentication tokens updated successfully.
adduser: user 'user2' already exists
Changing password for user user2.
passwd: all authentication tokens updated successfully.
adduser: user 'user3' already exists
Changing password for user user3.
passwd: all authentication tokens updated successfully.
adduser: user 'user4' already exists
Changing password for user user4.
passwd: all authentication tokens updated successfully.
-Error: no adequate passwd to add user
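
A different way to pair the two files, sketched here as a dry run: paste puts line N of both files side by side, so the loop needs no counter. The sample user names, the passwords, and the scratch directory are invented, and the loop only prints what it would do instead of calling adduser.

```shell
# Hypothetical sample data standing in for linuxuser and linuxpasswd.
workdir=$(mktemp -d)
printf '%s\n' user1 user2 user3 > "$workdir/linuxuser"
printf '%s\n' secret1 secret2 > "$workdir/linuxpasswd"

# paste separates the two columns with a tab, so split each row on tabs.
tab=$(printf '\t')
plan=$(paste "$workdir/linuxuser" "$workdir/linuxpasswd" |
    while IFS="$tab" read -r user password; do
        if [ -z "$password" ]; then
            # The password file ran out before the user file did.
            echo "skip $user: no password left"
        else
            # Dry run: report the action instead of executing it.
            echo "would run: adduser $user"
        fi
    done)

echo "$plan"
rm -rf "$workdir"
```

Unlike the script above, this variant keeps going after the passwords run out and simply reports each user it skips.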

Here is one more script, for automatic partitioning (yes, this really is a script):

#!/bin/bash
#############################################################
# File Name:partdisk.sh
# Author:Nicolas Cage
# mail:xxxx
# Created Time: Mon 07 Nov 2016 07:50:53 PM CST
#====================================================
#

fdisk /dev/$1<<end
n



+$2G
n



+$2G
n



+$2G
n




wq
end

Run partdisk.sh sdb 2, then print the partition table in fdisk:

Command (m for help): p

Disk /dev/sdb: 8589 MB, 8589934592 bytes, 16777216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x9ae086b9

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048     4196351     2097152   83  Linux
/dev/sdb2         4196352     8390655     2097152   83  Linux
/dev/sdb3         8390656    12584959     2097152   83  Linux
/dev/sdb4        12584960    16777215     2096128    5  Extended

There, the disk is partitioned.
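
Because the script writes to the disk right away, it can help to look at the keystrokes before sending them anywhere. The sketch below only prints the same answers as the here-document; fdisk_answers is a made-up helper name, and nothing here touches a device.

```shell
# Print the answers the here-document in partdisk.sh feeds to fdisk:
# three partitions of the given size in G, then one that takes the rest.
fdisk_answers() {
    size=$1
    for i in 1 2 3; do
        # n, three empty lines to accept the defaults, then the size.
        printf 'n\n\n\n\n+%sG\n' "$size"
    done
    # n, four empty lines to accept the defaults, then write and quit.
    printf 'n\n\n\n\n\nwq\n'
}

answers=$(fdisk_answers 2)
line_count=$(printf '%s\n' "$answers" | wc -l | tr -d ' ')
size_lines=$(printf '%s\n' "$answers" | grep -c '^+2G$')

printf '%s\n' "$answers"
echo "lines: $line_count, size answers: $size_lines"
```

Once the printed answers look right, the same function output could be piped to fdisk in place of the here-document.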
