Linux Shell Programming in Practice --- Counting Word Frequencies in a Specific File

Posted 2020-10-06

Method 1: Using sed

Shell>cat a1.txt
123a123,555
456.333
566。555!88,this is a good boy.
2 555
1 this
1 is
1 good
1 boy
1 a123
1 a
1 88
1 566
1 456
1 333
1 123
Shell>

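The splitting is done with sed 's/[[:space:]|[:punct:]]/\n/g':

[] is a bracket expression, which matches any single character from the set inside it. [:space:] stands for whitespace characters (space, tab, and so on) and [:punct:] for punctuation characters.

[[:space:]|[:punct:]] therefore matches one whitespace or punctuation character. Inside a bracket expression | is not an "or" operator but a literal pipe character, which [:punct:] already covers, so [[:space:][:punct:]] is equivalent.

s/[[:space:]|[:punct:]]/\n/g replaces every such character with a newline (GNU sed treats \n in the replacement as a newline), putting each word on a line of its own.

sed '/^$/d' then deletes the empty lines left wherever two separators were adjacent or a line ended with one.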
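The post explains the two sed stages but does not show the exact command that produced the counts above. A minimal sketch of a complete pipeline, assuming GNU sed and the common sort | uniq -c | sort -rn counting idiom (the function name word_freq_sed is hypothetical, and this is not necessarily the author's exact command):

```shell
#!/bin/bash
# Sketch of a full Method 1 pipeline. Assumes GNU sed, which turns \n in
# the replacement into a newline. word_freq_sed is a hypothetical name.
word_freq_sed() {
    # Stage 1: replace each whitespace or punctuation character with a newline.
    # Stage 2: drop the empty lines left by adjacent or trailing separators.
    # Stage 3: sort so identical words are adjacent, count them with uniq -c,
    #          then sort numerically by count, most frequent first.
    sed 's/[[:space:][:punct:]]/\n/g' "$1" | sed '/^$/d' | sort | uniq -c | sort -rn
}
```

Calling word_freq_sed a1.txt prints one "count word" line per distinct word, highest count first.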

Method 2: Using awk

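#!/bin/bash
# Usage: ./word_freq.sh FILE
filename=$1

awk '{
    # Split the line on whitespace or punctuation characters.
    n = split($0, a, /[[:space:][:punct:]]/)
    for (i = 1; i <= n; i++) {
        word = a[i]
        # Skip the empty strings produced by adjacent or trailing separators.
        if (word != "")
            b[word]++
    }
}
END {
    printf("%-14s%s\n", "Word", "Count")
    # Flush the header so it is printed before the sorted rows.
    fflush()
    # Sort the rows numerically, in descending order, on field 2 (the count).
    for (w in b)
        printf("%-14s%d\n", w, b[w]) | "sort -r -n -k2"
}' "$filename"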
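Method 2 relies on awk's split(). With a regex separator, split() returns an empty element wherever two separators are adjacent or a line ends with one (as in "boy."), so counting every element would add a blank "word" to the table. A small demonstration, assuming an awk that supports POSIX character classes such as gawk (count_fields is a hypothetical helper written only for this illustration):

```shell
#!/bin/bash
# count_fields (hypothetical helper): prints how many pieces split() returns
# for one line, followed by how many of those pieces are empty strings.
count_fields() {
    printf '%s\n' "$1" | awk '{
        n = split($0, parts, /[[:space:][:punct:]]/)
        empty = 0
        for (i = 1; i <= n; i++)
            if (parts[i] == "") empty++
        print n, empty
    }'
}

count_fields "a good boy."   # prints: 4 1
count_fields "88,this"       # prints: 2 0
```

The trailing "." yields a fourth, empty element, so a word-counting loop should skip empty elements before incrementing its counter.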

Output:

# cat a1.txt
123a123,555
456.333
566。555!88,this is a good boy.
# ./word_freq.sh a1.txt
Word          Count
555           2
this          1
is            1
good          1
boy           1
a123          1
a             1
88            1
566           1
456           1
333           1
123           1
#

Method 3: Using tr

# cat a1.txt
123a123,555
456.333
566i555!88,this is a good boy.
2 555
1 this
1 is
1 good
1 boy
1 a123
1 a
1 88
1 566i
1 456
1 333
1 123
#
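The post shows only the input file and the result for Method 3, not the tr command itself. One plausible way to produce this kind of count with tr, reusing the character classes from Method 1, is sketched below; the function name word_freq_tr is hypothetical and this is not necessarily the command the author ran:

```shell
#!/bin/bash
# Sketch of a tr-based word count (hypothetical reconstruction, not the
# original command). word_freq_tr is a hypothetical name.
word_freq_tr() {
    # tr maps every whitespace or punctuation character to a newline; -s then
    # squeezes each run of newlines into one. grep -v removes a possible empty
    # first line, and sort | uniq -c | sort -rn counts and ranks the words.
    tr -s '[:space:][:punct:]' '\n' < "$1" | grep -v '^$' | sort | uniq -c | sort -rn
}
```

Because -s squeezes each run of separators into a single newline, only input that begins with a separator can still produce an empty line, which grep -v '^$' filters out.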

This article originally appeared on the “微小信的运维之道” blog; please retain this attribution: http://weixiaoxin.blog.51cto.com/13270051/1963641