A Complete English Word-Frequency Counting Example Using a File

Posted by GT3

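Preface: This article was compiled by the editors of 小常识网 (cha138.com). It introduces a complete example of English word-frequency counting using a file, and we hope it is of some reference value to you.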

You can download a long English novel and analyze its word frequencies.
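Since the goal is to analyze a whole downloaded novel, step 1 would normally read the text from a file rather than from a string literal. The sketch below assumes the novel was saved as a plain-text file in UTF-8; the helper name `load_text` and the file name `novel.txt` are hypothetical, and `errors="replace"` is an assumed choice that keeps reading even if a few bytes are not valid UTF-8.

```python
import os
import tempfile


def load_text(path, encoding="utf-8"):
    """Read a whole text file (e.g. a downloaded novel) into one string.

    `load_text` is a hypothetical helper; UTF-8 is an assumed encoding.
    errors="replace" substitutes undecodable bytes instead of raising.
    """
    with open(path, encoding=encoding, errors="replace") as f:
        return f.read()


if __name__ == "__main__":
    # Demo: write a tiny stand-in for the novel to a temporary file, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "novel.txt")  # hypothetical file name
        with open(path, "w", encoding="utf-8") as f:
            f.write("It is a truth universally acknowledged.\n")
        s = load_text(path)
        print(repr(s))
```

The returned string can then play the role of `s` in the steps that follow.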

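1. Read in the string to be analyzed

2. Split the text and extract the words

3. Build a counting dictionary

4. Exclude grammatical (function) words

5. Sort

6. Output the TOP(20)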

7. Briefly explain the output.

# Read in the string to be analyzed (here a short news excerpt given as a string literal)
s='''One female activist called it a "great victory", while another said things would "never be the same again".
The country's US ambassador has described the move as "the right decision at the right time".
The Gulf kingdom is the only country in the world that bans women from driving - and women are still subject to strict dress codes and gender segregation.
Until now, only men were allowed licences and women who drove in public risked being arrested and fined.
Campaigner Sahar Nassif told the BBC: "I couldn't believe it. I started laughing and jumping and screaming. It's a great victory.
"I'm going to buy my dream car, a convertible Mustang, and it's going to be black and yellow!"
Meanwhile, Latifah Alshaalan, a member of the Shura council, a government advisory panel, told broadcaster Al Arabiya: "This is a great victory for many Saudi women. This was the one file and issue which Saudi women have fought not just years, but decades for."
'''

# Split the text and extract the words
s=s.lower()
for ch in ',.!?:;"-':        # replace punctuation with spaces
    s=s.replace(ch,' ')
words=s.split()              # list of words; split() with no argument drops empty strings

# Exclude grammatical (function) words; all lowercase because s was lowercased
exp={'to','am','it','you','so','the','will','i','my','that'}
print(words)

# Counting dictionary
dic={}
keys=set(words)-exp          # set of distinct words, minus the excluded ones
for w in keys:
    dic[w]=words.count(w)    # word -> number of occurrences

# Sort
wc=list(dic.items())         # list of (word, count) tuples
wc.sort(key=lambda x:x[1],reverse=True)   # sort by count, highest first

# Output the TOP(20)
for item in wc[:20]:         # slicing avoids an IndexError when there are fewer than 20 distinct words
    print(item)
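A more compact variant of steps 2 to 6, offered as an alternative sketch rather than the author's method, replaces the manual `replace` loop with a regular expression and the `words.count` loop with `collections.Counter`. `Counter.most_common(n)` returns at most n (word, count) pairs sorted by count, so it never indexes past the end when the text has fewer than 20 distinct words. The pattern `[a-z']+` and the helper name `top_words` are assumptions: the pattern keeps apostrophes so that contractions such as `it's` stay whole.

```python
import re
from collections import Counter

# Function words to exclude (lowercase, since the text is lowercased first).
STOP_WORDS = {'to', 'am', 'it', 'you', 'so', 'the', 'will', 'i', 'my', 'that'}


def top_words(text, n=20, stop_words=STOP_WORDS):
    """Return up to n (word, count) pairs, most frequent first.

    `top_words` is a hypothetical helper; the regex [a-z']+ is an assumed
    definition of a "word" (letters plus apostrophes, e.g. "it's").
    """
    tokens = re.findall(r"[a-z']+", text.lower())      # split and extract words
    words = [t.strip("'") for t in tokens]             # drop quote-like apostrophes at the edges
    counts = Counter(w for w in words if w and w not in stop_words)  # count, skipping function words
    return counts.most_common(n)                       # sort by count and keep the top n


if __name__ == "__main__":
    sample = 'It is a great victory. "It\'s a great victory," she said.'
    for word, count in top_words(sample, n=5):
        print(word, count)
```

Design note: `Counter` tallies every word in a single pass over the list, whereas calling `words.count(w)` for each distinct word rescans the whole list every time, which becomes slow on a full novel.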
