A Complete English Word-Frequency Counting Example Using File Input
Posted by 塨槟
1. Read in the string to be analyzed
text = '''We don't talk anymore We don't talk anymore We don't talk anymore Like we used to do We don't laugh anymore What was all of it for? We don't talk anymore Like we used to do I just heard you found the one you've been lookin' The one you been looking for I wish i would've konwn that wasn't me Cause even after all this time i still wonder Why i can't move on? Just the way you dance so easliy Don't wanna know The kinda dress you're wearin' tonight If he's holdin' onto you so tight The way i did before I overdosed Should've known your love was game Now I can't get'cha out of my brain Ooh it's such a shame We don't talk anymore We don't talk anymore We don't talk anymore Like we used to do We don't laugh anymore What was all of it for? We don't talk anymore Like we used to do I just hope you'r lyin' next to somebody Know it's hard to love ya like me Must be a good reason that you're gone Every now and then I think you might want me to come show up your door But I'm just too afraid that i'll be worng Don't wanna know If you'ra lookin' into her eyes If she's holdin' onto you so tight The way i did before I overdosed Should've know your love was a game Now I can't get'cha out of my brain Ooh it's such a shame We don't talk anymore We don't talk anymore We don't talk anymore Like we used to do We don't laugh anymore What was all of it for? We don't talk anymore Like we used to do Like we used to do Don't wanna know The kinda dress you're wearin' tonight If he's givin' it to you just right The way i did before I overdosed Should've know your love was a game Now I can't get'cha out of my brain Ooh it's such a shame We don't talk anymore We don't talk anymore We don't talk anymore Like we used to do We don't laugh anymore What was all of it for? We don't talk anymore Like we used to do We don't talk anymore The way did before We don't talk anymore Ooh Woo Ooh it's such a shame We don't talk anymore'''
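The script later in this post reads its input from a file named 1.txt, so the text above has to be saved to that file first. The post does not show how the file was created; the following is only a minimal sketch, assuming the file sits in the working directory. The helper name save_text and the short sample string are illustrative and not part of the original post.

```python
from pathlib import Path


def save_text(text, path="1.txt"):
    """Write text to path as UTF-8 and return the number of characters written.

    save_text is a hypothetical helper name used only for this sketch.
    """
    return Path(path).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    # Short stand-in for the full lyrics shown above.
    sample = "We don't talk anymore Like we used to do"
    save_text(sample)
```

Writing the file with an explicit encoding keeps the result the same across platforms; the lyrics are plain ASCII, so the default open('1.txt','r') call can read them back either way.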
2. Split the text and extract the words
3. Build the count dictionary
4. Exclude grammatical (function) words
5. Sort
6. Output the TOP(20)
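with open('1.txt', 'r') as fo:      # read the text to analyze; the file is closed automatically
    text = fo.read()
text = text.lower()                 # convert to lowercase
for ch in ',.?':
    text = text.replace(ch, ' ')    # replace punctuation with spaces
words = text.split()                # split into words; split() with no argument also drops empty strings
print(words)
exc = {'to', 'a', 'of', 'it'}       # high-frequency words that carry no meaning
keys = set(words) - exc             # distinct words, minus the excluded grammatical words
dic = {}
for w in keys:
    dic[w] = words.count(w)         # build the count dictionary
print(dic)
wc = list(dic.items())              # list of (word, count) pairs
wc.sort(key=lambda x: x[1], reverse=True)   # sort by count, descending
print(wc)
for item in wc[:20]:                # output the TOP(20); slicing is safe with fewer than 20 words
    print(item)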
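The same six steps can also be written more compactly with the standard library. The sketch below is an alternative, not the author's method: it swaps the manual count dictionary for collections.Counter and the character-by-character replacement for a regular expression. The function name top_words and the constant STOP_WORDS are assumptions introduced for illustration; the exclusion set itself is the one used in the script above.

```python
import re
from collections import Counter

# Same exclusion set as the script above; the constant name is illustrative.
STOP_WORDS = {"to", "a", "of", "it"}


def top_words(text, n=20, stop_words=STOP_WORDS):
    """Return up to n (word, count) pairs, most frequent first, skipping stop_words."""
    # Lowercase, then keep runs of letters and apostrophes so "don't" stays one word.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)


sample = "We don't talk anymore, we don't laugh anymore. What was all of it for?"
for word, count in top_words(sample, n=3):
    print(word, count)
```

To run it on the full lyrics, pass the contents of 1.txt to top_words instead of the sample string. Counter tallies every word in a single pass, whereas calling words.count() once per distinct word rescans the whole list each time; for a text this small the difference is negligible.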
Run result: [output screenshot not preserved]