Chinese Word Frequency Statistics and Word Cloud Creation
Posted 06刘玲苑
1. Download a full-length Chinese novel and convert it to UTF-8 encoding
# Write the sample text to a file.
with open('test.txt', 'w', encoding='utf-8') as fo:
    fo.write('''spend all your time waiting for that second chance
for the break that will make it ok
there's always some reason to feel not good enough
and it's hard at the end of the day
i need some distraction or a beautiful release
memories seep from my veins
let me be empty or weightless and maybe
i'll find some peace tonight
in the arms of the angel far away from here
from this dark cold hotel room and the endlessness that you feel
you are pulled from the wreckage of your silent reverie
you are in the arms of the angel, may you find some comfort here''')

# Read the text back and normalize it to lowercase.
with open('test.txt', 'r', encoding='utf-8') as fo:
    news = fo.read()
news = news.lower()

# Replace punctuation with spaces so it does not stick to the words.
for i in '.,"':
    news = news.replace(i, ' ')

# split() with no argument splits on any run of whitespace, including newlines.
word = news.split()

# Count every word except the stop words.
dic = {}
exp = {'', 'the', 'and', 'to', 'on', 'of', 's', 'a', 'me', 'is'}
keys = set(word) - exp
for i in keys:
    dic[i] = word.count(i)

# Sort by count in descending order and print the 10 most frequent words.
a = list(dic.items())
a.sort(key=lambda x: x[1], reverse=True)
for i in range(10):
    print(a[i])
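Step 1 asks for the novel to be converted to UTF-8, but the code above does not show that conversion. A minimal sketch of one way to do it follows. The function name `convert_to_utf8` and the default source encoding `gb18030` are assumptions for illustration; the original post does not say which encoding the downloaded file uses, so adjust `src_encoding` to match your file.

```python
def convert_to_utf8(src_path, dst_path, src_encoding="gb18030"):
    """Re-encode a text file as UTF-8 and return the number of characters written.

    src_encoding is an assumption (GB18030 is common for Chinese text files);
    change it if the source file uses a different encoding.
    """
    # Decode the source file using its original encoding.
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    # Write the same text back out encoded as UTF-8.
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
    return len(text)
```

A call such as `convert_to_utf8('novel_gbk.txt', 'jianai.txt')` (both file names are hypothetical) would produce a file that can then be opened with `encoding='utf-8'`. If the guessed source encoding is wrong, `open` will typically raise a `UnicodeDecodeError`, which is a useful signal to try another encoding.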
2. Use the jieba library to count Chinese word frequencies and print the top 20 words with their counts.
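import jieba

# Read the novel; the file is expected to be saved as UTF-8.
with open('jianai.txt', 'r', encoding='utf-8') as txt:
    jianai = txt.read()

# Replace both full-width and ASCII punctuation with spaces.
for i in ',。“”!?,."!?':
    jianai = jianai.replace(i, ' ')

# Segment the text into words with jieba.
jianai = list(jieba.cut(jianai))

# Words to exclude from the ranking.
ll = {'罗', '简', '我', '你', '一', '都', '离开', '认为', '这儿', '即使', '这样', '等等'}

# Count each remaining word, skipping whitespace-only tokens.
dic = {}
keys = set(jianai) - ll
for i in keys:
    if i.strip():
        dic[i] = jianai.count(i)

# Sort by count in descending order and print the top 20.
items = list(dic.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(min(20, len(items))):
    print(items[i])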
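Calling `list.count` once per distinct word rescans the whole token list each time, which can be slow on a full-length novel. As an alternative sketch (not the author's method), the counting step could use `collections.Counter` from the standard library, which tallies all tokens in a single pass. The helper name `top_words` and the sample token list are hypothetical; in practice the tokens would come from `list(jieba.cut(text))`.

```python
from collections import Counter


def top_words(tokens, stop_words, n=20):
    """Return the n most frequent tokens, skipping stop words and whitespace."""
    counts = Counter(
        t for t in tokens
        if t.strip() and t not in stop_words  # drop blanks and excluded words
    )
    # most_common sorts by count in descending order.
    return counts.most_common(n)


# Hypothetical token list standing in for list(jieba.cut(text)).
sample = ["简", "爱", "离开", "庄园", "庄园", " ", "爱", "庄园"]
print(top_words(sample, {"简", "离开"}, n=2))
```

The stop-word set plays the same role as `ll` in the step 2 code, and `n=20` matches the top 20 requested there.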