Word Frequency Count
Posted 089-袁佳鹏
# Read the news text; the with block closes the file automatically
with open('news.txt', 'r') as f:
    news = f.read()
print(news)

sep = '''.,?'!:"'''
exclude = {'the', 'and', 'of', 'to', 'he', 'for'}

# Replace every character in sep with a space
for c in sep:
    news = news.replace(c, ' ')

wordList = news.lower().split()
print(wordList)

# Count each word in a single pass over the list
wordDict = {}
for w in wordList:
    wordDict[w] = wordDict.get(w, 0) + 1

# Drop the excluded words; the None default avoids a KeyError
# when an excluded word never appears in the text
for w in exclude:
    wordDict.pop(w, None)

# Sort by count in descending order and print the top 10
# (slicing is safe even when there are fewer than 10 distinct words)
dictList = list(wordDict.items())
dictList.sort(key=lambda x: x[1], reverse=True)
for item in dictList[:10]:
    print(item)
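The same count can also be written with `collections.Counter` from the standard library. The following is only a minimal sketch, not the method used in the post: the helper name `top_words`, its parameters, and the sample sentence are all hypothetical, and it takes the text as a string instead of reading `news.txt`.

```python
from collections import Counter

# Same separator characters as in the post
SEP = '''.,?'!:"'''

# Hypothetical helper: name and parameters are assumptions, not from the post
def top_words(text, exclude=frozenset(), n=10):
    # Map every separator character to a space, then split on whitespace
    table = str.maketrans(SEP, ' ' * len(SEP))
    words = text.lower().translate(table).split()
    # Count only the words that are not excluded
    counts = Counter(w for w in words if w not in exclude)
    # most_common returns (word, count) pairs sorted by count, descending
    return counts.most_common(n)

# Made-up sample text, for illustration only
sample = "The cat and the dog. The dog barked!"
result = top_words(sample, exclude={'the', 'and'}, n=2)
print(result)
```

`Counter` does the counting in one pass, and `most_common(n)` replaces the manual sort and slice. Filtering the excluded words before counting also means there is nothing to pop from the dictionary afterwards.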