综合练习：词频统计

Posted 2020-10-28 cgq丶虾

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了综合练习：词频统计相关的知识，希望对你有一定的参考价值。

f = open(\'test.txt\',\'r\')
news = f.read()
f.close()

sep = \'\'\'.,\'?!:"\'\'\'
exclude = {\'the\',\'and\',\'to\',\'a\',\'of\',\'was\',\'on\',\'with\',\'i\',\'s\',\'is\',\'were\',\'that\',\'back\',\'at\',\'little\',\'have\'}
for w in sep:
    news = news.replace(w,\' \')

wordList = news.lower().split()
wordDict = {}
\'\'\'
for v in wordList:
    wordDict[v] = wordDict.get(v, 0)+1
for v in exclude
    del( wordDict[v])
\'\'\'

wordset = set(wordList) - exclude
for v in wordset:
    wordDict[v] = wordList.count(v)

dictList = list(wordDict.items())
dictList.sort(key=lambda x:x[1],reverse=True)
for i in range(20):
    print(dictList[i])

f = open(\'newscount.txt\',\'a\')
for i in range(25):
    f.write(dictList[i][0]+\' \'+str(dictList[i][1])+\'\\n\')

运行结果：

中文统计

import jieba

f = open(\'text.txt\', \'r\', encoding = \'utf-8\')
news = f.read()
f.close()

sep=\'\'\'，。‘’“”：；（）！？、《》 \'\'\'
exclude={\'我\', \'在\', \'不\', \'一\', \'了\', \'那\', \'是\', \'来\', \'他\', \'个\', \'行\', \'你\', \'的\',
     \'者\',\'有\',\'\\n\',\'-\',\'出\',\'这\',\'时\',\'没\',\'她\',\'到\',\'上\',\'们\',\'会\',\'着\',\'说\',\'要\'
    , \'为\',\'过\',\'看\',\'得\',\'里\',\'克\',\'去\',\'想\',\'好\',\'天\',\'小\',\'后\',\'地\',\'么\',\'都\'
    , \'还\',\'以\',\'对\',\'能\',\'大\',\'也\',\'很\',\'而\',\'然\',\'下\',\'但\',\'吕\',\'把\',\'开\',\'从\'
    , \'让\',\'就\',\'一个\',\'可\',\'点\',\'跟\',\'样\',\'向\',\'事\',\'起\',\'中\',\'面\'}

for c in sep:
    news = news.replace(c,\' \')
wordList=list(jieba.cut(news))
wordDict={}
words=list(set(wordList)-exclude)

for w in range(0,len(words)):
    wordDict[words[w]]=news.count(str(words[w]))

dictList = list(wordDict.items())
dictList.sort(key=lambda x:x[1],reverse=True)

f = open(\'new.txt\', \'a\',encoding="utf-8")
for i in range(20):
    f.write(dictList[i][0] + \':\' + str(dictList[i][1]) + \'\\n\')
f.close()

　结果：

以上是关于综合练习：词频统计的主要内容，如果未能解决你的问题，请参考以下文章