Counting word frequencies in a Word document with Python

Posted 2020-11-26


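How do you count word frequencies in a Word document? First use the docx module (from the python-docx package) to write the document's text to a txt file, then segment that text with the jieba module and count how often each word occurs. Simple, isn't it?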

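# 2020-03-10
# Elizabeth
from docx import Document
import jieba  # Chinese word segmentation

# Intermediate text file: written by to_txt() and read back for segmentation
TXT_PATH = '/Users/fangluping/Desktop/数据分析笔试试题/词频统计.txt'

# Helper: write the text of every paragraph in a Word document to a txt file
def to_txt(path):
    document = Document(path)
    # Write as UTF-8 so the file can be read back with the same encoding
    with open(TXT_PATH, 'w', encoding='utf-8') as txt:
        for paragraph in document.paragraphs:
            # One paragraph per line, so words do not fuse across paragraphs
            txt.write(paragraph.text + '\n')
    return TXT_PATH

if __name__ == '__main__':
    path0 = '/Users/fangluping/Desktop/数据分析笔试试题/笔试题目-V1.0.docx'
    to_txt(path0)  # convert the Word document to txt

    # Segment the text into words
    with open(TXT_PATH, 'r', encoding='utf-8') as f:
        txt = f.read()
    words = jieba.lcut(txt)
    counts = {}
    for word in words:
        # Skip whitespace and single-character tokens (mostly particles and punctuation)
        if len(word.strip()) < 2:
            continue
        counts[word] = counts.get(word, 0) + 1
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)

    # Print the ten most frequent words; slicing is safe with fewer than ten
    for word, count in items[:10]:
        print("{0:<10}{1:>5}".format(word, count))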
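The counting and sorting steps above can also be written with `collections.Counter` from the standard library. The following is only a minimal sketch, not part of the original script: `top_words`, `min_len`, and `sample_tokens` are hypothetical names introduced here for illustration, and the sample list stands in for the output of `jieba.lcut(txt)` so the snippet runs without jieba installed.

```python
from collections import Counter


def top_words(tokens, n=10, min_len=2):
    """Return the n most frequent tokens with at least min_len characters."""
    counts = Counter(
        token for token in tokens
        if len(token.strip()) >= min_len  # drop whitespace and single characters
    )
    # most_common() sorts by count, highest first, and tolerates n > len(counts)
    return counts.most_common(n)


# Hypothetical pre-segmented input; in the script this would be jieba.lcut(txt)
sample_tokens = ["数据", "分析", "的", "数据", "笔试", "分析", "数据", "\n"]

for word, count in top_words(sample_tokens, n=3):
    print("{0:<10}{1:>5}".format(word, count))
```

Because `most_common` simply returns fewer entries when there are fewer distinct words than requested, this version needs no special handling for very short documents.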
