NLP-python: Natural Language Processing 01

Posted 2020-10-06


# -*- coding: utf-8 -*-
"""
Created on Wed Sep  6 22:21:09 2017

@author: Administrator
"""
import nltk
from nltk.book import *  # needs the book corpora: nltk.download('book')

# Search for a word: show every occurrence with its surrounding context
text1.concordance("monstrous")

# Find words that appear in similar contexts
text1.similar('monstrous')

# Find the contexts shared by two or more words
text2.common_contexts(['monstrous', 'very'])


# Plot where each word occurs across the text (needs matplotlib)
text4.dispersion_plot(['monstrous', 'very'])

# Length of the text in tokens (words and punctuation)
len(text3)

# Average number of times each distinct token is used
len(text3) / len(set(text3))

# Count of a keyword, and a word's share of the text as a percentage
text3.count('smote')
100 * text4.count('a') / len(text4)

def lexical_diversity(text):
    # Average number of occurrences per distinct token
    return len(text) / len(set(text))

def percentage(count, total):
    return 100 * count / total



sent1 = ['Call', 'me', 'Ishmael', '.']

# Look up a token by position; note that indexes start at zero
text3[172]

# Index of the first occurrence of a word
text3.index('love')

# Frequency distribution: a simple count of how often each token occurs,
# useful for identifying the most common words
fdist1 = FreqDist(text1)

vocabulary1 = fdist1.keys()
fdist1['whale']
fdist1.plot(50, cumulative=True)

# Low-frequency words: hapaxes occur only once
fdist1.hapaxes()

# Fine-grained word selection: words longer than 15 characters
V = set(text1)
long_words = [w for w in V if len(w) > 15]
sorted(long_words)

# Select words by both length and frequency
fdist5 = FreqDist(text5)
sorted([w for w in set(text5) if len(w) > 7 and fdist5[w] > 7])

# Bigrams: pairs of adjacent words
from nltk.util import bigrams
list(bigrams(['more', 'is', 'said', 'than', 'done']))


# Collocations: bigrams that occur unusually often
text4.collocations()

# Length of every token in the text
[len(w) for w in text1]

# Distribution of word lengths; FreqDist behaves like a dictionary
fdist = FreqDist([len(w) for w in text1])

fdist.keys()    # the distinct word lengths
fdist.items()   # (length, count) pairs
fdist.max()     # the most frequent word length

fdist[3]        # number of words of length 3
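
The listing above needs the NLTK book corpora to run. As a rough, corpus-free sketch of the same counting ideas, the block below uses only the standard library. `collections.Counter` stands in for `FreqDist` here, and the sample `tokens` list and the `hapaxes` and `adjacent_pairs` helpers are illustrative assumptions, not part of the original code or of NLTK.

```python
from collections import Counter

# Illustrative token list (an assumption; the original uses the NLTK book texts)
tokens = ['the', 'whale', 'and', 'the', 'sea', 'and', 'the', 'ship', '.']


def lexical_diversity(text):
    # Average number of occurrences per distinct token
    return len(text) / len(set(text))


def percentage(count, total):
    # Share of the total, expressed as a percentage
    return 100 * count / total


def hapaxes(counts):
    # Tokens that occur exactly once, similar in spirit to FreqDist.hapaxes()
    return sorted(w for w, n in counts.items() if n == 1)


def adjacent_pairs(words):
    # Pairs of neighbouring tokens, similar in spirit to nltk.util.bigrams
    return list(zip(words, words[1:]))


# Counter as a stand-in for FreqDist(tokens)
counts = Counter(tokens)
# Distribution of token lengths, mirroring FreqDist([len(w) for w in text1])
length_counts = Counter(len(w) for w in tokens)

print(lexical_diversity(tokens))
print(percentage(tokens.count('the'), len(tokens)))
print(counts.most_common(2))
print(hapaxes(counts))
print(adjacent_pairs(tokens)[:3])
print(length_counts.most_common(1))
```

Swapping `tokens` for one of the book texts (for example `list(text1)`) should give figures comparable to the `FreqDist` calls above, assuming the corpora are installed.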
