Python 3 and NLTK: Natural Language Processing

Posted by 既生喻何生亮 (Bright)


NLTK

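NLTK is a Python library for processing text. Written knowledge comes in enormous volumes, so this topic belongs to the big-data series.
This article only scratches the surface. I have not worked in this area yet and wrote it out of occasional curiosity.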

Loading everything from NLTK's book module

  • The book module contains all the sample data
from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
text1
<Text: Moby Dick by Herman Melville 1851>
text2
<Text: Sense and Sensibility by Jane Austen 1811>

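Searching text or topics

  1. concordance looks up a word in the text and prints every occurrence together with its surrounding context.
  2. similar finds words that are used in contexts similar to those of the search word; the same idea could serve relevance ranking in a search engine.
  3. common_contexts lists the contexts shared by two or more keywords.
  4. dispersion_plot draws a dispersion plot showing where words occur across the text.
text1.concordance('monstrous')  # look up the word 'monstrous' in text1
# concordance 
# UK [kən'kɔːd(ə)ns]  US [kən'kɔrdns]
# n. harmony, agreement; index of words; index of the important words in a work or in an author's collected works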
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us , 
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But 
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
text2.concordance('affection')
Displaying 25 of 79 matches:
, however , and , as a mark of his affection for the three girls , he left them
t . It was very well known that no affection was ever supposed to exist between
deration of politeness or maternal affection on the side of the former , the tw
d the suspicion -- the hope of his affection for me may warrant , without impru
hich forbade the indulgence of his affection . She knew that his mother neither
rd she gave one with still greater affection . Though her late conversation wit
 can never hope to feel or inspire affection again , and if her home be uncomfo
m of the sense , elegance , mutual affection , and domestic comfort of the fami
, and which recommended him to her affection beyond every thing else . His soci
ween the parties might forward the affection of Mr . Willoughby , an equally st
 the most pointed assurance of her affection . Elinor could not be surprised at
he natural consequence of a strong affection in a young and ardent mind . This 
 opinion . But by an appeal to her affection for her mother , by representing t
 every alteration of a place which affection had established as perfect with hi
e will always have one claim of my affection , which no other can possibly shar
f the evening declared at once his affection and happiness . " Shall we see you
ause he took leave of us with less affection than his usual behaviour has shewn
ness ." " I want no proof of their affection ," said Elinor ; " but of their en
onths , without telling her of his affection ;-- that they should part without 
ould be the natural result of your affection for her . She used to be all unres
distinguished Elinor by no mark of affection . Marianne saw and listened with i
th no inclination for expense , no affection for strangers , no profession , an
till distinguished her by the same affection which once she had felt no doubt o
al of her confidence in Edward ' s affection , to the remembrance of every mark
 was made ? Had he never owned his affection to yourself ?" " Oh , no ; but if 
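To make the idea behind concordance concrete, here is a minimal pure-Python sketch that collects every occurrence of a word with a few tokens of context on each side. The function name `simple_concordance`, the `window` parameter and the toy sentence are assumptions made up for illustration. NLTK's own method aligns its output by characters and is not reproduced here.

```python
def simple_concordance(tokens, word, window=3):
    """Return one context string per occurrence of `word` (case-insensitive)."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - window):i]      # up to `window` tokens before
            right = tokens[i + 1:i + 1 + window]     # up to `window` tokens after
            lines.append(' '.join(left + [tok] + right))
    return lines


# Toy token list (hypothetical data, not taken from text1).
tokens = 'one was of a most monstrous size and a monstrous bulk it was'.split()
for line in simple_concordance(tokens, 'monstrous'):
    print(line)
```

With a real corpus the token list would simply be much longer; the search logic stays the same.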
text1.similar('monstrous')
true contemptible christian abundant few part mean careful puzzled
mystifying passing curious loving wise doleful gamesome singular
delightfully perilous fearless
text2.similar('monstrous')
very so exceedingly heartily a as good great extremely remarkably
sweet vast amazingly
text2.common_contexts(['monstrous', 'very'])
a_pretty am_glad a_lucky is_pretty be_glad
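Output such as a_pretty reads as "the word before" and "the word after" joined by an underscore. The sketch below shows, under that reading, how shared contexts can be found with plain Python sets. The helper names `contexts_by_word` and `shared_contexts` and the toy sentence are illustrative assumptions, not NLTK's implementation.

```python
from collections import defaultdict


def contexts_by_word(tokens):
    """Map each lowercased word to the set of (previous, next) pairs around it."""
    contexts = defaultdict(set)
    lowered = [t.lower() for t in tokens]
    # The first and last tokens lack a neighbour on one side, so skip them.
    for i in range(1, len(lowered) - 1):
        contexts[lowered[i]].add((lowered[i - 1], lowered[i + 1]))
    return contexts


def shared_contexts(tokens, word_a, word_b):
    """Return the sorted contexts in which both words occur."""
    contexts = contexts_by_word(tokens)
    return sorted(contexts[word_a.lower()] & contexts[word_b.lower()])


# Toy token list (hypothetical data, not taken from text2).
tokens = ('I am very glad and she is monstrous glad ; '
          'a very pretty day , a monstrous pretty day').split()
print(shared_contexts(tokens, 'monstrous', 'very'))
```

Two words that share many such contexts are likely to be used in similar ways, which is the intuition behind similar as well.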
# Show where each word occurs in the text, measured as the number of words from the beginning.
# Each stripe represents an instance of a word, 
# and each row represents the entire text.
text4.dispersion_plot(['citizens', 'democracy', 'freedom', 'duties', 'America', 'liberty'])
# dispersion 
# UK [dɪ'spɜːʃ(ə)n]  US [dɪ'spɝʒn]
# n. scattering; [statistics][mathematics] deviation; dispersal
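The data behind a dispersion plot is just the list of positions at which each word occurs. Here is a small sketch that computes those positions and draws a crude text version of the plot. The function name `word_offsets` and the toy sentence are assumptions for illustration, and the matching is case-sensitive by choice in this sketch.

```python
def word_offsets(tokens, words):
    """Map each target word to the list of token positions where it occurs."""
    offsets = {w: [] for w in words}
    for position, tok in enumerate(tokens):
        if tok in offsets:
            offsets[tok].append(position)
    return offsets


# Toy token list (hypothetical data, not taken from text4).
tokens = 'the citizens value liberty and the citizens defend liberty and freedom'.split()
offsets = word_offsets(tokens, ['citizens', 'liberty', 'freedom', 'duties'])
for word, positions in offsets.items():
    # One text row per word: '|' marks an occurrence, '.' any other token.
    row = ''.join('|' if i in positions else '.' for i in range(len(tokens)))
    print('{:>10} {}'.format(word, row))
```

A word that never occurs, such as a misspelled one, simply produces an empty row.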

print(text3.generate('monstrous'))
None

Counting vocabulary

len(text3)
44764
sorted(set(text3))
[\'!\',
 "\'",
 \'(\',
 \')\',
 \',\',
 \',)\',
 \'.\',
 \'.)\',
 \':\',
 \';\',
 \';)\',
 \'?\',
 \'?)\',
 \'A\',
 \'Abel\',
 \'Abelmizraim\',
 \'Abidah\',
 \'Abide\',
 \'Abimael\',
 \'Abimelech\',
 \'Abr\',
 \'Abrah\',
 \'Abraham\',
 \'Abram\',
 \'Accad\',
 \'Achbor\',
 \'Adah\',
 \'Adam\',
 \'Adbeel\',
 \'Admah\',
 \'Adullamite\',
 \'After\',
 \'Aholibamah\',
 \'Ahuzzath\',
 \'Ajah\',
 \'Akan\',
 \'All\',
 \'Allonbachuth\',
 \'Almighty\',
 \'Almodad\',
 \'Also\',
 \'Alvah\',
 \'Alvan\',
 \'Am\',
 \'Amal\',
 \'Amalek\',
 \'Amalekites\',
 \'Ammon\',
 \'Amorite\',
 \'Amorites\',
 \'Amraphel\',
 \'An\',
 \'Anah\',
 \'Anamim\',
 \'And\',
 \'Aner\',
 \'Angel\',
 \'Appoint\',
 \'Aram\',
 \'Aran\',
 \'Ararat\',
 \'Arbah\',
 \'Ard\',
 \'Are\',
 \'Areli\',
 \'Arioch\',
 \'Arise\',
 \'Arkite\',
 \'Arodi\',
 \'Arphaxad\',
 \'Art\',
 \'Arvadite\',
 \'As\',
 \'Asenath\',
 \'Ashbel\',
 \'Asher\',
 \'Ashkenaz\',
 \'Ashteroth\',
 \'Ask\',
 \'Asshur\',
 \'Asshurim\',
 \'Assyr\',
 \'Assyria\',
 \'At\',
 \'Atad\',
 \'Avith\',
 \'Baalhanan\',
 \'Babel\',
 \'Bashemath\',
 \'Be\',
 \'Because\',
 \'Becher\',
 \'Bedad\',
 \'Beeri\',
 \'Beerlahairoi\',
 \'Beersheba\',
 \'Behold\',
 \'Bela\',
 \'Belah\',
 \'Benam\',
 \'Benjamin\',
 \'Beno\',
 \'Beor\',
 \'Bera\',
 \'Bered\',
 \'Beriah\',
 \'Bethel\',
 \'Bethlehem\',
 \'Bethuel\',
 \'Beware\',
 \'Bilhah\',
 \'Bilhan\',
 \'Binding\',
 \'Birsha\',
 \'Bless\',
 \'Blessed\',
 \'Both\',
 \'Bow\',
 \'Bozrah\',
 \'Bring\',
 \'But\',
 \'Buz\',
 \'By\',
 \'Cain\',
 \'Cainan\',
 \'Calah\',
 \'Calneh\',
 \'Can\',
 \'Cana\',
 \'Canaan\',
 \'Canaanite\',
 \'Canaanites\',
 \'Canaanitish\',
 \'Caphtorim\',
 \'Carmi\',
 \'Casluhim\',
 \'Cast\',
 \'Cause\',
 \'Chaldees\',
 \'Chedorlaomer\',
 \'Cheran\',
 \'Cherubims\',
 \'Chesed\',
 \'Chezib\',
 \'Come\',
 \'Cursed\',
 \'Cush\',
 \'Damascus\',
 \'Dan\',
 \'Day\',
 \'Deborah\',
 \'Dedan\',
 \'Deliver\',
 \'Diklah\',
 \'Din\',
 \'Dinah\',
 \'Dinhabah\',
 \'Discern\',
 \'Dishan\',
 \'Dishon\',
 \'Do\',
 \'Dodanim\',
 \'Dothan\',
 \'Drink\',
 \'Duke\',
 \'Dumah\',
 \'Earth\',
 \'Ebal\',
 \'Eber\',
 \'Edar\',
 \'Eden\',
 \'Edom\',
 \'Edomites\',
 \'Egy\',
 \'Egypt\',
 \'Egyptia\',
 \'Egyptian\',
 \'Egyptians\',
 \'Ehi\',
 \'Elah\',
 \'Elam\',
 \'Elbethel\',
 \'Eldaah\',
 \'EleloheIsrael\',
 \'Eliezer\',
 \'Eliphaz\',
 \'Elishah\',
 \'Ellasar\',
 \'Elon\',
 \'Elparan\',
 \'Emins\',
 \'En\',
 \'Enmishpat\',
 \'Eno\',
 \'Enoch\',
 \'Enos\',
 \'Ephah\',
 \'Epher\',
 \'Ephra\',
 \'Ephraim\',
 \'Ephrath\',
 \'Ephron\',
 \'Er\',
 \'Erech\',
 \'Eri\',
 \'Es\',
 \'Esau\',
 \'Escape\',
 \'Esek\',
 \'Eshban\',
 \'Eshcol\',
 \'Ethiopia\',
 \'Euphrat\',
 \'Euphrates\',
 \'Eve\',
 \'Even\',
 \'Every\',
 \'Except\',
 \'Ezbon\',
 \'Ezer\',
 \'Fear\',
 \'Feed\',
 \'Fifteen\',
 \'Fill\',
 \'For\',
 \'Forasmuch\',
 \'Forgive\',
 \'From\',
 \'Fulfil\',
 \'G\',
 \'Gad\',
 \'Gaham\',
 \'Galeed\',
 \'Gatam\',
 \'Gather\',
 \'Gaza\',
 \'Gentiles\',
 \'Gera\',
 \'Gerar\',
 \'Gershon\',
 \'Get\',
 \'Gether\',
 \'Gihon\',
 \'Gilead\',
 \'Girgashites\',
 \'Girgasite\',
 \'Give\',
 \'Go\',
 \'God\',
 \'Gomer\',
 \'Gomorrah\',
 \'Goshen\',
 \'Guni\',
 \'Hadad\',
 \'Hadar\',
 \'Hadoram\',
 \'Hagar\',
 \'Haggi\',
 \'Hai\',
 \'Ham\',
 \'Hamathite\',
 \'Hamor\',
 \'Hamul\',
 \'Hanoch\',
 \'Happy\',
 \'Haran\',
 \'Hast\',
 \'Haste\',
 \'Have\',
 \'Havilah\',
 \'Hazarmaveth\',
 \'Hazezontamar\',
 \'Hazo\',
 \'He\',
 \'Hear\',
 \'Heaven\',
 \'Heber\',
 \'Hebrew\',
 \'Hebrews\',
 \'Hebron\',
 \'Hemam\',
 \'Hemdan\',
 \'Here\',
 \'Hereby\',
 \'Heth\',
 \'Hezron\',
 \'Hiddekel\',
 \'Hinder\',
 \'Hirah\',
 \'His\',
 \'Hitti\',
 \'Hittite\',
 \'Hittites\',
 \'Hivite\',
 \'Hobah\',
 \'Hori\',
 \'Horite\',
 \'Horites\',
 \'How\',
 \'Hul\',
 \'Huppim\',
 \'Husham\',
 \'Hushim\',
 \'Huz\',
 \'I\',
 \'If\',
 \'In\',
 \'Irad\',
 \'Iram\',
 \'Is\',
 \'Isa\',
 \'Isaac\',
 \'Iscah\',
 \'Ishbak\',
 \'Ishmael\',
 \'Ishmeelites\',
 \'Ishuah\',
 \'Isra\',
 \'Israel\',
 \'Issachar\',
 \'Isui\',
 \'It\',
 \'Ithran\',
 \'Jaalam\',
 \'Jabal\',
 \'Jabbok\',
 \'Jac\',
 \'Jachin\',
 \'Jacob\',
 \'Jahleel\',
 \'Jahzeel\',
 \'Jamin\',
 \'Japhe\',
 \'Japheth\',
 \'Jared\',
 \'Javan\',
 \'Jebusite\',
 \'Jebusites\',
 \'Jegarsahadutha\',
 \'Jehovahjireh\',
 \'Jemuel\',
 \'Jerah\',
 \'Jetheth\',
 \'Jetur\',
 \'Jeush\',
 \'Jezer\',
 \'Jidlaph\',
 \'Jimnah\',
 \'Job\',
 \'Jobab\',
 \'Jokshan\',
 \'Joktan\',
 \'Jordan\',
 \'Joseph\',
 \'Jubal\',
 \'Judah\',
 \'Judge\',
 \'Judith\',
 \'Kadesh\',
 \'Kadmonites\',
 \'Karnaim\',
 \'Kedar\',
 \'Kedemah\',
 \'Kemuel\',
 \'Kenaz\',
 \'Kenites\',
 \'Kenizzites\',
 \'Keturah\',
 \'Kiriathaim\',
 \'Kirjatharba\',
 \'Kittim\',
 \'Know\',
 \'Kohath\',
 \'Kor\',
 \'Korah\',
 \'LO\',
 \'LORD\',
 \'Laban\',
 \'Lahairoi\',
 \'Lamech\',
 \'Lasha\',
 \'Lay\',
 \'Leah\',
 \'Lehabim\',
 \'Lest\',
 \'Let\',
 \'Letushim\',
 \'Leummim\',
 \'Levi\',
 \'Lie\',
 \'Lift\',
 \'Lo\',
 \'Look\',
 \'Lot\',
 \'Lotan\',
 \'Lud\',
 \'Ludim\',
 \'Luz\',
 \'Maachah\',
 \'Machir\',
 \'Machpelah\',
 \'Madai\',
 \'Magdiel\',
 \'Magog\',
 \'Mahalaleel\',
 \'Mahalath\',
 \'Mahanaim\',
 \'Make\',
 \'Malchiel\',
 \'Male\',
 \'Mam\',
 \'Mamre\',
 \'Man\',
 \'Manahath\',
 \'Manass\',
 \'Manasseh\',
 \'Mash\',
 \'Masrekah\',
 \'Massa\',
 \'Matred\',
 \'Me\',
 \'Medan\',
 \'Mehetabel\',
 \'Mehujael\',
 \'Melchizedek\',
 \'Merari\',
 \'Mesha\',
 \'Meshech\',
 \'Mesopotamia\',
 \'Methusa\',
 \'Methusael\',
 \'Methuselah\',
 \'Mezahab\',
 \'Mibsam\',
 \'Mibzar\',
 \'Midian\',
 \'Midianites\',
 \'Milcah\',
 \'Mishma\',
 \'Mizpah\',
 \'Mizraim\',
 \'Mizz\',
 \'Moab\',
 \'Moabites\',
 \'Moreh\',
 \'Moreover\',
 \'Moriah\',
 \'Muppim\',
 \'My\',
 \'Naamah\',
 \'Naaman\',
 \'Nahath\',
 \'Nahor\',
 \'Naphish\',
 \'Naphtali\',
 \'Naphtuhim\',
 \'Nay\',
 \'Nebajoth\',
 \'Neither\',
 \'Night\',
 \'Nimrod\',
 \'Nineveh\',
 \'Noah\',
 \'Nod\',
 \'Not\',
 \'Now\',
 \'O\',
 \'Obal\',
 \'Of\',
 \'Oh\',
 \'Ohad\',
 \'Omar\',
 \'On\',
 \'Onam\',
 \'Onan\',
 \'Only\',
 \'Ophir\',
 \'Our\',
 \'Out\',
 \'Padan\',
 \'Padanaram\',
 \'Paran\',
 \'Pass\',
 \'Pathrusim\',
 \'Pau\',
 \'Peace\',
 \'Peleg\',
 \'Peniel\',
 \'Penuel\',
 \'Peradventure\',
 \'Perizzit\',
 \'Perizzite\',
 \'Perizzites\',
 \'Phallu\',
 \'Phara\',
 \'Pharaoh\',
 \'Pharez\',
 \'Phichol\',
 \'Philistim\',
 \'Philistines\',
 \'Phut\',
 \'Phuvah\',
 \'Pildash\',
 \'Pinon\',
 \'Pison\',
 \'Potiphar\',
 \'Potipherah\',
 \'Put\',
 \'Raamah\',
 \'Rachel\',
 \'Rameses\',
 \'Rebek\',
 \'Rebekah\',
 \'Rehoboth\',
 \'Remain\',
 \'Rephaims\',
 \'Resen\',
 \'Return\',
 \'Reu\',
 \'Reub\',
 \'Reuben\',
 \'Reuel\',
 \'Reumah\',
 \'Riphath\',
 \'Rosh\',
 \'Sabtah\',
 \'Sabtech\',
 \'Said\',
 \'Salah\',
 \'Salem\',
 \'Samlah\',
 \'Sarah\',
 \'Sarai\',
 \'Saul\',
 \'Save\',
 \'Say\',
 \'Se\',
 \'Seba\',
 \'See\',
 \'Seeing\',
 \'Seir\',
 \'Sell\',
 \'Send\',
 \'Sephar\',
 \'Serah\',
 \'Sered\',
 \'Serug\',
 \'Set\',
 \'Seth\',
 \'Shalem\',
 \'Shall\',
 \'Shalt\',
 \'Shammah\',
 \'Shaul\',
 \'Shaveh\',
 \'She\',
 \'Sheba\',
 \'Shebah\',
 \'Shechem\',
 \'Shed\',
 \'Shel\',
 \'Shelah\',
 \'Sheleph\',
 \'Shem\',
 \'Shemeber\',
 \'Shepho\',
 \'Shillem\',
 \'Shiloh\',
 \'Shimron\',
 \'Shinab\',
 \'Shinar\',
 \'Shobal\',
 \'Should\',
 \'Shuah\',
 \'Shuni\',
 \'Shur\',
 \'Sichem\',
 \'Siddim\',
 \'Sidon\',
 \'Simeon\',
 \'Sinite\',
 \'Sitnah\',
 \'Slay\',
 \'So\',
 \'Sod\',
 \'Sodom\',
 \'Sojourn\',
 \'Some\',
 \'Spake\',
 \'Speak\',
 \'Spirit\',
 \'Stand\',
 \'Succoth\',
 \'Surely\',
 \'Swear\',
 \'Syrian\',
 \'Take\',
 \'Tamar\',
 \'Tarshish\',
 \'Tebah\',
 \'Tell\',
 \'Tema\',
 \'Teman\',
 \'Temani\',
 \'Terah\',
 \'Thahash\',
 \'That\',
 \'The\',
 \'Then\',
 \'There\',
 \'Therefore\',
 \'These\',
 \'They\',
 \'Thirty\',
 \'This\',
 \'Thorns\',
 \'Thou\',
 \'Thus\',
 \'Thy\',
 \'Tidal\',
 \'Timna\',
 \'Timnah\',
 \'Timnath\',
 \'Tiras\',
 \'To\',
 \'Togarmah\',
 \'Tola\',
 \'Tubal\',
 \'Tubalcain\',
 \'Twelve\',
 \'Two\',
 \'Unstable\',
 \'Until\',
 \'Unto\',
 \'Up\',
 \'Upon\',
 \'Ur\',
 \'Uz\',
 \'Uzal\',
 \'We\',
 \'What\',
 \'When\',
 \'Whence\',
 \'Where\',
 \'Whereas\',
 \'Wherefore\',
 \'Which\',
 \'While\',
 \'Who\',
 \'Whose\',
 \'Whoso\',
 \'Why\',
 \'Wilt\',
 \'With\',
 \'Woman\',
 \'Ye\',
 \'Yea\',
 \'Yet\',
 \'Zaavan\',
 \'Zaphnathpaaneah\',
 \'Zar\',
 \'Zarah\',
 \'Zeboiim\',
 \'Zeboim\',
 \'Zebul\',
 \'Zebulun\',
 \'Zemarite\',
 \'Zepho\',
 \'Zerah\',
 \'Zibeon\',
 \'Zidon\',
 \'Zillah\',
 \'Zilpah\',
 \'Zimran\',
 \'Ziphion\',
 \'Zo\',
 \'Zoar\',
 \'Zohar\',
 \'Zuzims\',
 \'a\',
 \'abated\',
 \'abide\',
 \'able\',
 \'abode\',
 \'abomination\',
 \'about\',
 \'above\',
 \'abroad\',
 \'absent\',
 \'abundantly\',
 \'accept\',
 \'accepted\',
 \'according\',
 \'acknowledged\',
 \'activity\',
 \'add\',
 \'adder\',
 \'afar\',
 \'afflict\',
 \'affliction\',
 \'afraid\',
 \'after\',
 \'afterward\',
 \'afterwards\',
 \'aga\',
 \'again\',
 \'against\',
 \'age\',
 \'aileth\',
 \'air\',
 \'al\',
 \'alive\',
 \'all\',
 \'almon\',
 \'alo\',
 \'alone\',
 \'aloud\',
 \'also\',
 \'altar\',
 \'altogether\',
 \'always\',
 \'am\',
 \'among\',
 \'amongst\',
 \'an\',
 \'and\',
 \'angel\',
 \'angels\',
 \'anger\',
 \'angry\',
 \'anguish\',
 \'anointedst\',
 \'anoth\',
 \'another\',
 \'answer\',
 \'answered\',
 \'any\',
 \'anything\',
 \'appe\',
 \'appear\',
 \'appeared\',
 \'appease\',
 \'appoint\',
 \'appointed\',
 \'aprons\',
 \'archer\',
 \'archers\',
 \'are\',
 \'arise\',
 \'ark\',
 \'armed\',
 \'arms\',
 \'army\',
 \'arose\',
 \'arrayed\',
 \'art\',
 \'artificer\',
 \'as\',
 \'ascending\',
 \'ash\',
 \'ashamed\',
 \'ask\',
 \'asked\',
 \'asketh\',
 \'ass\',
 \'assembly\',
 \'asses\',
 \'assigned\',
 \'asswaged\',
 \'at\',
 \'attained\',
 \'audience\',
 \'avenged\',
 \'aw\',
 \'awaked\',
 \'away\',
 \'awoke\',
 \'back\',
 \'backward\',
 \'bad\',
 \'bade\',
 \'badest\',
 \'badne\',
 \'bak\',
 \'bake\',
 \'bakemeats\',
 \'baker\',
 \'bakers\',
 \'balm\',
 \'bands\',
 \'bank\',
 \'bare\',
 \'barr\',
 \'barren\',
 \'basket\',
 \'baskets\',
 \'battle\',
 \'bdellium\',
 \'be\',
 \'bear\',
 \'beari\',
 \'bearing\',
 \'beast\',
 \'beasts\',
 \'beautiful\',
 \'became\',
 \'because\',
 \'become\',
 \'bed\',
 \'been\',
 \'befall\',
 \'befell\',
 \'before\',
 \'began\',
 \'begat\',
 \'beget\',
 \'begettest\',
 \'begin\',
 \'beginning\',
 \'begotten\',
 \'beguiled\',
 \'beheld\',
 \'behind\',
 \'behold\',
 \'being\',
 \'believed\',
 \'belly\',
 \'belong\',
 \'beneath\',
 \'bereaved\',
 \'beside\',
 \'besides\',
 \'besought\',
 \'best\',
 \'betimes\',
 \'better\',
 \'between\',
 \'betwixt\',
 \'beyond\',
 \'binding\',
 \'bird\',
 \'birds\',
 \'birthday\',
 \'birthright\',
 \'biteth\',
 \'bitter\',
 \'blame\',
 \'blameless\',
 \'blasted\',
 \'bless\',
 \'blessed\',
 \'blesseth\',
 \'blessi\',
 \'blessing\',
 \'blessings\',
 \'blindness\',
 \'blood\',
 \'blossoms\',
 \'bodies\',
 \'boldly\',
 \'bondman\',
 \'bondmen\',
 \'bondwoman\',
 \'bone\',
 \'bones\',
 \'book\',
 \'booths\',
 \'border\',
 \'borders\',
 \'born\',
 \'bosom\',
 \'both\',
 \'bottle\',
 \'bou\',
 \'boug\',
 \'bough\',
 \'bought\',
 \'bound\',
 \'bow\',
 \'bowed\',
 \'bowels\',
 \'bowing\',
 \'boys\',
 \'bracelets\',
 \'branches\',
 \'brass\',
 \'bre\',
 \'breach\',
 \'bread\',
 \'breadth\',
 \'break\',
 \'breaketh\',
 \'breaking\',
 \'breasts\',
 \'breath\',
 \'breathed\',
 \'breed\',
 \'brethren\',
 \'brick\',
 \'brimstone\',
 \'bring\',
 \'brink\',
 \'broken\',
 \'brook\',
 \'broth\',
 \'brother\',
 \'brought\',
 \'brown\',
 \'bruise\',
 \'budded\',
 \'build\',
 \'builded\',
 \'built\',
 \'bulls\',
 \'bundle\',
 \'bundles\',
 \'burdens\',
 \'buried\',
 \'burn\',
 \'burning\',
 \'burnt\',
 \'bury\',
 \'buryingplace\',
 \'business\',
 \'but\',
 \'butler\',
 \'butlers\',
 \'butlership\',
 \'butter\',
 \'buy\',
 \'by\',
 \'cakes\',
 \'calf\',
 \'call\',
 \'called\',
 \'came\',
 \'camel\',
 \'camels\',
 \'camest\',
 \'can\',
 \'cannot\',
 \'canst\',
 \'captain\',
 \'captive\',
 \'captives\',
 \'carcases\',
 \'carried\',
 \'carry\',
 \'cast\',
 \'castles\',
 \'catt\',
 \'cattle\',
 \'caught\',
 \'cause\',
 \'caused\',
 \'cave\',
 \'cease\',
 \'ceased\',
 \'certain\',
 \'certainly\',
 \'chain\',
 \'chamber\',
 \'change\',
 \'changed\',
 \'changes\',
 \'charge\',
 \'charged\',
 \'chariot\',
 \'chariots\',
 \'chesnut\',
 \'chi\',
 \'chief\',
 \'child\',
 \'childless\',
 \'childr\',
 \'children\',
 \'chode\',
 \'choice\',
 \'chose\',
 \'circumcis\',
 \'circumcise\',
 \'circumcised\',
 \'citi\',
 \'cities\',
 \'city\',
 \'clave\',
 \'clean\',
 \'clear\',
 \'cleave\',
 \'clo\',
 \'closed\',
 \'clothed\',
 \'clothes\',
 \'cloud\',
 \'clusters\',
 \'co\',
 \'coat\',
 \'coats\',
 \'coffin\',
 \'cold\',
 ...]
len(set(text3))
2789
len(text3)/len(set(text3))
16.050197203298673
text3.count('smote')
5
100 * text4.count('a') / len(text4)
1.4643016433938312
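def lexical_diversity(text):
    # lexical   UK ['leksɪk(ə)l]  US ['lɛksɪkl]
    # adj. relating to vocabulary; [linguistics] of a dictionary or of dictionary compilation
    # diversity UK [daɪ'vɜːsɪtɪ; dɪ-]  US [dɪˈvəsɪti]
    # n. variety; difference
    # Returns the average number of times each distinct token is used.
    return len(text) / len(set(text))
def percentage(count, total):
    return 100 * count / total

print('Lexical diversity of text3: {}'.format(lexical_diversity(text3)))
print("Percentage of text4 taken up by the word 'a': {}".format(percentage(text4.count('a'), len(text4))))
Lexical diversity of text3: 16.050197203298673
Percentage of text4 taken up by the word 'a': 1.4643016433938312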
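The two helpers do not depend on NLTK at all, so they can be sanity-checked on a sentence small enough to count by hand. The sketch below repeats the definitions to stay self-contained; the sentence is made up for illustration.

```python
def lexical_diversity(tokens):
    """Average number of times each distinct token is used (tokens / types)."""
    return len(tokens) / len(set(tokens))


def percentage(count, total):
    """Express count as a percentage of total."""
    return 100 * count / total


# Toy token list (hypothetical data).
tokens = 'the cat sat on the mat and the dog sat too'.split()
print(len(tokens))                                    # total number of tokens
print(len(set(tokens)))                               # number of distinct tokens
print(lexical_diversity(tokens))
print(percentage(tokens.count('the'), len(tokens)))
```

A value close to 1 means almost every token is new; the larger the value, the more often words are reused.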

Lists

sent1 = ['Call', 'me', 'Ishmael', '.']
print('Contents of sent1: {}'.format(sent1))
print('Length of sent1: {}'.format(len(sent1)))
print('Lexical diversity of sent1: {}'.format(lexical_diversity(sent1)))
Contents of sent1: ['Call', 'me', 'Ishmael', '.']
Length of sent1: 4
Lexical diversity of sent1: 1.0
sent1, sent2, sent3, sent4  # these lists are predefined by nltk.book
(['Call', 'me', 'Ishmael', '.'],
 ['The',
  'family',
  'of',
  'Dashwood',
  'had',
  'long',
  'been',
  'settled',
  'in',
  'Sussex',
  '.'],
 ['In',
  'the',
  'beginning',
  'God',
  'created',
  'the',
  'heaven',
  'and',
  'the',
  'earth',
  '.'],
 ['Fellow',
  '-',
  'Citizens',
  'of',
  'the',
  'Senate',
  'and',
  'of',
  'the',
  'House',
  'of',
  'Representatives',
  ':'])
sent4 + sent1
['Fellow',
 '-',
 'Citizens',
 'of',
 'the',
 'Senate',
 'and',
 'of',
 'the',
 'House',
 'of',
 'Representatives',
 ':',
 'Call',
 'me',
 'Ishmael',
 '.']
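sent1.append('Some')  # append() changes the list in place and returns None
sent1  # four copies of 'Some' appear, presumably because the cell was run four times
['Call', 'me', 'Ishmael', '.', 'Some', 'Some', 'Some', 'Some']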
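The difference between + and append() is easy to trip over, so here is a short standalone sketch. The variable names are made up for illustration.

```python
sent = ['Call', 'me', 'Ishmael', '.']

combined = sent + ['Some']      # + builds a new list and leaves sent unchanged
result = sent.append('Some')    # append() changes sent in place and returns None

print(combined)
print(result)
print(sent)
```

Because append() mutates the list, running the same notebook cell repeatedly keeps adding elements.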

List indexing

type(text4)
nltk.text.Text
text4[173]
'awaken'
text4.index('awaken')
173
text5[16715:16735]
['U86',
 'thats',
 'why',
 'something',
 'like',
 'gamefly',
 'is',
 'so',
 'good',
 'because',
 'you',
 'can',
 'actually',
 'play',
 'a',
 'full',
 'game',
 'without',
 'buying',
 'it']
text6[1600:1625]
['We',
 "'",
 're',
 'an',
 'anarcho',
 '-',
 'syndicalist',
 'commune',
 '.',
 'We',
 'take',
 'it',
 'in',
 'turns',
 'to',
 'act',
 'as',
 'a',
 'sort',
 'of',
 'executive',
 'officer',
 'for',
 'the',
 'week']
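Indexing and slicing behave the same on an ordinary Python list as in the examples above, which makes the rules easy to try on a short list. The token list here is made up for illustration.

```python
tokens = ['We', 'take', 'it', 'in', 'turns', 'to', 'act']

print(tokens[2])            # indexing starts at 0, so this is the third token
print(tokens.index('in'))   # position of the first occurrence of 'in'
print(tokens[1:4])          # slice: start index included, end index excluded
print(tokens[-2:])          # negative indices count from the end
```

A slice m:n therefore always contains n - m elements, as long as both indices are within range.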

Variables

sent1 = ['Call', 'me', 'Ishmael', '.']
my_sent = ['Bravely', 'bold', 'Sir', 'Robin', ',', 'rode', 'forth', 'from', 'Camelot', '.']
noun_phrase = my_sent[1:4]
print('Sliced list: noun_phrase -> {}'.format(noun_phrase))
wOrDs = sorted(noun_phrase)
print('Sorted list: wOrDs -> {}'.format(wOrDs))
Sliced list: noun_phrase -> ['bold', 'Sir', 'Robin']
Sorted list: wOrDs -> ['Robin', 'Sir', 'bold']
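The sorted result puts the capitalized words first because Python compares strings by Unicode code point, and uppercase letters come before lowercase ones. A small sketch of the default order next to a case-insensitive one (the variable names are illustrative):

```python
words = ['bold', 'Sir', 'Robin']

default_order = sorted(words)                 # compares Unicode code points
folded_order = sorted(words, key=str.lower)   # compares lowercased copies

print(default_order)
print(folded_order)
```

Passing key=str.lower changes only the comparison; the original spelling of each word is preserved in the result.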

Strings

name = 'bright'
print('First letter of name: {}'.format(name[0]))
print(name[:4])
print(name * 2)
print(name + '!')
First letter of name: b
brig
brightbright
bright!
' '.join(['Monty', 'Python'])
'Monty Python'
'Monty Python'.split()
['Monty', 'Python']
saying = ['After', 'all', 'is', 'said', 'and', 'done', 'more', 'is', 'said', 'than', 'done']
tokens = set(saying)
tokens = sorted(tokens)
tokens[-2:]
['said', 'than']
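join() and split() are inverses for simple whitespace-separated text, and set() removes duplicates before sorting. The sketch below ties the three together on the same saying; the variable names `text`, `round_trip` and `vocabulary` are made up for illustration.

```python
saying = ['After', 'all', 'is', 'said', 'and', 'done',
          'more', 'is', 'said', 'than', 'done']

text = ' '.join(saying)            # list of tokens -> one string
round_trip = text.split()          # string -> list of tokens again
vocabulary = sorted(set(saying))   # distinct tokens in sorted order

print(round_trip == saying)
print(len(saying), len(vocabulary))
print(vocabulary)
```

The round trip only holds because no token contains whitespace itself.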
fdist1 = FreqDist(text1)
vocabulary1 = fdist1.keys()
type(vocabulary1)
dict_keys
fdist1.plot(50, cumulative=True)
#Cumulative frequency plot for the 50 most frequently used words in Moby Dick, which
#account for nearly half of the tokens.
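The counting behind a frequency distribution can be tried without any corpus by using `collections.Counter` from the standard library, which NLTK 3's FreqDist builds on. The sketch below is an illustration on made-up data, not the NLTK code. It shows the most frequent tokens, the share of the text they cover (the idea behind the cumulative plot), and the tokens that occur exactly once.

```python
from collections import Counter

# Toy token list (hypothetical data, not taken from text1).
tokens = 'the whale and the sea and the ship met a storm'.split()
counts = Counter(tokens)

# Most frequent tokens first, like a frequency distribution.
top = counts.most_common(2)

# Share of all tokens covered by those top entries (the cumulative idea).
covered = sum(n for _, n in top) / len(tokens)

# Tokens that occur exactly once, in order of first appearance.
once = [w for w in counts if counts[w] == 1]

print(top)
print(covered)
print(once)
```

On a real novel a handful of very common words covers a large share of the text, while the words that occur only once make up a long tail.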

fdist1.hapaxes() #the words that occur once only
[\'Herman\',
 \'Melville\',
 \']\',
 \'ETYMOLOGY\',
 \'Late\',
 \'Consumptive\',
 \'School\',
 \'threadbare\',
 \'lexicons\',
 \'mockingly\',
 \'flags\',
 \'mortality\',
 \'signification\',
 \'HACKLUYT\',
 \'Sw\',
 \'HVAL\',
 \'roundness\',
 \'Dut\',
 \'Ger\',
 \'WALLEN\',
 \'WALW\',
 \'IAN\',
 \'RICHARDSON\',
 \'KETOS\',
 \'GREEK\',
 \'CETUS\',
 \'LATIN\',
 \'WHOEL\',
 \'ANGLO\',
 \'SAXON\',
 \'WAL\',
 \'HWAL\',
 \'SWEDISH\',
 \'ICELANDIC\',
 \'BALEINE\',
 \'BALLENA\',
 \'FEGEE\',
 \'ERROMANGOAN\',
 \'Librarian\',
 \'painstaking\',
 \'burrower\',
 \'grub\',
 \'Vaticans\',
 \'stalls\',
 \'higgledy\',
 \'piggledy\',
 \'gospel\',
 \'promiscuously\',
 \'commentator\',
 \'belongest\',
 \'sallow\',
 \'Pale\',
 \'Sherry\',
 \'loves\',
 \'bluntly\',
 \'Subs\',
 \'thankless\',
 \'Hampton\',
 \'Court\',
 \'hie\',
 \'refugees\',
 \'pampered\',
 \'Michael\',
 \'Raphael\',
 \'unsplinterable\',
 \'GENESIS\',
 \'JOB\',
 \'JONAH\',
 \'punish\',
 \'ISAIAH\',
 \'soever\',
 \'cometh\',
 \'incontinently\',
 \'perisheth\',
 \'PLUTARCH\',
 \'MORALS\',
 \'breedeth\',
 \'Whirlpooles\',
 \'Balaene\',
 \'arpens\',
 \'PLINY\',
 \'Scarcely\',
 \'TOOKE\',
 \'LUCIAN\',
 \'TRUE\',
 \'catched\',
 \'OCTHER\',
 \'VERBAL\',
 \'TAKEN\',
 \'MOUTH\',
 \'ALFRED\',
 \'890\',
 \'gudgeon\',
 \'retires\',
 \'MONTAIGNE\',
 \'APOLOGY\',
 \'RAIMOND\',
 \'SEBOND\',
 \'Nick\',
 \'RABELAIS\',
 \'cartloads\',
 \'STOWE\',
 \'ANNALS\',
 \'LORD\',
 \'BACON\',
 \'Touching\',
 \'ork\',
 \'DEATH\',
 \'sovereignest\',
 \'bruise\',
 \'HAMLET\',
 \'leach\',
 \'Mote\',
 \'availle\',
 \'returne\',
 \'againe\',
 \'worker\',
 \'Dinting\',
 \'paine\',
 \'thro\',
 \'maine\',
 \'FAERIE\',
 \'Immense\',
 \'til\',
 \'DAVENANT\',
 \'PREFACE\',
 \'GONDIBERT\',
 \'spermacetti\',
 \'Hosmannus\',
 \'Nescio\',
 \'VIDE\',
 \'Spencer\',
 \'Talus\',
 \'flail\',
 \'threatens\',
 \'jav\',
 \'lins\',
 \'WALLER\',
 \'SUMMER\',
 \'ISLANDS\',
 \'Commonwealth\',
 \'Civitas\',
 \'OPENING\',
 \'SENTENCE\',
 \'HOBBES\',
 \'LEVIATHAN\',
 \'Silly\',
 \'Mansoul\',
 \'chewing\',
 \'sprat\',
 \'PILGRIM\',
 \'PROGRESS\',
 \'Created\',
 \'PARADISE\',
 \'LOST\',
 \'---"\',
 \'Hugest\',
 \'Stretched\',
 \'Draws\',
 \'FULLLER\',
 \'PROFANE\',
 \'HOLY\',
 \'STATE\',
 \'DRYDEN\',
 \'ANNUS\',
 \'MIRABILIS\',
 \'aground\',
 \'EDGE\',
 \'TEN\',
 \'SPITZBERGEN\',
 \'PURCHAS\',
 \'wantonness\',
 \'fuzzing\',
 \'vents\',
 \'HERBERT\',
 \'INTO\',
 \'ASIA\',
 \'AFRICA\',
 \'SCHOUTEN\',
 \'SIXTH\',
 \'CIRCUMNAVIGATION\',
 \'Elbe\',
 \'ducat\',
 \'herrings\',
 \'GREENLAND\',
 \'Several\',
 \'Fife\',
 \'Anno\',
 \'1652\',
 \'Pitferren\',
 \'SIBBALD\',
 \'FIFE\',
 \'KINROSS\',
 \'Myself\',
 \'Sperma\',
 \'ceti\',
 \'fierceness\',
 \'RICHARD\',
 \'STRAFFORD\',
 \'LETTER\',
 \'BERMUDAS\',
 \'PHIL\',
 \'TRANS\',
 \'1668\',
 \'PRIMER\',
 \'COWLEY\',
 \'1729\',
 \'"...\',
 \'frequendy\',
 \'insupportable\',
 \'disorder\',
 \'ULLOA\',
 \'SOUTH\',
 \'AMERICA\',
 \'sylphs\',
 \'petticoat\',
 \'Oft\',
 \'Tho\',
 \'RAPE\',
 \'LOCK\',
 \'NAT\',
 \'wales\',
 \'JOHNSON\',
 \'COOK\',
 \'dung\',
 \'lime\',
 \'juniper\',
 \'UNO\',
 \'VON\',
 \'TROIL\',
 \'LETTERS\',
 \'BANKS\',
 \'SOLANDER\',
 \'1772\',
 \'Nantuckois\',
 \'JEFFERSON\',
 \'MEMORIAL\',
 \'MINISTER\',
 \'REFERENCE\',
 \'PARLIAMENT\',
 \'SOMEWHERE\',
 \'guarding\',
 \'protecting\',
 \'robbers\',
 \'BLACKSTONE\',
 \'Rodmond\',
 \'suspends\',
 \'attends\',
 \'FALCONER\',
 \'Bright\',
 \'roofs\',
 \'domes\',
 \'rockets\',
 \'Around\',
 \'unwieldy\',
 \'COWPER\',
 \'VISIT\',
 \'LONDON\',
 \'HUNTER\',
 \'DISSECTION\',
 \'SMALL\',
 \'SIZED\',
 \'aorta\',
 \'gushing\',
 \'PALEY\',
 \'THEOLOGY\',
 \'mammiferous\',
 \'hind\',
 \'BARON\',
 \'CUVIER\',
 \'COLNETT\',
 \'PURPOSE\',
 \'EXTENDING\',
 \'SPERMACETI\',
 \'Floundered\',
 \'chace\',
 \'peopling\',
 \'Gather\',
 \'Led\',
 \'instincts\',
 \'trackless\',
 \'Assaulted\',
 \'voracious\',
 \'spiral\',
 \'MONTGOMERY\',
 \'WORLD\',
 \'FLOOD\',
 \'Paean\',
 \'fatter\',
 \'Flounders\',
 \'CHARLES\',
 \'LAMB\',
 \'TRIUMPH\',
 \'1690\',
 \'OBED\',
 \'Susan\',
 \'HAWTHORNE\',
 \'TWICE\',
 \'bespeak\',
 \'raal\',
 \'COOPER\',
 \'PILOT\',
 \'Berlin\',
 \'Gazette\',
 \'ECKERMANN\',
 \'CONVERSATIONS\',
 \'GOETHE\',
 \'ESSEX\',
 \'WAS\',
 \'ATTACKED\',
 \'FINALLY\',
 \'DESTROYED\',
 \'OWEN\',
 \'CHACE\',
 \'FIRST\',
 \'SAID\',
 \'VESSEL\',
 \'YORK\',
 \'1821\',
 \'piping\',
 \'dimmed\',
 \'phospher\',
 \'ELIZABETH\',
 \'OAKES\',
 \'SMITH\',
 \'amounted\',
 \'440\',
 \'SCORESBY\',
 \'Mad\',
 \'agonies\',
 \'endures\',
 \'infuriated\',
 \'rears\',
 \'snaps\',
 \'propelled\',
 \'observers\',
 \'opportunities\',
 \'habitudes\',
 \'BEALE\',
 \'offensively\',
 \'artful\',
 \'mischievous\',
 \'FREDERICK\',
 \'DEBELL\',
 \'1840\',
 \'October\',
 \'Raise\',
 \'ay\',
 \'THAR\',
 \'bowes\',
 \'os\',
 \'ROSS\',
 \'ETCHINGS\',
 \'CRUIZE\',
 \'1846\',
 \'Globe\',
 \'transactions\',
 \'relate\',
 \'HUSSEY\',
 \'SURVIVORS\',
 \'parried\',
 \'MISSIONARY\',
 \'JOURNAL\',
 \'TYERMAN\',
 \'boldest\',
 \'persevering\',
 \'REPORT\',
 \'DANIEL\',
 \'SPEECH\',
 \'SENATE\',
 \'APPLICATION\',
 \'ERECTION\',
 \'BREAKWATER\',
 \'CAPTORS\',
 \'WHALEMAN\',
 \'ADVENTURES\',
 \'BIOGRAPHY\',
 \'GATHERED\',
 \'HOMEWARD\',
 \'COMMODORE\',
 \'PREBLE\',
 \'REV\',
 \'CHEEVER\',
 \'MUTINEER\',
 \'BROTHER\',
 \'ANOTHER\',
 \'MCCULLOCH\',
 \'COMMERCIAL\',
 \'reciprocal\',
 \'clews\',
 \'SOMETHING\',
 \'UNPUBLISHED\',
 \'CURRENTS\',
 \'Pedestrians\',
 \'recollect\',
 \'gateways\',
 \'VOYAGER\',
 \'ARCTIC\',
 \'NEWSPAPER\',
 \'TAKING\',
 \'RETAKING\',
 \'HOBOMACK\',
 \'MIRIAM\',
 \'FISHERMAN\',
 \'appliance\',
 \'RIBS\',
 \'TRUCKS\',
 \'Terra\',
 \'Del\',
 \'Fuego\',
 \'DARWIN\',
 \'NATURALIST\',
 ";--\'",
 \'!\\\'"\',
 \'WHARTON\',
 \'Loomings\',
 \'spleen\',
 \'regulating\',
 \'circulation\',
 \'Whenever\',
 \'drizzly\',
 \'hypos\',
 \'philosophical\',
 \'Cato\',
 \'Manhattoes\',
 \'reefs\',
 \'downtown\',
 \'gazers\',
 \'Circumambulate\',
 \'Corlears\',
 \'Coenties\',
 \'Slip\',
 \'Whitehall\',
 \'Posted\',
 \'sentinels\',
 \'spiles\',
 \'pier\',
 \'lath\',
 \'counters\',
 \'desks\',
 \'loitering\',
 \'shady\',
 \'Inlanders\',
 \'lanes\',
 \'alleys\',
 \'attract\',
 \'dale\',
 \'dreamiest\',
 \'shadiest\',
 \'quietest\',
 \'enchanting\',
 \'Saco\',
 \'crucifix\',
 \'Deep\',
 \'mazy\',
 \'Tiger\',
 \'Tennessee\',
 \'Rockaway\',
 \'Persians\',
 \'deity\',
 \'Narcissus\',
 \'ungraspable\',
 \'hazy\',
 \'quarrelsome\',
 \'offices\',
 \'abominate\',
 \'toils\',
 \'trials\',
 \'barques\',
 \'schooners\',
 \'broiling\',
 \'buttered\',
 \'judgmatically\',
 \'peppered\',
 \'reverentially\',
 \'idolatrous\',
 \'dotings\',
 \'ibis\',
 \'roasted\',
 \'bake\',
 \'plumb\',
 \'Van\',
 \'Rensselaers\',
 \'Randolphs\',
 \'Hardicanutes\',
 \'lording\',
 \'tallest\',
 \'decoction\',
 \'Seneca\',
 \'Stoics\',
 \'Testament\',
 \'promptly\',
 \'rub\',
 \'infliction\',
 \'BEING\',
 \'PAID\',
 \'urbane\',
 \'ills\',
 \'monied\',
 \'consign\',
 \'prevalent\',
 \'violate\',
 \'Pythagorean\',
 \'commonalty\',
 \'police\',
 \'surveillance\',
 \'programme\',
 \'solo\',
 \'CONTESTED\',
 \'ELECTION\',
 \'PRESIDENCY\',
 \'UNITED\',
 \'STATES\',
 \'ISHMAEL\',
 \'BLOODY\',
 \'AFFGHANISTAN\',
 \'managers\',
 \'genteel\',
 \'comedies\',
 \'farces\',
 \'cunningly\',
 \'disguises\',
 \'cajoling\',
 \'unbiased\',
 \'freewill\',
 \'discriminating\',
 \'overwhelming\',
 \'undeliverable\',
 \'itch\',
 \'forbidden\',
 \'ignoring\',
 \'lodges\',
 \'Carpet\',
 \'Bag\',
 \'Manhatto\',
 \'candidates\',
 \'penalties\',
 \'Tyre\',
 \'Carthage\',
 \'imported\',
 \'cobblestones\',
 \'bitingly\',
 \'shouldering\',
 \'price\',
 \'fervent\',
 \'asphaltic\',
 \'pavement\',
 \'flinty\',
 \'projections\',
 \'soles\',
 \'Too\',
 \'cheapest\',
 \'cheeriest\',
 \'invitingly\',
 \'particles\',
 \'peer\',
 \'Angel\',
 \'Doom\',
 \'wailing\',
 \'gnashing\',
 \'Wretched\',
 \'entertainment\',
 \'Moving\',
 \'emigrant\',
 \'poverty\',
 \'creak\',
 \'lodgings\',
 \'zephyr\',
 \'hob\',
 \'toasting\',
 \'observest\',
 \'sashless\',
 \'glazier\',
 \'reasonest\',
 \'chinks\',
 \'crannies\',
 \'lint\',
 \'chattering\',
 \'shiverings\',
 \'cob\',
 \'redder\',
 \'Orion\',
 \'glitters\',
 \'conservatories\',
 \'president\',
 \'temperance\',
 \'blubbering\',
 \'straggling\',
 \'wainscots\',
 \'reminding\',
 \'oilpainting\',
 \'besmoked\',
 \'defaced\',
 \'unequal\',
 \'crosslights\',
 \'hags\',
 \'delineate\',
 \'bewitched\',
 \'ponderings\',
 \'boggy\',
 \'soggy\',
 \'squitchy\',
 \'froze\',
 \'heath\',
 \'icebound\',
 \'represents\',
 \'Horner\',
 \'foundered\',
 \'clubs\',
 \'harvesting\',
 \'hacking\',
 \'horrifying\',
 \'Mixed\',
 \'Nathan\',
 \'Swain\',
 \'corkscrew\',
 \'Blanco\',
 \'sojourning\',
 \'fireplaces\',
 \'duskier\',
 \'cockpits\',
 \'rarities\',
 \'Projecting\',
 \'Within\',
 \'shelves\',
 \'flasks\',
 \'bustles\',
 \'deliriums\',
 \'Abominable\',
 \'tumblers\',
 \'cylinders\',
 \'goggling\',
 \'deceitfully\',
 \'tapered\',
 \'Parallel\',
 \'pecked\',
 \'footpads\',
 \'Fill\',
 \'shilling\',
 \'examining\',
 \'SKRIMSHANDER\',
 \'accommodated\',
 \'unoccupied\',
 \'haint\',
 \'pose\',
 \'whalin\',
 \'decidedly\',
 \'objectionable\',
 \'wander\',
 \'Battery\',
 \'ruminating\',
 \'adorning\',
 \'potatoes\',
 \'sartainty\',
 \'diabolically\',
 \'steaks\',
 \'undress\',
 \'looker\',
 \'rioting\',
 \'Grampus\',
 \'seed\',
 \'Feegees\',
 \'tramping\',
 \'Enveloped\',
 \'bedarned\',
 \'eruption\',
 \'officiating\',
 \'brimmers\',
 \'complained\',
 \'potion\',
 \'colds\',
 \'catarrhs\',
 \'liquor\',
 \'arrantest\',
 \'topers\',
 \'obstreperously\',
 \'aloof\',
 \'desirous\',
 \'hilarity\',
 \'coffer\',
 \'Southerner\',
 \'mountaineers\',
 \'Alleghanian\',
 \'missed\',
 \'supernaturally\',
 \'congratulate\',
 \'multiply\',
 \'bachelor\',
 \'abominated\',
 \'tidiest\',
 \'bedwards\',
 \'shan\',
 \'tablecloth\',
 \'Skrimshander\',
 \'bump\',
 \'spraining\',
 \'eider\',
 \'yoking\',
 \'rickety\',
 \'whirlwinds\',
 \'knockings\',
 \'dismissed\',
 \'popped\',
 \'cherishing\',
 \'chuckled\',
 \'chuckle\',
 \'mightily\',
 \'catches\',
 \'bamboozingly\',
 \'overstocked\',
 \'toothpick\',
 \'rayther\',
 \'BROWN\',
 \'slanderin\',
 \'farrago\',
 \'BROKE\',
 \'Sartain\',
 \'Mt\',
 \'Hecla\',
 \'persist\',
 \'mystifying\',
 \'unsay\',
 \'criminal\',
 \'Wall\',
 \'purty\',
 \'sarmon\',
 \'rips\',
 \'tellin\',
 \'bought\',
 \'balmed\',
 \'curios\',
 \'sellin\',
 \'inions\',
 \'fooling\',
 \'idolators\',
 \'Depend\',
 \'reg\',
 \'lar\',
 \'spliced\',
 \'Johnny\',
 \'sprawling\',
 \'Arter\',
 \'glim\',
 \'jiffy\',
 \'irresolute\',
 \'vum\',
 \'WON\',
 \'Folding\',
 \'scrutiny\',
 \'porcupine\',
 \'moccasin\',
 \'ponchos\',
 \'parade\',
 \'rainy\',
 \'remembering\',
 \'commended\',
 \'cobs\',
 \'Nod\',
 \'footfall\',
 \'unlacing\',
 \'blackish\',
 \'plasters\',
 \'inkling\',
 \'Placing\',
 \'crammed\',
 \'scalp\',
 \'mildewed\',
 \'Ignorance\',
 \'parent\',
 \'nonplussed\',
 \'undressing\',
 \'checkered\',
 \'Thirty\',
 \'frogs\',
 \'quaked\',
 \'wrapall\',
 \'dreadnaught\',
 \'fumbled\',
 \'Remembering\',
 \'manikin\',
 \'tenpin\',
 \'andirons\',
 \'jambs\',
 \'bricks\',
 \'appropriate\',
 \'applying\',
 \'hastier\',
 \'withdrawals\',
 \'antics\',
 \'devotee\',
 \'extinguishing\',
 \'unceremoniously\',
 \'bagged\',
 \'sportsman\',
 \'woodcock\',
 \'uncomfortableness\',
 \'deliberating\',
 \'puffed\',
 \'sang\',
 \'Stammering\',
 \'conjured\',
 \'responses\',
 \'debel\',
 \'flourishing\',
 \'Angels\',
 \'flourishings\',
 \'peddlin\',
 \'sleepe\',
 \'grunted\',
 \'gettee\',
 \'motioning\',
 \'comely\',
 \'insured\',
 \'Counterpane\',
 \'parti\',
 \'triangles\',
 \'interminable\',
 \'caper\',
 \'supperless\',
 \'21st\',
 \'hemisphere\',
 \'sigh\',
 \'Sixteen\',
 \'ached\',
 \'coaches\',
 \'stockinged\',
 \'slippering\',
 \'misbehaviour\',
 \'unendurable\',
 \'stepmothers\',
 \'misfortunes\',
 \'steeped\',
 \'shudderingly\',
 \'confounding\',
 \'soberly\',
 \'recurred\',
 \'predicament\',
 \'unlock\',
 \'bridegroom\',
 \'clasp\',
 \'hugged\',
 \'rouse\',
 \'snore\',
 \'scratch\',
 \'Throwing\',
 \'expostulations\',
 \'unbecomingness\',
 \'matrimonial\',
 \'dawning\',
 \'overture\',
 \'innate\',
 \'compliment\',
 \'civility\',
 \'rudeness\',
 \'toilette\',
 \'dressing\',
 \'donning\',
 \'gaspings\',
 \'booting\',
 \'caterpillar\',
 \'outlandishness\',
 \'manners\',
 \'education\',
 \'undergraduate\',
 \'dreamt\',
 \'cowhide\',
 \'pinched\',
 \'curtains\',
 \'indecorous\',
 \'contented\',
 \'restricting\',
 \'donned\',
 \'lathering\',
 \'unsheathes\',
 \'whets\',
 \'Rogers\',
 \'cutlery\',
 \'Afterwards\',
 \'baton\',
 \'Breakfast\',
 \'pleasantly\',
 \'bountifully\',
 \'laughable\',
 \'bosky\',
 \'unshorn\',
 \'gowns\',
 \'toasted\',
 \'lingers\',
 \'tarried\',
 \'barred\',
 \'Grub\',
 \'Park\',
 \'assurance\',
 \'polish\',
 \'occasioned\',
 \'embarrassed\',
 \'bashfulness\',
 \'duelled\',
 \'winking\',
 \'tastes\',
 \'sheepishly\',
 \'bashful\',
 \'icicle\',
 \'admirer\',
 \'cordially\',
 \'grappling\',
 \'genteelly\',
 \'eschewed\',
 \'undivided\',
 \'6\',
 \'circulating\',
 \'nondescripts\',
 \'Chestnut\',
 \'jostle\',
 \'Regent\',
 \'Lascars\',
 \'Bombay\',
 \'Apollo\',
 \'Feegeeans\',
 \'Tongatobooarrs\',
 \'Erromanggoans\',
 \'Pannangians\',
 \'Brighggians\',
 \'weekly\',
 \'Vermonters\',
 \'stalwart\',
 \'frames\',
 \'felled\',
 \'strutting\',
 \'wester\',
 \'bombazine\',
 \'cloak\',
 \'mow\',
 \'gloves\',
 \'joins\',
 \'outfit\',
 \'waistcoats\',
 \'Hay\',
 \'Seed\',
 \'tract\',
 \'dearest\',
 \'pave\',
 \'eggs\',
 \'patrician\',
 \'parks\',
 \'scraggy\',
 \'scoria\',
 \'Herr\',
 \'dowers\',
 \'nieces\',
 \'reservoirs\',
 \'maples\',
 \'bountiful\',
 \'proffer\',
 \'passer\',
 \'cones\',
 \'blossoms\',
 \'superinduced\',
 \'carnation\',
 \'Salem\',
 \'sweethearts\',
 \'Puritanic\',
 \'Whaleman\',
 \'Wrapping\',
 \'Each\',
 \'quote\',
 \'TALBOT\',
 \'Near\',
 \'Desolation\',
 \'1st\',
 \'SISTER\',
 \'ROBERT\',
 \'WILLIS\',
 \'ELLERY\',
 \'NATHAN\',
 \'COLEMAN\',
 \'WALTER\',
 \'CANNY\',
 \'SETH\',
 \'GLEIG\',
 \'Forming\',
 \'ELIZA\',
 \'31st\',
 \'MARBLE\',
 \'SHIPMATES\',
 \'EZEKIEL\',
 \'HARDY\',
 \'AUGUST\',
 \'3d\',
 \'1833\',
 \'WIDOW\',
 \'Shaking\',
 \'glazed\',
 \'Affected\',
 \'relatives\',
 \'unhealing\',
 \'sympathetically\',
 \'wounds\',
 \'bleed\',
 \'blanks\',
 ...]

Fine-grained selection of words

  1. In set notation: the set of all w such that w is an element of V (the vocabulary) and w has property P:
    {w | w \(\in\) V and P(w)}
  2. The corresponding Python expression, where p(w) is a test that returns True when w has property P:
    [w for w in V if p(w)]
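# Assumes `from nltk.book import *` has already been run, so text1 is defined.
V = set(text1)                              # vocabulary: the distinct tokens in text1
long_words = [w for w in V if len(w) > 15]  # keep only words longer than 15 characters
sorted(long_words)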
['CIRCUMNAVIGATION',
 'Physiognomically',
 'apprehensiveness',
 'cannibalistically',
 'characteristically',
 'circumnavigating',
 'circumnavigation',
 'circumnavigations',
 'comprehensiveness',
 'hermaphroditical',
 'indiscriminately',
 'indispensableness',
 'irresistibleness',
 'physiognomically',
 'preternaturalness',
 'responsibilities',
 'simultaneousness',
 'subterraneousness',
 'supernaturalness',
 'superstitiousness',
 'uncomfortableness',
 'uncompromisedness',
 'undiscriminating',
 'uninterpenetratingly']
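
The comprehension above filters the vocabulary by a single property (word length). As a minimal sketch, the same pattern can combine several properties, for example length and frequency. The token list, the `select` helper, and the thresholds below are illustrative assumptions, not part of NLTK; a small inline sample and `collections.Counter` stand in for `text1` and a frequency distribution so that the snippet runs without downloading any corpus.

```python
from collections import Counter

# Hypothetical sample tokens standing in for an NLTK text such as text1.
tokens = ("the whale swam and the circumnavigation continued while "
          "the whale watched the circumnavigation").split()

V = set(tokens)         # vocabulary: the distinct tokens
freq = Counter(tokens)  # maps each token to its number of occurrences


def select(vocab, p):
    """Return, in sorted order, every word w in vocab for which p(w) is true."""
    return sorted(w for w in vocab if p(w))


# Property 1: the word is longer than 15 characters.
long_words = select(V, lambda w: len(w) > 15)

# Property 2: the word is longer than 4 characters AND occurs at least twice.
frequent_long = select(V, lambda w: len(w) > 4 and freq[w] >= 2)

print(long_words)
print(frequent_long)
```

With a real NLTK text, `tokens` could simply be replaced by `text1`, since a Text object can be iterated over token by token, as `set(text1)` above already relies on.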

This article is excerpted from Natural Language Processing with Python.
