Python Natural Language Processing: Lexical Resources
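1. Wordlist Corpora

The Words Corpus (nltk.corpus.words) is the Unix /usr/share/dict/words file, a plain list of English words used by some spell checkers. Comparing the vocabulary of a text against it is a quick way to find unusual or misspelled words in that text: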

```python
>>> import nltk
>>> def unusual_words(text):
...     text_vocab = set(w.lower() for w in text if w.isalpha())
...     english_vocab = set(w.lower() for w in nltk.corpus.words.words())
...     unusual = text_vocab.difference(english_vocab)
...     return sorted(unusual)
...
>>> dif1 = unusual_words(nltk.corpus.gutenberg.words('austen-sense.txt'))
>>> dif1[:20]
['abbeyland', 'abhorred', 'abilities', 'abounded', 'abridgement', 'abused', 'abuses', 'accents', 'accepting', 'accommodations', 'accompanied', 'accounted', 'accounts', 'accustomary', 'aches', 'acknowledging', 'acknowledgment', 'acknowledgments', 'acquaintances', 'acquiesced']
>>> dif2 = unusual_words(nltk.corpus.nps_chat.words())
>>> dif2[:20]
['aaaaaaaaaaaaaaaaa', 'aaahhhh', 'abortions', 'abou', 'abourted', 'abs', 'ack', 'acros', 'actualy', 'adams', 'adds', 'adduser', 'adjusts', 'adoted', 'adreniline', 'ads', 'adults', 'afe', 'affairs', 'affari']
>>>
```
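The core of unusual_words is a set difference: the text's lowercased alphabetic vocabulary minus the reference wordlist. The sketch below reproduces that step in plain Python with a tiny hypothetical wordlist, so it runs without downloading any NLTK data. It also illustrates why ordinary words such as abused or abilities show up in dif1 above: as that output shows, the Words Corpus does not contain those inflected forms, so a base-form entry like "abuse" does not cover "abused".

```python
def unusual_words(tokens, wordlist):
    """Return the sorted lowercased alphabetic tokens that do not occur in wordlist."""
    text_vocab = {w.lower() for w in tokens if w.isalpha()}  # drops punctuation and tokens with digits
    english_vocab = {w.lower() for w in wordlist}
    return sorted(text_vocab - english_vocab)  # same as text_vocab.difference(english_vocab)


# Hypothetical stand-in for nltk.corpus.words.words(): base forms only
wordlist = ["ability", "abuse", "accept", "account", "she", "the"]
tokens = ["She", "abused", "the", "account", ",", "abilities", "accepting", "lol123"]

print(unusual_words(tokens, wordlist))  # ['abilities', 'abused', 'accepting']
```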
2. Stopwords Corpus
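This corpus contains high-frequency words such as "the", "to" and "and", which we sometimes want to filter out of a document before further processing. Stopwords usually carry little lexical content, and their presence makes it harder to tell one text apart from another.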

```python
>>> from nltk.corpus import stopwords
>>> stopwords.words('english')
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', 'couldn', 'didn', 'doesn', 'hadn', 'hasn', 'haven', 'isn', 'ma', 'mightn', 'mustn', 'needn', 'shan', 'shouldn', 'wasn', 'weren', 'won', 'wouldn']
>>>
```

```python
>>> def content_fraction(text):
...     # A set makes each membership test fast; the contents are the same as the list
...     stopwords = set(nltk.corpus.stopwords.words('english'))
...     content = [w for w in text if w.lower() not in stopwords]
...     print(content[:50])
...     return len(content) / len(text)
...
>>> content_fraction(nltk.corpus.reuters.words())
['ASIAN', 'EXPORTERS', 'FEAR', 'DAMAGE', 'U', '.', '.-', 'JAPAN', 'RIFT', 'Mounting', 'trade', 'friction', 'U', '.', '.', 'Japan', 'raised', 'fears', 'among', 'many', 'Asia', "'", 'exporting', 'nations', 'row', 'could', 'inflict', 'far', '-', 'reaching', 'economic', 'damage', ',', 'businessmen', 'officials', 'said', '.', 'told', 'Reuter', 'correspondents', 'Asian', 'capitals', 'U', '.', '.', 'Move', 'Japan', 'might', 'boost', 'protectionist']
0.735240435097661
>>>
```
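content_fraction checks every token of the Reuters corpus against the stopword collection, so the collection's type matters: membership in a Python list is a linear scan, while a set lookup takes constant time on average. The plain-Python sketch below uses a set together with a small hypothetical stopword subset and token list. As the printed output above also shows, punctuation tokens such as '.' and ',' are not stopwords and therefore count as content, which inflates the fraction; skipping non-alphabetic tokens is one option if that matters.

```python
def content_fraction(tokens, stopwords):
    """Fraction of tokens whose lowercased form is not in stopwords."""
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty text
    stop = {w.lower() for w in stopwords}  # set: O(1) average membership test
    content = [w for w in tokens if w.lower() not in stop]
    return len(content) / len(tokens)


# Hypothetical stopword subset and tokens, for illustration only
stopwords = ["the", "to", "and", "of", "a"]
tokens = ["The", "exporters", "fear", "damage", "to", "trade", "and", "the", "yen"]

print(content_fraction(tokens, stopwords))  # 5 of the 9 tokens are content words
```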
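3. Names Corpus

The Names Corpus stores first names in two files, female.txt and male.txt. The first example lists the names that occur in both files, that is, names listed under both genders: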

```python
>>> names = nltk.corpus.names
>>> names.fileids()
['female.txt', 'male.txt']
>>> male_name = names.words('male.txt')
>>> female_name = names.words('female.txt')
>>> [w for w in male_name if w in female_name]
['Abbey', 'Abbie', 'Abby', 'Addie', 'Adrian', 'Adrien', 'Ajay', 'Alex', 'Alexis', 'Alfie', 'Ali', 'Alix', 'Allie', 'Allyn', 'Andie', 'Andrea', 'Andy', 'Angel', 'Angie', 'Ariel', 'Ashley', 'Aubrey', 'Augustine', 'Austin', 'Averil', 'Barrie', 'Barry', 'Beau', 'Bennie', 'Benny', 'Bernie', 'Bert', 'Bertie', 'Bill', 'Billie', 'Billy', 'Blair', 'Blake', 'Bo', 'Bobbie', 'Bobby', 'Brandy', 'Brett', 'Britt', 'Brook', 'Brooke', 'Brooks', 'Bryn', 'Cal', 'Cam', 'Cammy', 'Carey', 'Carlie', 'Carlin', 'Carmine', 'Carroll', 'Cary', 'Caryl', 'Casey', 'Cass', 'Cat', 'Cecil', 'Chad', 'Chris', 'Chrissy', 'Christian', 'Christie', 'Christy', 'Clair', 'Claire', 'Clare', 'Claude', 'Clem', 'Clemmie', 'Cody', 'Connie', 'Constantine', 'Corey', 'Corrie', 'Cory', 'Courtney', 'Cris', 'Daffy', 'Dale', 'Dallas', 'Dana', 'Dani', 'Daniel', 'Dannie', 'Danny', 'Darby', 'Darcy', 'Darryl', 'Daryl', 'Deane', 'Del', 'Dell', 'Demetris', 'Dennie', 'Denny', 'Devin', 'Devon', 'Dion', 'Dionis', 'Dominique', 'Donnie', 'Donny', 'Dorian', 'Dory', 'Drew', 'Eddie', 'Eddy', 'Edie', 'Elisha', 'Emmy', 'Erin', 'Esme', 'Evelyn', 'Felice', 'Fran', 'Francis', 'Frank', 'Frankie', 'Franky', 'Fred', 'Freddie', 'Freddy', 'Gabriel', 'Gabriell', 'Gail', 'Gale', 'Gay', 'Gayle', 'Gene', 'George', 'Georgia', 'Georgie', 'Geri', 'Germaine', 'Gerri', 'Gerry', 'Gill', 'Ginger', 'Glen', 'Glenn', 'Grace', 'Gretchen', 'Gus', 'Haleigh', 'Haley', 'Hannibal', 'Harley', 'Hazel', 'Heath', 'Henrie', 'Hilary', 'Hillary', 'Holly', 'Ike', 'Ikey', 'Ira', 'Isa', 'Isador', 'Isadore', 'Jackie', 'Jaime', 'Jamie', 'Jan', 'Jean', 'Jere', 'Jermaine', 'Jerrie', 'Jerry', 'Jess', 'Jesse', 'Jessie', 'Jo', 'Jodi', 'Jodie', 'Jody', 'Joey', 'Jordan', 'Juanita', 'Jude', 'Judith', 'Judy', 'Julie', 'Justin', 'Karel', 'Kellen', 'Kelley', 'Kelly', 'Kelsey', 'Kerry', 'Kim', 'Kip', 'Kirby', 'Kit', 'Kris', 'Kyle', 'Lane', 'Lanny', 'Lauren', 'Laurie', 'Lee', 'Leigh', 'Leland', 'Lesley', 'Leslie', 'Lin', 'Lind', 'Lindsay', 'Lindsey', 'Lindy', 'Lonnie', 'Loren', 'Lorne', 'Lorrie', 'Lou', 'Luce', 'Lyn', 'Lynn', 'Maddie', 'Maddy', 'Marietta', 'Marion', 'Marlo', 'Martie', 'Marty', 'Mattie', 'Matty', 'Maurise', 'Max', 'Maxie', 'Mead', 'Meade', 'Mel', 'Meredith', 'Merle', 'Merrill', 'Merry', 'Meryl', 'Michal', 'Michel', 'Michele', 'Mickie', 'Micky', 'Millicent', 'Morgan', 'Morlee', 'Muffin', 'Nat', 'Nichole', 'Nickie', 'Nicky', 'Niki', 'Nikki', 'Noel', 'Ollie', 'Page', 'Paige', 'Pat', 'Patrice', 'Patsy', 'Pattie', 'Patty', 'Pen', 'Pennie', 'Penny', 'Perry', 'Phil', 'Pooh', 'Quentin', 'Quinn', 'Randi', 'Randie', 'Randy', 'Ray', 'Regan', 'Reggie', 'Rene', 'Rey', 'Ricki', 'Rickie', 'Ricky', 'Rikki', 'Robbie', 'Robin', 'Ronnie', 'Ronny', 'Rory', 'Ruby', 'Sal', 'Sam', 'Sammy', 'Sandy', 'Sascha', 'Sasha', 'Saundra', 'Sayre', 'Scotty', 'Sean', 'Shaine', 'Shane', 'Shannon', 'Shaun', 'Shawn', 'Shay', 'Shayne', 'Shea', 'Shelby', 'Shell', 'Shelley', 'Sibyl', 'Simone', 'Sonnie', 'Sonny', 'Stacy', 'Sunny', 'Sydney', 'Tabbie', 'Tabby', 'Tallie', 'Tally', 'Tammie', 'Tammy', 'Tate', 'Ted', 'Teddie', 'Teddy', 'Terri', 'Terry', 'Theo', 'Tim', 'Timmie', 'Timmy', 'Tobe', 'Tobie', 'Toby', 'Tommie', 'Tommy', 'Tony', 'Torey', 'Trace', 'Tracey', 'Tracie', 'Tracy', 'Val', 'Vale', 'Valentine', 'Van', 'Vin', 'Vinnie', 'Vinny', 'Virgie', 'Wallie', 'Wallis', 'Wally', 'Whitney', 'Willi', 'Willie', 'Willy', 'Winnie', 'Winny', 'Wynn']
>>>
```
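The list comprehension above tests each male name against female_name, which scans the female list once per male name. A set intersection finds the same names (without duplicates) with a single hashing pass over each list; because sets are unordered, the result has to be sorted to get an alphabetical list. The names below are a small hypothetical sample, not the corpus itself.

```python
# Hypothetical samples standing in for names.words('male.txt') and names.words('female.txt')
male_name = ["Aaron", "Alex", "Ashley", "Bill", "John"]
female_name = ["Abbey", "Alex", "Ashley", "Mary"]

# Original approach: linear scan of female_name for every male name
both_scan = [w for w in male_name if w in female_name]

# Set approach: hash both lists once, intersect, then sort for alphabetical order
both_set = sorted(set(male_name) & set(female_name))

print(both_scan)
print(both_set)
```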

```python
>>> cfd = nltk.ConditionalFreqDist(
...     (fileid, name[-1])
...     for fileid in names.fileids()
...     for name in names.words(fileid))
>>> cfd.tabulate()
                   a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    r    s    t    u    v    w    x    y    z
female.txt    1 1773    9    0   39 1432    2   10  105  317    1    3  179   13  386   33    2   47   93   68    6    2    5   10  461    4
  male.txt    0   29   21   25  228  468   25   32   93   50    3   69  187   70  478  165   18  190  230  164   12   16   17   10  332   11
>>> cfd.plot()
>>>
```
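Clearly, most names ending in a, e or i are female, while names ending in h or l are about equally common in both files; names ending in k, o, r, s or t are mostly male.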
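To put numbers on these observations, the sketch below copies a few columns from the cfd.tabulate() table above and computes, for each final letter, the share of names ending in that letter that come from female.txt. Like the observation, it uses raw counts. Because female.txt holds more names than male.txt (the table rows sum to 5001 and 2943), comparing each letter's frequency within its own file instead tilts h and l toward male names; the sketch prints both views.

```python
# Counts of final letters copied from the cfd.tabulate() output above (a subset of columns)
female = {"a": 1773, "e": 1432, "i": 317, "h": 105, "l": 179,
          "k": 3, "o": 33, "r": 47, "s": 93, "t": 68}
male = {"a": 29, "e": 468, "i": 50, "h": 93, "l": 187,
        "k": 69, "o": 165, "r": 190, "s": 230, "t": 164}

FEMALE_TOTAL, MALE_TOTAL = 5001, 2943  # row sums of the full table


def female_share(letter):
    """Fraction of names ending in `letter` that are in female.txt (raw counts)."""
    f, m = female[letter], male[letter]
    return f / (f + m)


def relative_freq(letter):
    """Frequency of `letter` as a final letter within each file, as (female, male)."""
    return female[letter] / FEMALE_TOTAL, male[letter] / MALE_TOTAL


for letter in sorted(female):
    f_rel, m_rel = relative_freq(letter)
    print(f"{letter}: female share {female_share(letter):.2f}, "
          f"per-file frequency female {f_rel:.3f} vs male {m_rel:.3f}")
```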
4. Pronouncing Dictionary

The CMU Pronouncing Dictionary for US English (nltk.corpus.cmudict) is an example of a tabular lexicon: each entry pairs a word with a list of phonetic codes, one code per sound.

```python
>>> entries = nltk.corpus.cmudict.entries()
>>> len(entries)
133737
>>> for entry in entries[39943:39951]:
...     print(entry)
...
('explorer', ['IH0', 'K', 'S', 'P', 'L', 'AO1', 'R', 'ER0'])
('explorers', ['IH0', 'K', 'S', 'P', 'L', 'AO1', 'R', 'ER0', 'Z'])
('explores', ['IH0', 'K', 'S', 'P', 'L', 'AO1', 'R', 'Z'])
('exploring', ['IH0', 'K', 'S', 'P', 'L', 'AO1', 'R', 'IH0', 'NG'])
('explosion', ['IH0', 'K', 'S', 'P', 'L', 'OW1', 'ZH', 'AH0', 'N'])
('explosions', ['IH0', 'K', 'S', 'P', 'L', 'OW1', 'ZH', 'AH0', 'N', 'Z'])
('explosive', ['IH0', 'K', 'S', 'P', 'L', 'OW1', 'S', 'IH0', 'V'])
('explosively', ['EH2', 'K', 'S', 'P', 'L', 'OW1', 'S', 'IH0', 'V', 'L', 'IY0'])
>>>
```
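In these entries the vowel codes carry a stress digit (1 for primary stress, 2 for secondary, 0 for unstressed) and consonants carry none. Keeping only the digits therefore gives a word's stress pattern, and its length is the number of syllables. The sketch below applies this to entries copied from the output above; stress_pattern is a hypothetical helper, not part of NLTK.

```python
# (word, phones) pairs copied from the cmudict output above
entries = [
    ("explorer", ["IH0", "K", "S", "P", "L", "AO1", "R", "ER0"]),
    ("explores", ["IH0", "K", "S", "P", "L", "AO1", "R", "Z"]),
    ("explosion", ["IH0", "K", "S", "P", "L", "OW1", "ZH", "AH0", "N"]),
    ("explosively", ["EH2", "K", "S", "P", "L", "OW1", "S", "IH0", "V", "L", "IY0"]),
]


def stress_pattern(phones):
    """Concatenate the stress digits of the vowel codes, e.g. 'AO1' -> '1'."""
    return "".join(ph[-1] for ph in phones if ph[-1].isdigit())


for word, phones in entries:
    pattern = stress_pattern(phones)
    print(word, pattern, len(pattern), "syllables")
```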
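Another example of a tabular lexicon is the comparative wordlist. NLTK includes the so-called Swadesh wordlists: lists of about 200 common words in several languages. The languages are identified by two-letter ISO 639 codes.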
```python
>>> from nltk.corpus import swadesh
>>> swadesh.fileids()
['be', 'bg', 'bs', 'ca', 'cs', 'cu', 'de', 'en', 'es', 'fr', 'hr', 'it', 'la', 'mk', 'nl', 'pl', 'pt', 'ro', 'ru', 'sk', 'sl', 'sr', 'sw', 'uk']
>>> swadesh.words('en')
['I', 'you (singular), thou', 'he', 'we', 'you (plural)', 'they', 'this', 'that', 'here', 'there', 'who', 'what', 'where', 'when', 'how', 'not', 'all', 'many', 'some', 'few', 'other', 'one', 'two', 'three', 'four', 'five', 'big', 'long', 'wide', 'thick', 'heavy', 'small', 'short', 'narrow', 'thin', 'woman', 'man (adult male)', 'man (human being)', 'child', 'wife', 'husband', 'mother', 'father', 'animal', 'fish', 'bird', 'dog', 'louse', 'snake', 'worm', 'tree', 'forest', 'stick', 'fruit', 'seed', 'leaf', 'root', 'bark (from tree)', 'flower', 'grass', 'rope', 'skin', 'meat', 'blood', 'bone', 'fat (noun)', 'egg', 'horn', 'tail', 'feather', 'hair', 'head', 'ear', 'eye', 'nose', 'mouth', 'tooth', 'tongue', 'fingernail', 'foot', 'leg', 'knee', 'hand', 'wing', 'belly', 'guts', 'neck', 'back', 'breast', 'heart', 'liver', 'drink', 'eat', 'bite', 'suck', 'spit', 'vomit', 'blow', 'breathe', 'laugh', 'see', 'hear', 'know (a fact)', 'think', 'smell', 'fear', 'sleep', 'live', 'die', 'kill', 'fight', 'hunt', 'hit', 'cut', 'split', 'stab', 'scratch', 'dig', 'swim', 'fly (verb)', 'walk', 'come', 'lie', 'sit', 'stand', 'turn', 'fall', 'give', 'hold', 'squeeze', 'rub', 'wash', 'wipe', 'pull', 'push', 'throw', 'tie', 'sew', 'count', 'say', 'sing', 'play', 'float', 'flow', 'freeze', 'swell', 'sun', 'moon', 'star', 'water', 'rain', 'river', 'lake', 'sea', 'salt', 'stone', 'sand', 'dust', 'earth', 'cloud', 'fog', 'sky', 'wind', 'snow', 'ice', 'smoke', 'fire', 'ashes', 'burn', 'road', 'mountain', 'red', 'green', 'yellow', 'white', 'black', 'night', 'day', 'year', 'warm', 'cold', 'full', 'new', 'old', 'good', 'bad', 'rotten', 'dirty', 'straight', 'round', 'sharp', 'dull', 'smooth', 'wet', 'dry', 'correct', 'near', 'far', 'right', 'left', 'at', 'in', 'with', 'and', 'if', 'because', 'name']
>>>
```
```python
>>> fr2en = swadesh.entries(['fr', 'en'])
>>> fr2en
[('je', 'I'), ('tu, vous', 'you (singular), thou'), ('il', 'he'), ('nous', 'we'), ('vous', 'you (plural)'), ('ils, elles', 'they'), ('ceci', 'this'), ('cela', 'that'), ('ici', 'here'), ('là', 'there'), ('qui', 'who'), ('quoi', 'what'), ('où', 'where'), ('quand', 'when'), ('comment', 'how'), ('ne...pas', 'not'), ('tout', 'all'), ('plusieurs', 'many'), ('quelques', 'some'), ('peu', 'few'), ('autre', 'other'), ('un', 'one'), ('deux', 'two'), ('trois', 'three'), ('quatre', 'four'), ('cinq', 'five'), ('grand', 'big'), ('long', 'long'), ('large', 'wide'), ('épais', 'thick'), ('lourd', 'heavy'), ('petit', 'small'), ('court', 'short'), ('étroit', 'narrow'), ('mince', 'thin'), ('femme', 'woman'), ('homme', 'man (adult male)'), ('homme', 'man (human being)'), ('enfant', 'child'), ('femme, épouse', 'wife'), ('mari, époux', 'husband'), ('mère', 'mother'), ('père', 'father'), ('animal', 'animal'), ('poisson', 'fish'), ('oiseau', 'bird'), ('chien', 'dog'), ('pou', 'louse'), ('serpent', 'snake'), ('ver', 'worm'), ('arbre', 'tree'), ('forêt', 'forest'), ('b\\xe2ton', 'stick'), ('fruit', 'fruit'), ('graine', 'seed'), ('feuille', 'leaf'), ('racine', 'root'), ('écorce', 'bark (from tree)'), ('fleur', 'flower'), ('herbe', 'grass'), ('corde', 'rope'), ('peau', 'skin'), ('viande', 'meat'), ('sang', 'blood'), ('os', 'bone'), ('graisse', 'fat (noun)'), ('\\u0153uf', 'egg'), ('corne', 'horn'), ('queue', 'tail'), ('plume', 'feather'), ('cheveu', 'hair'), ('tête', 'head'), ('oreille', 'ear'), ('\\u0153il', 'eye'), ('nez', 'nose'), ('bouche', 'mouth'), ('dent', 'tooth'), ('langue', 'tongue'), ('ongle', 'fingernail'), ('pied', 'foot'), ('jambe', 'leg'), ('genou', 'knee'), ('main', 'hand'), ('aile', 'wing'), ('ventre', 'belly'), ('entrailles', 'guts'), ('cou', 'neck'), ('dos', 'back'), ('sein, poitrine', 'breast'), ('c\\u0153ur', 'heart'), ('foie', 'liver'), ('boire', 'drink'), ('manger', 'eat'), ('mordre', 'bite'), ('sucer', 'suck'), ('cracher', 'spit'), ('vomir', 'vomit'), ('souffler', 'blow'), ('respirer', 'breathe'), ('rire', 'laugh'), ('voir', 'see'), ('entendre', 'hear'), ('savoir', 'know (a fact)'), ('penser', 'think'), ('sentir', 'smell'), ('craindre, avoir peur', 'fear'), ('dormir', 'sleep'), ('vivre', 'live'), ('mourir', 'die'), ('tuer', 'kill'), ('se battre', 'fight'), ('chasser', 'hunt'), ('frapper', 'hit'), ('couper', 'cut'), ('fendre', 'split'), ('poignarder', 'stab'), ('gratter', 'scratch'), ('creuser', 'dig'), ('nager', 'swim'), ('voler', 'fly (verb)'), ('marcher', 'walk'), ('venir', 'come'), ("s'étendre", 'lie'), ("s'asseoir", 'sit'), ('se lever', 'stand'), ('tourner', 'turn'), ('tomber', 'fall'), ('donner', 'give'), ('tenir', 'hold'), ('serrer', 'squeeze'), ('frotter', 'rub'), ('laver', 'wash'), ('essuyer', 'wipe'), ('tirer', 'pull'), ('pousser', 'push'), ('jeter', 'throw'), ('lier', 'tie'), ('coudre', 'sew'), ('compter', 'count'), ('dire', 'say'), ('chanter', 'sing'), ('jouer', 'play'), ('flotter', 'float'), ('couler', 'flow'), ('geler', 'freeze'), ('gonfler', 'swell'), ('soleil', 'sun'), ('lune', 'moon'), ('étoile', 'star'), ('eau', 'water'), ('pluie', 'rain'), ('rivière', 'river'), ('lac', 'lake'), ('mer', 'sea'), ('sel', 'salt'), ('pierre', 'stone'), ('sable', 'sand'), ('poussière', 'dust'), ('terre', 'earth'), ('nuage', 'cloud'), ('brouillard', 'fog'), ('ciel', 'sky'), ('vent', 'wind'), ('neige', 'snow'), ('glace', 'ice'), ('fumée', 'smoke'), ('feu', 'fire'), ('cendres', 'ashes'), ('br\\xfbler', 'burn'), ('route', 'road'), ('montagne', 'mountain'), ('rouge', 'red'), ('vert', 'green'), ('jaune', 'yellow'), ('blanc', 'white'), ('noir', 'black'), ('nuit', 'night'), ('jour', 'day'), ('an, année', 'year'), ('chaud', 'warm'), ('froid', 'cold'), ('plein', 'full'), ('nouveau', 'new'), ('vieux', 'old'), ('bon', 'good'), ('mauvais', 'bad'), ('pourri', 'rotten'), ('sale', 'dirty'), ('droit', 'straight'), ('rond', 'round'), ('tranchant, pointu, aigu', 'sharp'), ('émoussé', 'dull'), ('lisse', 'smooth'), ('mouillé', 'wet'), ('sec', 'dry'), ('juste, correct', 'correct'), ('proche', 'near'), ('loin', 'far'), ('à droite', 'right'), ('à gauche', 'left'), ('à', 'at'), ('dans', 'in'), ('avec', 'with'), ('et', 'and'), ('si', 'if'), ('parce que', 'because'), ('nom', 'name')]
>>> translate = dict(fr2en)
>>> translate['chien']
'dog'
>>> translate['jeter']
'throw'
>>> de2en = swadesh.entries(['de', 'en'])
>>> es2en = swadesh.entries(['es', 'en'])
>>> translate.update(dict(de2en))
>>> translate.update(dict(es2en))
>>> translate['Hund']
'dog'
>>> translate['perro']
'dog'
>>> translate['jeter']
'throw'
>>>
```
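Two properties of the Swadesh entries make the plain dict(fr2en) lookup lossy, and both are visible in the output above. Some cells list several variants separated by commas ('tu, vous', 'femme, épouse'), so a key such as 'épouse' does not exist on its own. Some source words also repeat ('homme' is paired with both "man" glosses), and dict() then keeps only the last pair. The sketch below uses a hypothetical helper, build_translator (not an NLTK function), that splits the variants and keeps every gloss in a list.

```python
from collections import defaultdict


def build_translator(pairs):
    """Map every comma-separated source variant to the list of its English glosses."""
    table = defaultdict(list)
    for source, gloss in pairs:
        for variant in source.split(","):
            variant = variant.strip()
            if gloss not in table[variant]:  # keep each gloss once, in first-seen order
                table[variant].append(gloss)
    return dict(table)  # plain dict, so a missing key raises KeyError instead of being created


# A few pairs copied from the fr2en output above
fr2en = [("femme", "woman"), ("homme", "man (adult male)"),
         ("homme", "man (human being)"), ("femme, épouse", "wife"),
         ("chien", "dog")]

translate = build_translator(fr2en)
print(translate["épouse"])   # found only after splitting 'femme, épouse'
print(translate["homme"])    # both glosses are kept
print(dict(fr2en)["homme"])  # plain dict() keeps only the last pair
```

The same overwriting applies to translate.update() in the session above: a spelling shared by two languages keeps only the gloss from the last update.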
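5. Lexical Tools: Toolbox and Shoebox

Toolbox, previously known as Shoebox, is a popular tool among field linguists for managing lexical data. A Toolbox file consists of a collection of entries, each made up of one or more fields; NLTK can read such files, including a dictionary of the Rotokas language: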

```python
>>> from nltk.corpus import toolbox
>>> dic1 = toolbox.entries('rotokas.dic')
>>> dic1[:20]
[('kaa', [('ps', 'V'), ('pt', 'A'), ('ge', 'gag'), ('tkp', 'nek i pas'), ('dcsv', 'true'), ('vx', '1'), ('sc', '???'), ('dt', '29/Oct/2005'), ('ex', 'Apoka ira kaaroi aioa-ia reoreopaoro.'), ('xp', 'Kaikai i pas long nek bilong Apoka bikos em i kaikai na toktok.'), ('xe', 'Apoka is gagging from food while talking.')]),
 ('kaa', [('ps', 'V'), ('pt', 'B'), ('ge', 'strangle'), ('tkp', 'pasim nek'), ('arg', 'O'), ('vx', '2'), ('dt', '07/Oct/2006'), ('ex', 'Rera rauroro rera kaarevoi.'), ('xp', 'Em i holim pas em na nekim em.'), ('xe', 'He is holding him and strangling him.'), ('ex', 'Iroiro-ia oirato okoearo kaaivoi uvare rirovira kaureoparoveira.'), ('xp', 'Ol i pasim nek bilong man long rop bikos em i save bikhet tumas.'), ('xe', "They strangled the man's neck with rope because he was very stubborn and arrogant."), ('ex', 'Oirato okoearo kaaivoi iroiro-ia. Uva viapau uvuiparoi ra vovouparo uva kopiiroi.'), ('xp', 'Ol i pasim nek bilong man long rop. Olsem na em i no pulim win olsem na em i dai.'), ('xe', "They strangled the man's neck with a rope. And he couldn't breathe and he died.")]),
 ('kaa', [('ps', 'N'), ('pt', 'MASC'), ('cl', 'isi'), ('ge', 'cooking banana'), ('tkp', 'banana bilong kukim'), ('pt', 'itoo'), ('sf', 'FLORA'), ('dt', '12/Aug/2005'), ('ex', 'Taeavi iria kaa isi kovopaueva kaparapasia.'), ('xp', 'Taeavi i bin planim gaden banana bilong kukim tasol long paia.'), ('xe', 'Taeavi planted banana in order to cook it.')]),
 ('kaakaaro', [('ps', 'N'), ('pt', 'NT'), ('ge', 'mixture'), ('tkp', '???'), ('eng', 'mixtures'), ('eng', 'charm used to keep married men and women youthful and attractive'), ('cmt', 'Check vowel length. Is it kaakaaro or kaakaro? Does lexeme have suffix, -aro or -ro?'), ('dt', '20/Nov/2006'), ('ex', 'Kaakaroto ira purapaiveira aue iava opita, voeao-pa airepa oraouirara, ra va aiopaive.'), ('xp', 'Kokonas ol i save wokim long ol kain samting bilong ol nupela marit, bai ol i ken kaikai.'), ('xe', 'Mixtures are made from coconut for newlyweds, who eat them.')]),
 ('kaakaaviko', [('ps', 'N'), ('pt', 'FEM'), ('ge', 'type of beetle'), ('tkp', '???'), ('nt', 'round beetle like Mexican bean beetle'), ('dt', '10/Feb/2005'), ('sf', 'FAUNA.INSECT'), ('ex', 'Kaakaaviko kare oea binara touaveira vara tapo piupaiveira.'), ('xp', 'Kaakaaviko em i wanpela kain insect em i save istap long ol bin or na long kain lip.'), ('xe', '???'), ('ex', 'Kaakaaviko kare oea raviriro kouro piupaiveira.'), ('xp', 'Em i wanpela kain weevil i save bagarapim ol bin.'), ('xe', '??? damages up beans.')]),
 ('kaakaavo', [('rt', 'kaavo'), ('ps', '???'), ('rdp', 'partial'), ('ge', 'white'), ('tkp', 'wait'), ('sc', '???'), ('cmt', "What's the part of speech?"), ('dt', '29/Oct/2005'), ('ex', 'Kaakaaro oa purapaiveira varauraro tokipasia aue iava opita ora vegoara iava oirara iava ora riakova kaakaaro.'), ('xp', 'Ol i save wokim out long kokonas coconut na ol lip na skin blong ol diwai.'), ('xe', '???'), ('ex', 'Varoa kaakaavopa popotepa ragai varo.'), ('xp', 'Em white lap lap blong mi.'), ('xe', "That's my white laplap."), ('ex', 'Vaoia evaova kaakaavopaova.'), ('xp', 'Dispela diwai em i waitpela.'), ('xe', 'This tree is white.'), ('ex', 'Rarasoria kaakaavoto ira Amerika iava urioroera vo kovosia rupairara voaro.'), ('xp', 'Rarason em i wait man em i bin kam long Amerika na kam wok long hap bilong ol bilak man.'), ('xe', 'Rarason is a white man who came from America ???.')]),
 ('kaakaoko', [('ps', 'N'), ('pt', '???'), ('ge', 'type of beetle'), ('tkp', 'binatang'), ('sf', 'FAUNA.INSECT'), ('cmt', 'Is it kaakaoko or kaakauko?'), ('dt', '08/Feb/2005'), ('ex', 'Kaakaoko vuri gesito./Kaakauko vurisi gesiva.'), ('xp', '???'), ('xe', 'Kaakauko em i wanpela binatang.')]),
 ('kaakasi', [('rt', '???'), ('ps', 'V'), ('pt', 'A'), ('ge', 'hot'), ('tkp', 'hot'), ('vx', '1'), ('sc', '???'), ('cmt', "Vowel length can't possibly be right. Or is the vowel of kaasi long?"), ('dt', '29/Oct/2005'), ('ex', 'Upiriko pitoka kaakasipai.'), ('xp', 'Sospen kaukau em i hot tru.'), ('xe', 'The saucepan of sweet potatos is really hot.'), ('ex', 'Kaukau pitoka rirovira rutu kaakasipai uvare riro kasia tuitui kasi oripiro.'), ('xp', 'Sospen kaukau em i hot tru bikos em i tan long bikpela paia.'), ('xe', '???')]),
 ('kaakau', [('ps', 'N'), ('pt', 'FEM'), ('ge', 'dog'), ('tkp', 'dok'), ('dt', '17/Jul/2005'), ('ex', 'Kaakau voresiurava toupa aue kokoto ora kokopi.'), ('xp', 'Dog i gat fopela lek bilong em na em i teleblonge.'), ('xe', 'Dogs are four-footed ???.'), ('ex', 'Revisa riro kaakau raguito.'), ('xp', 'Revisa em i man bilong lukautim dok.'), ('xe', 'Revisa is a big dog lover.'), ('ex', 'Rake ora Jon kaakau kare ousia avasie.'), ('xp', 'Rake wantaim Jon ol i go kisim ol wail dok.'), ('xe', 'Rake and John went to get wild dogs.')]),
 ('kaakauko', [('ps', 'N'), ('pt', 'MASC'), ('ge', 'gray weevil'), ('tkp', 'wanpela kain binatang'), ('sf', 'FAUNA.INSECT'), ('nt', 'pictured on PNG postage stamp'), ('dt', '29/Oct/2005'), ('ex', 'Kaakauko ira toupareveira aue-ia niugini stemp.'), ('xp', 'Kaakauko em insect em i istap long niugini.'), ('xe', 'The gray weevil is found on the PNG stamp.'), ('ex', 'Kaakauko iria toupaeveira niugini stamia.'), ('xp', 'Weevil i stap long niguini stamp.'), ('xe', 'The gray weevil is on the New Guinea stamp.'), ('ex', 'Kaakauko korekare iava oira iria iava varaua vurivurivira ora kaapovira toupaiveira.'), ('xp', 'Kaakavuko em i wanpela kain binatang skin bilong em i braun na wait.'), ('xe', 'Kaakavuko is an insect whose body is brown and white.')]),
 ('kaakito', [('rt', 'kaaki'), ('ps', 'N'), ('pt', 'HUM'), ('ge', 'person blind with cataracts'), ('tkp', 'man i gat wanpela ei'), ('nt', 'nickname when used to describe one-eyed person'), ('dt', '11/Feb/2005'), ('ex', 'Rarasirea kakito eisiva rera Tavusiva uruiia.'), ('xp', 'Rarasirea em i wan ai man bilong ples Tavusiova.'), ('xe', 'Rarasirea is a one-eyed man from Tavusiova village.'), ('ex', 'Kaakito kataitoa iava osireito vurapare.'), ('xp', 'Man i gat wanpela ei na i lukluk.'), ('xe', 'A one-eyed man looks out of one eye.')]),
 ('kaakuupato', [('ps', 'N'), ('pt', 'PN'), ('ge', 'spring of hot mineral water near Togarao.'), ('tkp', '???'), ('nt', 'It is located in gulley above the shorter waterfall and is most likely ???.'), ('dt', '08/Feb/2005'), ('ex', 'Kasiraopato kaakuupato uicoto ira vusivusipareveira vova rasito vo toupare togarao-ia sisiupaveira vosa upiapave ora ruvapasa.'), ('xp', 'Kaakuupato em i spirins hot water em i stap long Togavao taim husat i sik bai ol waswas long bai sick br pinis .'), ('xe', '???'), ('ex', 'Kaakuupato kasiraopato ukoto ira toupare eisi Rureva Togaraoia ruvaraia.'), ('xp', 'Hot wara kaakuupato i stap long Rureva klostu long Togarao.'), ('xe', 'The hot spring Kaakuupato is in Rureva near Togarao.')]),
 ('kaaova', [('ps', 'N'), ('pt', 'FEM'), ('ge', 'aunt'), ('tkp', '???'), ('nt', 'FaSi'), ('sf', 'KIN'), ('dt', '19/Jul/2004')]),
 ('kaapa', [('ps', 'N'), ('pt', '???'), ('ge', 'copper metal'), ('tkp', 'retpela ain'), ('dt', '12/Feb/2005'), ('cmt', 'What is paupara doing in the second example?'), ('ex', 'Kaapa vao oa-ia kepa paupaviei.'), ('xp', 'Kaapa em i roof yumi save wokim haus long em.'), ('xe', 'Copper we make houses from.'), ('ex', 'Kaapara kepa paupara oara purapaiveira eisi Astararia.'), ('xp', 'Kapa bilong wokim haus ol i save wokim long Australia.'), ('xe', 'Copper rooves, they make them in Australia.')]),
 ('kaapea', [('ps', '???'), ('ge', 'weak'), ('ge', 'loose'), ('ge', 'easy'), ('tkp', '???'), ('cmt', 'Check spelling. Is it kaapea or kapea?'), ('dt', '03/Jun/2005'), ('ex', 'Kaapeta virago vao paupa.'), ('xp', 'Dispela chair i no strong bilong sindaun.'), ('xe', '???')]),
 ('kaapie', [('rt', 'kaa'), ('ps', 'N'), ('pt', 'MASC'), ('ge', 'hook'), ('ge', 'fishhook'), ('tkp', 'huk'), ('dt', '15/Feb/2004')]),
 ('kaapie', [('rt', 'kaa'), ('ps', 'V'), ('pt', 'B'), ('ge', 'hook'), ('eng', 'choke'), ('eng', 'snag'), ('eng', 'hook'), ('ge', 'capture'), ('tkp', 'hukim'), ('tkp', 'pasim long huk'), ('arg', 'O'), ('vx', '2'), ('dt', '15/Nov/2005'), ('cmt', "Double-check vowel length of kaa. First example doesn't make sense. Is it two sentences?"), ('ex', 'Aiopaoro karoi kakaeto kaapierivoi aioa-ia.'), ('xp', 'Kaikai pas long nek bilong mi kaikai i pas long pikinini.'), ('xe', '???'), ('ex', 'Koie kaapierevo Ririre ovare oira gisipoaro iare karuveraisi vikirevo.'), ('xp', 'Ririre i tromoem singapo insait long maus bilong pik na i pas.'), ('xe', 'Ririre ???.'), ('ex', 'Aakova kakaeto kapieevoi aioa-ia.'), ('xp', 'Mama i givim kaikai long pikinini na hap kaikai i pas long nek bilong em.'), ('xe', 'Mother made the boy choke with some food.'), ('ex', 'Aakova kakaeto kaapievoi aioa-ia uvare viapau vearovira va orievo.'), ('xp', 'Mama em i mekim pas kaikai long pikinini bikos em i no kukim gut.'), ('xe', "Mother made the boy choke from the food because she didn't cook it well."), ('ex', 'Avuka kakaeto aiopiepaoro rera kaapieevo.'), ('xp', 'Lapun wok long givim kaikai long bebe na kaikai i pas long nek.'), ('xe', 'The old person fed the boy and made it choke.')]),
 ('kaapiepato', [('rt', 'kaapie'), ('ps', 'N'), ('pt', 'HUM'), ('ge', 'fisher'), ('tkp', 'man bilong hukim pis'), ('dt', '12/Feb/2005'), ('ex', 'Aveatoa atari kapiepato vokiara rutu.'), ('xp', 'Aveato em i man bilong hukim pis olgeta de.'), ('xe', 'Aveato works as a fisherman every day.')]),
 ('kaapisi', [('ps', 'V'), ('pt', 'B'), ('ge', 'pinch together'), ('ge', 'grip with pincers'), ('tkp', 'holim'), ('arg', 'O'), ('vx', '2'), ('dt', '08/Jun/2005'), ('ex', 'Kaapisi ava eva ra avekeara kasiraopa ra kaekaepiea. Ra varao vera oara kasiraopai.'), ('xp', 'Yu mas kam wantam sisis pin vh bar mi ya rausim ol dispela pela stow ol bai mi rausim ol dispela i hot.'), ('xe', '???'), ('ex', 'Avekeara kaapisi evara kasiraopara.'), ('xp', 'Yu rausim ol ton i hot long pansa.'), ('xe', '???')]),
 ('kaapisivira', [('rt', 'kaapisi'), ('ps', 'ADV'), ('pt', 'MANNER'), ('ge', 'linked'), ('ge', 'pinched'), ('tkp', '???'), ('dt', '29/Oct/2005'), ('ex', 'Auea eva oa kaapisivira toupaivoi.'), ('xp', 'Samting i stap olsem pansa.'), ('xe', '???'), ('ex', 'Pariearei tapokovira toupai uva kaapisivira kekepapiroi.'), ('xp', 'Hap mambu i pas wantaim na i luk olsem sises.'), ('xe', '???')])]
>>>
```
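Consider just the first entry, the word kaa, meaning "to gag". An entry consists of a series of attribute-value pairs: for example, ('ps', 'V') indicates that the part of speech is 'V' (verb), and ('ge', 'gag') indicates that the English gloss is 'gag'. The last three pairs contain an example sentence in Rotokas together with its translations into Tok Pisin and English.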
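Because an entry is just a lexeme plus a list of (marker, value) pairs, and markers can repeat (the second kaa entry above has three ex/xp/xe triples, and the third has two pt fields), a dictionary of lists is a convenient way to query it. The sketch below groups the first entry, copied from the output above. The helper fields is hypothetical, and pairing ex, xp and xe by position assumes every example sentence has both translations, which holds for this entry but is not guaranteed for the whole lexicon.

```python
from collections import defaultdict

# The first entry of toolbox.entries('rotokas.dic'), copied from the output above
entry = ("kaa", [("ps", "V"), ("pt", "A"), ("ge", "gag"), ("tkp", "nek i pas"),
                 ("dcsv", "true"), ("vx", "1"), ("sc", "???"), ("dt", "29/Oct/2005"),
                 ("ex", "Apoka ira kaaroi aioa-ia reoreopaoro."),
                 ("xp", "Kaikai i pas long nek bilong Apoka bikos em i kaikai na toktok."),
                 ("xe", "Apoka is gagging from food while talking.")])


def fields(entry):
    """Split an entry into its lexeme and a dict mapping each marker to all its values."""
    lexeme, pairs = entry
    grouped = defaultdict(list)
    for marker, value in pairs:
        grouped[marker].append(value)  # repeated markers accumulate in order
    return lexeme, dict(grouped)


lexeme, f = fields(entry)
print(lexeme, f["ps"], f["ge"])  # part of speech and English gloss
# Assumes the i-th ex, xp and xe belong together (true for this entry)
for rotokas, tok_pisin, english in zip(f["ex"], f["xp"], f["xe"]):
    print(rotokas, "|", tok_pisin, "|", english)
```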
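Rotokas is a language spoken on the island of Bougainville in Papua New Guinea; this lexicon was contributed to NLTK by Stuart Robinson. Rotokas is notable for having an inventory of just 12 phonemes (contrastive sounds).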
