Python: extract tags that keep only one type of word

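Intro: this article was compiled by the editors of 小常识网 (cha138.com). It presents a Python function that takes the most frequent words in a text as tags, keeping only one type of word (nouns by default).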

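import nltk
import pandas as pd

# Requires NLTK data: nltk.download('wordnet') plus the perceptron tagger
# ('averaged_perceptron_tagger', or 'averaged_perceptron_tagger_eng' in newer NLTK releases)


def get_tags(text, limit_common, stype='NN'):
    # input: lowercase, newlines -> spaces (str is immutable, so no copy is needed)
    sinput = text.lower().replace('\n', ' ')

    ## TOKENIZE, merge SINGULAR/PLURAL via lemmatization and COUNT word frequencies
    tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+')
    lemmatizer = nltk.stem.WordNetLemmatizer()
    tokens = tokenizer.tokenize(sinput)
    lemmas = [lemmatizer.lemmatize(t) for t in tokens]
    fdist = nltk.FreqDist(lemmas)
    common = fdist.most_common()  # every (word, count) pair, most frequent first

    ## FILTER BY TYPE OF WORD, only for words longer than one character
    # note: the unique words are tagged as one sequence, without sentence context
    tagged = nltk.pos_tag([word for word, _ in common if len(word) > 1])
    # keep words whose tag contains stype ('NN' -> NN, NNS, NNP, NNPS, i.e. nouns)
    keep = {word for word, tag in tagged if stype in tag}
    # update common (set lookup instead of scanning a list for every word)
    common = [(word, count) for word, count in common if word in keep]

    ## store into a DataFrame ('rank' holds the raw frequency) and return the top rows
    final = pd.DataFrame(common, columns=['tag', 'rank'])
    # DataFrame.sort() was removed from pandas; sort_values() replaces it
    final = final.sort_values('rank', ascending=False).reset_index(drop=True)
    return final.head(limit_common)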
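
To see the filter-and-rank logic without downloading NLTK data, here is a minimal pure-Python sketch. It is an illustration, not the function above: the helper name `top_words_by_tag` and the hand-written `TOY_TAGS` dictionary are hypothetical (the dictionary stands in for `nltk.pos_tag`), `re.findall(r'\w+', ...)` plays the role of `RegexpTokenizer(r'\w+')`, and lemmatization is skipped, so 'cat' and 'cats' would be counted separately.

```python
import re
from collections import Counter


def top_words_by_tag(text, limit, stype, tag_of):
    """Hypothetical helper: most frequent words whose POS tag contains `stype`.

    `tag_of` is a plain dict word -> tag standing in for nltk.pos_tag;
    no lemmatization is done here.
    """
    tokens = re.findall(r'\w+', text.lower())  # same pattern as RegexpTokenizer(r'\w+')
    ranked = Counter(tokens).most_common()     # (word, count), most frequent first
    kept = [(w, c) for w, c in ranked
            if len(w) > 1 and stype in tag_of.get(w, '')]  # untagged words are dropped
    return kept[:limit]


# Hypothetical toy tag table (Penn Treebank tags), not produced by a real tagger
TOY_TAGS = {'the': 'DT', 'cat': 'NN', 'sat': 'VBD', 'on': 'IN',
            'mat': 'NN', 'ran': 'VBD'}

text = "The cat sat on the mat.\nThe cat ran."
print(top_words_by_tag(text, 5, 'NN', TOY_TAGS))  # [('cat', 2), ('mat', 1)]
print(top_words_by_tag(text, 5, 'VB', TOY_TAGS))  # [('sat', 1), ('ran', 1)]
```

Because `most_common()` already returns pairs in descending count order and the filter preserves that order, the result needs no extra sort. The same holds for `FreqDist.most_common()` in `get_tags`, since `FreqDist` subclasses `Counter`, so the final sort there is only a safeguard.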

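One detail of the filter is worth checking: `get_tags` keeps a word when `stype` occurs anywhere in its Penn Treebank tag (a substring test). For the default 'NN' this conveniently matches NN, NNS, NNP and NNPS, but other values over-match: 'DT' also hits PDT and WDT, and 'RP' (particle) also hits PRP and PRP$ (personal pronouns). The sketch below compares it with a stricter prefix test using `str.startswith`; the function names and the short tag list are illustrative only.

```python
# A hand-picked subset of Penn Treebank tags (illustrative, not the full tag set)
TAGS = ['DT', 'PDT', 'WDT', 'PRP', 'PRP$', 'RP', 'NN', 'NNS', 'NNP', 'NNPS']


def substring_match(stype, tags):
    """Tags selected by the substring test used in get_tags."""
    return [t for t in tags if stype in t]


def prefix_match(stype, tags):
    """Hypothetical stricter alternative: the tag must start with stype."""
    return [t for t in tags if t.startswith(stype)]


for stype in ('NN', 'DT', 'RP'):
    print(stype, substring_match(stype, TAGS), prefix_match(stype, TAGS))
# NN ['NN', 'NNS', 'NNP', 'NNPS'] ['NN', 'NNS', 'NNP', 'NNPS']
# DT ['DT', 'PDT', 'WDT'] ['DT']
# RP ['PRP', 'PRP$', 'RP'] ['RP']
```

A prefix test leaves the 'NN' and 'VB' cases unchanged and only drops matches where `stype` appears in the middle of a longer tag.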