Python NLTK: Word Segmentation with Simulated Annealing

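Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers word segmentation by simulated annealing in Python (NLTK); we hope you find it a useful reference. The program represents a segmentation as a string of "0"/"1" flags, one for each gap between adjacent characters ("1" means a word ends there), and searches for a flag string that lowers a score equal to the number of words in the text plus the total size of the lexicon.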
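To make the representation and the score concrete, here is a minimal, self-contained sketch that mirrors the program's `segment` and `evaluate` functions and applies them to a hypothetical toy input, `thedogthedog`, under three different flag strings.

```python
def segment(text, segs):
    # segs has len(text) - 1 flags; "1" at index i means a word ends after text[i]
    words, last = [], 0
    for i, flag in enumerate(segs):
        if flag == "1":
            words.append(text[last:i + 1])
            last = i + 1
    words.append(text[last:])
    return words


def evaluate(text, segs):
    # Score = number of words in the text + lexicon size (each distinct word's length + 1)
    words = segment(text, segs)
    return len(words) + sum(len(w) + 1 for w in set(words))


text = "thedogthedog"  # hypothetical toy input
for segs in ["00000000000", "00000100000", "00100100100"]:
    print(segs, segment(text, segs), evaluate(text, segs))
# 00000000000 ['thedogthedog'] 14
# 00000100000 ['thedog', 'thedog'] 9
# 00100100100 ['the', 'dog', 'the', 'dog'] 12
```

The two-word split wins (9) because storing one lexicon entry and reusing it is cheaper than storing two shorter entries. The score rewards repeated chunks, not dictionary words.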

#!/usr/bin/env python3
import nltk
from random import randint


def segment(text, segs):
    # Split text into words: segs[i] == "1" means a word boundary after text[i]
    words = []
    last = 0
    for i in range(len(segs)):
        if segs[i] == "1":
            words.append(text[last:i+1])
            last = i+1
    words.append(text[last:])
    return words

def evaluate(text, segs):
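    # Score (lower is better): number of words in the text plus lexicon size (each distinct word's length + 1)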
    words = segment(text, segs)
    text_size = len(words)
    lexicon_size = sum(len(word) + 1 for word in set(words))
    return text_size + lexicon_size

def flip(segs, pos):
    # Toggle the boundary flag at pos ("0" <-> "1")
    return segs[:pos] + str(1-int(segs[pos])) + segs[pos+1:]

def flip_n(segs, n):
    # Random perturbation: flip n randomly chosen boundary flags
    for i in range(n):
        segs = flip(segs, randint(0, len(segs)-1))
    return segs

def anneal(text, segs, iterations, cooling_rate):
    temperature = float(len(segs))
    while temperature > 0.5:
        # Annealing step: look for a lower score, i.e. a better segmentation
        best_segs, best = segs, evaluate(text, segs)
        for i in range(iterations):
            guess = flip_n(segs, int(round(temperature)))
            score = evaluate(text, guess)
            if score < best:
                best, best_segs = score, guess
        score, segs = best, best_segs
        temperature = temperature / cooling_rate
        print(evaluate(text, segs), segment(text, segs))
    print()
    return segs

if __name__ == "__main__":
    text = "doyouseethekittyseethedoggydoyoulikethekittylikethedoggy"
    seg1 = "0000000000000001000000000010000000000000000100000000000"
    anneal(text, seg1, 500, 1.2)
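
With the arguments passed in the `__main__` block (`iterations=500`, `cooling_rate=1.2`) and a 55-character `seg1`, the cooling schedule of `anneal` can be worked out by hand. The temperature starts at `len(segs)` and is divided by the cooling rate once per round, and the loop continues while it is above 0.5:

$$T_k = \frac{T_0}{r^k}, \qquad T_0 = 55, \quad r = 1.2$$

$$\frac{55}{1.2^{k}} \le 0.5 \iff 1.2^{k} \ge 110 \iff k \ge \frac{\ln 110}{\ln 1.2} \approx 25.78$$

So the loop runs 26 rounds (26 score lines, then a blank line), tries 26 × 500 = 13,000 random guesses in total, and the number of flags flipped per guess, `round(T_k)`, shrinks from 55 in the first round to 1 in the last ($T_{25} \approx 0.58$).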

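The `anneal` function above only keeps a guess when it lowers the score; the temperature controls only how many flags each guess flips. Classic simulated annealing also accepts a worse guess with probability exp(-Δ/T) (the Metropolis criterion), which lets the search escape local minima. The following is a minimal sketch of that swapped-in acceptance rule, not the original program's method. The function name `anneal_metropolis`, its `seed` parameter, and the choice to reuse one temperature for both step size and acceptance are assumptions made for illustration.

```python
import math
import random


def segment(text, segs):
    # Split text after every index whose boundary flag is "1"
    words, last = [], 0
    for i, flag in enumerate(segs):
        if flag == "1":
            words.append(text[last:i + 1])
            last = i + 1
    words.append(text[last:])
    return words


def evaluate(text, segs):
    # Same score as the program: word count + lexicon size (distinct word length + 1)
    words = segment(text, segs)
    return len(words) + sum(len(w) + 1 for w in set(words))


def flip_n(segs, n, rng):
    # Toggle n randomly chosen boundary flags (a position may be picked twice)
    chars = list(segs)
    for _ in range(n):
        pos = rng.randrange(len(chars))
        chars[pos] = "1" if chars[pos] == "0" else "0"
    return "".join(chars)


def anneal_metropolis(text, segs, iterations, cooling_rate, seed=0):
    # Hypothetical variant: accept a worse guess with probability exp(-delta / T)
    rng = random.Random(seed)
    temperature = float(len(segs))
    current, current_score = segs, evaluate(text, segs)
    best, best_score = current, current_score
    while temperature > 0.5:
        for _ in range(iterations):
            guess = flip_n(current, max(1, round(temperature)), rng)
            score = evaluate(text, guess)
            delta = score - current_score
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_score = guess, score
                if score < best_score:
                    # Track the best segmentation seen so far separately
                    best, best_score = guess, score
        temperature /= cooling_rate
    return best


if __name__ == "__main__":
    text = "doyouseethekittyseethedoggydoyoulikethekittylikethedoggy"
    seg1 = "0000000000000001000000000010000000000000000100000000000"
    result = anneal_metropolis(text, seg1, 500, 1.2)
    print(evaluate(text, result), segment(text, result))
```

Because the best segmentation is tracked separately from the current one, the returned flags never score worse than the starting `seg1` (which scores 64), even though the walk itself may temporarily move uphill.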