My BLEU score is different from the nltk BLEU score

Posted 2023-03-29


Originally asked: 2021-11-09 16:26:11

Question:

I tried to compute the BLEU score from scratch:

import numpy as np
reference = 'hello how are you i am good here'
output = 'hello baby are you i am fine here'

# calculate Brevity penalty
BP = 0
if len(reference) < len(output):
    BP = 1
else:
    BP = np.exp(1-(len(reference)/len(output)))

def Bleu(ref, pred):
    count = []
    clip_count = []

    for i in range(1, len(pred)):
        clp = 0
        cp = 0
        start = set()
        for j in range(len(pred)):
            if j+i >len(pred):
                continue

            goal = pred[j:i+j]

            sum = ''
            for k in goal:
                sum += k+' '

            final = sum[:-1]

            cp += 1
            if final in ref:
                if final in start:
                    continue
                else:
                    clp += 1
                    start.add(final)


        clip_count.append(clp)
        count.append(cp)

    return clip_count, count

clip, count = Bleu(reference, output.split())

pn = sum(np.divide(clip, count))

bleu = np.exp((1/len(clip)) * pn) * BP

print(bleu)
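A few properties of the code above can be checked in isolation. This quick sketch reuses the question's two sentences and its names `pred` and `final in ref`; the set `ref_unigrams` is a hypothetical helper added only for the comparison. In this particular pair, the character-based and token-based length comparisons happen to give the same brevity penalty of 1, so that detail only changes the result for other inputs.

```python
# The question's two example sentences, copied verbatim.
reference = 'hello how are you i am good here'
output = 'hello baby are you i am fine here'

# len() on a str counts characters; the brevity penalty compares token counts.
print(len(reference), len(output))                  # 32 33
print(len(reference.split()), len(output.split()))  # 8 8

# `for i in range(1, len(pred))` visits n = 1 .. len(pred) - 1 (up to 7-grams here),
# while nltk's sentence_bleu uses 1- to 4-grams by default.
pred = output.split()
print(list(range(1, len(pred))))  # [1, 2, 3, 4, 5, 6, 7]

# `final in ref` on a plain string is a substring test, not an n-gram lookup.
ref_unigrams = {(token,) for token in reference.split()}  # hypothetical helper
print('a' in reference)        # True: 'a' occurs inside 'are' and 'am'
print(('a',) in ref_unigrams)  # False: 'a' is not a token of the reference
```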

And this is the BLEU score computed with nltk in Python:

import nltk

t = 'hello how are you i am good here'
m = 'hello baby are you i am fine here'
hypothesis = m.split()
reference = t.split()
# there may be several references
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis)
print(BLEUscore)

My questions are:

Q1. The two BLEU scores do not match. What is the mistake? Could someone please help me?

Q2. If we try to compute the BLEU score, its value always comes out greater than 1, because the BLEU formula is

$$\text{BLEU} = \exp\left(\frac{1}{N}\sum_{n=1}^{N} p_n\right) \cdot \text{BP}$$

where p_n is the n-gram precision and BP is the brevity penalty.

The exponential function e^x is always greater than 1 when x is positive, and the n-gram precisions are always positive.

So why does the literature generally say that the BLEU score should lie between 0 and 1?


Answer 1:

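The formula is wrong. Exponentiating the average n-gram precision has no sensible interpretation; it should be the geometric mean. The geometric mean of numbers between 0 and 1 always lies between 0 and 1. The usual way to compute it is to average the logarithms of the precisions and then exponentiate; multiplying the precisions directly means multiplying many small numbers, which can cause floating-point underflow.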
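To make the difference concrete, here is a small sketch comparing the question's exp(arithmetic mean) with the geometric mean computed through logs. The precision values are an assumption for illustration: they are the clipped 1- to 4-gram precisions for the question's sentence pair, counted by hand (6/8, 3/7, 2/6, 1/5). The function names are hypothetical. `math.prod` requires Python 3.8 or later.

```python
import math

def exp_of_arithmetic_mean(precisions):
    # The formula from the question: exp of the plain average, no logs.
    return math.exp(sum(precisions) / len(precisions))

def geometric_mean(precisions):
    # Geometric mean computed as exp(mean(log p)); stays in (0, 1] when every p is in (0, 1].
    if any(p == 0 for p in precisions):
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / len(precisions))

# Hand-counted clipped 1- to 4-gram precisions for the question's example pair.
precisions = [6/8, 3/7, 2/6, 1/5]

print(exp_of_arithmetic_mean(precisions))  # about 1.53, i.e. greater than 1
print(geometric_mean(precisions))          # about 0.3826, between 0 and 1

# Why logs: multiplying many small factors directly underflows to 0.0 in float64.
tiny = [1e-5] * 100
print(math.prod(tiny))       # 0.0, since 1e-500 is below the smallest float64
print(geometric_mean(tiny))  # about 1e-05
```

Averaging logs keeps every intermediate value in a comfortable range, which is why the paper's formula is written with log p_n.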

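Here is the formula from the original paper, where c is the candidate length, r is the effective reference length, p_n are the modified (clipped) n-gram precisions, and w_n are positive weights summing to 1 (the paper's baseline uses N = 4 and uniform weights w_n = 1/N):

$$\text{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{(1 - r/c)} & \text{if } c \le r \end{cases}$$

$$\text{BLEU} = \text{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$$

Since BP ≤ 1 and the exponential of a weighted average of logs of numbers in (0, 1] also lies in (0, 1], the score stays between 0 and 1.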
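A minimal from-scratch sketch of the paper's formula (BP times the exponential of the weighted average of log precisions), assuming uniform weights over 1- to 4-grams, clipping against the maximum count in any single reference, r taken as the reference length closest to the hypothesis length, and no smoothing. These choices are intended to mirror nltk's `sentence_bleu` defaults, but `extract_ngrams`, `clipped_precision`, `compute_brevity_penalty`, and `bleu_from_scratch` are hypothetical helpers defined here, not imported from nltk.

```python
import math
from collections import Counter

def extract_ngrams(tokens, n):
    # All contiguous n-grams as tuples, e.g. ('are', 'you') for n = 2.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(references, hypothesis, n):
    # Modified n-gram precision p_n: each hypothesis n-gram is credited at most
    # as many times as it occurs in any single reference.
    hyp_counts = Counter(extract_ngrams(hypothesis, n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(extract_ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total

def compute_brevity_penalty(references, hypothesis):
    # c and r are token counts; r is the reference length closest to c (ties -> shorter).
    c = len(hypothesis)
    r = min((len(ref) for ref in references),
            key=lambda ref_len: (abs(ref_len - c), ref_len))
    if c > r:
        return 1.0
    if c == 0:
        return 0.0
    return math.exp(1 - r / c)

def bleu_from_scratch(references, hypothesis, max_n=4):
    # BLEU = BP * exp(sum_n w_n * log p_n) with uniform weights w_n = 1 / max_n.
    precisions = [clipped_precision(references, hypothesis, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # log(0) is undefined; this sketch applies no smoothing
    log_avg = sum(math.log(p) for p in precisions) / max_n
    return compute_brevity_penalty(references, hypothesis) * math.exp(log_avg)

reference = 'hello how are you i am good here'.split()
hypothesis = 'hello baby are you i am fine here'.split()
print([clipped_precision([reference], hypothesis, n) for n in range(1, 5)])  # 6/8, 3/7, 2/6, 1/5 as floats
print(bleu_from_scratch([reference], hypothesis))  # about 0.3826
```

The details that matter most when comparing against nltk are the log-average in place of an arithmetic mean, the fixed 1- to 4-gram range, and counting lengths in tokens rather than characters.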

Comments:

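Thanks for your answer, it was very helpful.

If you think the answer is correct, please accept it so that others with a similar problem know.

I would, but it says: "Thanks for the feedback! You need at least 15 reputation to cast a vote, but your feedback has been recorded."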
