What is the difference between mteval-v13a.pl and NLTK BLEU?

Posted 2023-03-29


【Posted】: 2018-02-15 11:55:35 【Question】:

Python NLTK has an implementation of the BLEU score, nltk.translate.bleu_score.corpus_bleu

But I am not sure whether it is the same as the mteval-v13a.pl script.

What are the differences between them?

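【Comments】:

Why don't you compare them? If you don't know both languages well enough, at least edit your question and provide links to both implementations. See the site's help section on how to write good questions that get answers.

【Answer 1】: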

TL;DR

Use https://github.com/mjpost/sacrebleu when evaluating machine translation systems.

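In short

No, the BLEU in NLTK is not exactly the same as mteval-v13a.pl.

But it can get very close, see https://github.com/nltk/nltk/issues/1330#issuecomment-256237324

nltk.translate.bleu_score.corpus_bleu corresponds to mteval-v13a.pl up to the 4th n-gram order, with some floating-point discrepancies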

The details of the comparison and the dataset used can be downloaded from https://github.com/nltk/nltk_data/blob/gh-pages/packages/models/wmt15_eval.zip or via:

import nltk
nltk.download('wmt15_eval')

Major differences:

In long

There are several differences between mteval-v13a.pl and nltk.translate.bleu_score.corpus_bleu:

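The first difference is that mteval-v13a.pl comes with its own NIST tokenizer, while the NLTK version of BLEU is an implementation of the metric only and assumes that the input is already tokenized.

BTW, this ongoing PR will bridge the gap between the NLTK and NIST tokenizers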
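Since NLTK's BLEU assumes pre-tokenized input, the caller decides how to tokenize. A minimal sketch, assuming plain whitespace splitting as a stand-in tokenizer (this is not the NIST tokenizer, so scores will generally differ from mteval-v13a.pl) and using made-up sentences:

```python
from nltk.translate.bleu_score import corpus_bleu

# Hypothetical example data: one hypothesis per segment,
# and a list of one or more references per segment.
hypotheses_raw = ["the cat sat on the mat .", "there is a dog in the garden ."]
references_raw = [["the cat sat on the mat ."], ["a dog is in the garden ."]]

# Whitespace splitting stands in for a real tokenizer here.
# Passing raw strings instead would be iterated character by character.
hypotheses = [hyp.split() for hyp in hypotheses_raw]
list_of_references = [[ref.split() for ref in refs] for refs in references_raw]

# Default weights are uniform over 1- to 4-grams.
score = corpus_bleu(list_of_references, hypotheses)
print(score)  # a value between 0 and 1
```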

Another major difference is that mteval-v13a.pl expects its input in .sgm format, while NLTK BLEU takes tokenized input as Python lists of strings. See the README.txt in the zipball here for more information on how to convert a text file to SGM.
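As a rough illustration of the format gap, the sketch below wraps plain-text lines in a minimal SGM-style document. The helper name `to_sgm` is hypothetical, and the tag and attribute layout (a `srcset`/`refset`/`tstset` wrapper around `doc` and `seg` elements) is an assumption about what mteval expects; the README.txt in the zipball is the authoritative guide.

```python
from xml.sax.saxutils import escape, quoteattr


def to_sgm(lines, set_tag, set_id, doc_id, sys_id, src_lang, trg_lang=None):
    """Wrap plain-text segments in a minimal SGM-style document (hypothetical helper)."""
    attrs = [("setid", set_id), ("srclang", src_lang)]
    if trg_lang is not None:
        attrs.append(("trglang", trg_lang))
    attr_text = " ".join(f"{key}={quoteattr(value)}" for key, value in attrs)

    out = [
        f"<{set_tag} {attr_text}>",
        f"<doc docid={quoteattr(doc_id)} sysid={quoteattr(sys_id)}>",
    ]
    # One <seg> per input line, numbered from 1, with &, <, > escaped.
    for seg_id, line in enumerate(lines, start=1):
        out.append(f'<seg id="{seg_id}">{escape(line.strip())}</seg>')
    out.append("</doc>")
    out.append(f"</{set_tag}>")
    return "\n".join(out) + "\n"


# Placeholder identifiers; real set, document and system IDs depend on the test set.
sgm = to_sgm(["Hello & goodbye", "Second line"],
             "tstset", "demo", "doc1", "sysA", "en", "de")
print(sgm)
```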

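mteval-v13a.pl expects n-gram orders 1 through 4 to be present. If the highest n-gram order available in the sentence/corpus is less than 4, it returns a score of 0, because the log of a zero n-gram precision is negative infinity (float('-inf')) and exponentiating it gives 0. To emulate this behavior, NLTK has an _emulate_multibleu flag:

See https://github.com/nltk/nltk/blob/develop/nltk/translate/bleu_score.py#L477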
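The zero-score behavior follows directly from the geometric mean in BLEU. The sketch below is a simplified, unsmoothed, single-reference BLEU written from the textbook definition; it is not the code of mteval-v13a.pl or NLTK, and the function names are made up for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision against a single reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(hyp_counts.values())
    if total == 0:  # hypothesis too short to contain any n-gram of this order
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped / total


def unsmoothed_bleu(reference, hypothesis, max_n=4):
    precisions = [modified_precision(reference, hypothesis, n)
                  for n in range(1, max_n + 1)]
    # log(0) is treated as -inf, so one zero precision zeroes the whole score.
    logs = [math.log(p) if p > 0 else float("-inf") for p in precisions]
    log_mean = sum(logs) / max_n
    ref_len, hyp_len = len(reference), len(hypothesis)
    if hyp_len == 0:
        return 0.0
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_mean)


reference = "the cat sat on the mat".split()
short_score = unsmoothed_bleu(reference, "the cat sat".split())  # no 4-grams exist
exact_score = unsmoothed_bleu(reference, reference)
print(short_score, exact_score)
```

A three-token hypothesis has no 4-grams at all, so its 4-gram precision is zero and the whole score collapses to 0 even though every lower-order n-gram matches.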

mteval-v13a.pl can generate NIST scores, whereas NLTK has no NIST score implementation (at least not yet)

A NIST score implementation for NLTK is upcoming in this PR

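Other than the differences, the NLTK BLEU score packs in more features:

It handles fringe cases that the original BLEU (Papineni, 2002) overlooked

See https://github.com/nltk/nltk/pull/1383

It also handles fringe cases around the maximum n-gram order

See https://github.com/nltk/nltk/blob/develop/nltk/translate/bleu_score.py#L175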

While NIST has a smoothing method for geometric sequence smoothing, NLTK has an equivalent object with the same smoothing method, plus further smoothing methods for sentence-level BLEU from Chen and Cherry, 2014
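A short sketch of how the smoothing object can be used, assuming NLTK is installed. `SmoothingFunction().method3` is the method NLTK documents as NIST geometric sequence smoothing; the sentences are made up and chosen so that no 4-gram matches.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the cat sat on the mat".split()
hypothesis = "the cat is on the mat".split()  # shares no 4-gram with the reference

# Without smoothing, the zero 4-gram precision drives the score to (almost) zero;
# NLTK may emit a warning about this.
plain = sentence_bleu([reference], hypothesis)

# method3 replaces zero precisions with small, geometrically decreasing values.
smoothing = SmoothingFunction()
smoothed = sentence_bleu([reference], hypothesis,
                         smoothing_function=smoothing.method3)

print(plain, smoothed)
```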

Lastly, to validate the features added to NLTK's version of BLEU, regression tests were added to cover them, see https://github.com/nltk/nltk/blob/develop/nltk/test/unit/translate/test_bleu.py
