Answers to: Corpus-level BLEU vs. sentence-level BLEU


1 answer

Anonymous, 1 day ago

TL;DR:

<pre><code>>>> import nltk
>>> hypothesis = ['This', 'is', 'cat']
>>> reference = ['This', 'is', 'a', 'cat']
>>> references = [reference]  # list of references for 1 sentence.
>>> list_of_references = [references]  # list of references for all sentences in corpus.
>>> list_of_hypotheses = [hypothesis]  # list of hypotheses that corresponds to list of references.
>>> nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses)
0.6025286104785453
>>> nltk.translate.bleu_score.sentence_bleu(references, hypothesis)
0.6025286104785453
</code></pre>

(Note: to get a stable version of the BLEU score implementation, you have to pull the latest version of NLTK from the <code>develop</code> branch.)

<hr/>

In long:

If the whole corpus contains only one reference and one hypothesis, <code>corpus_bleu()</code> and <code>sentence_bleu()</code> should return the same value, as in the example above.

In the code, we see that <a href="https://github.com/nltk/nltk/blob/develop/nltk/translate/bleu_score.py#L26" rel="noreferrer"><code>sentence_bleu</code> is actually a duck-type of <code>corpus_bleu</code></a>, that is, a thin wrapper that calls it with a one-sentence corpus:

<pre><code>def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    return corpus_bleu([references], [hypothesis], weights, smoothing_function)
</code></pre>

And if we look at the parameters of <code>sentence_bleu</code>:

<pre><code>def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    """
    :param references: reference sentences
    :type references: list(list(str))
    :param hypothesis: a hypothesis sentence
    :type hypothesis: list(str)
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The sentence-level BLEU score.
    :rtype: float
    """
</code></pre>

The <code>references</code> input of <code>sentence_bleu</code> is a <code>list(list(str))</code>.

So if you have a sentence string, e.g. <code>"This is a cat"</code>, you have to tokenize it to get a list of strings, <code>["This", "is", "a", "cat"]</code>. Since the function allows multiple references, the argument has to be a list of lists of strings. For example, if you have a second reference, "This is a feline", your input to <code>sentence_bleu()</code> would be:

<pre><code>references = [ ["This", "is", "a", "cat"], ["This", "is", "a", "feline"] ]
hypothesis = ["This", "is", "cat"]
sentence_bleu(references, hypothesis)
</code></pre>

When it comes to the <code>list_of_references</code> parameter of <code>corpus_bleu()</code>, it is basically <a href="https://github.com/nltk/nltk/blob/develop/nltk/translate/bleu_score.py#L82" rel="noreferrer">a list of whatever <code>sentence_bleu()</code> takes as references</a>:

<pre><code>def corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                smoothing_function=None):
    """
    :param references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The corpus-level BLEU score.
    :rtype: float
    """
</code></pre>

Besides the doctests in <a href="https://github.com/nltk/nltk/blob/develop/nltk/translate/bleu_score.py" rel="noreferrer"><code>nltk/translate/bleu_score.py</code></a>, you can also look at the unit tests in <a href="https://github.com/nltk/nltk/blob/develop/nltk/test/unit/translate/test_bleu.py" rel="noreferrer"><code>nltk/test/unit/translate/test_bleu.py</code></a> to see how each component of <code>bleu_score.py</code> is used.

By the way, since <code>sentence_bleu</code> is imported as <code>bleu</code> in <a href="https://github.com/nltk/nltk/blob/develop/nltk/translate/__init__.py#L21" rel="noreferrer"><code>nltk.translate.__init__.py</code></a>, using

<pre><code>from nltk.translate import bleu
</code></pre>

is the same as:

<pre><code>from nltk.translate.bleu_score import sentence_bleu
</code></pre>

And in code:

<pre><code>>>> from nltk.translate import bleu
>>> from nltk.translate.bleu_score import sentence_bleu
>>> from nltk.translate.bleu_score import corpus_bleu
>>> bleu == sentence_bleu
True
>>> bleu == corpus_bleu
False
</code></pre>
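To make the relationship between the two functions concrete, here is a minimal pure-Python sketch of BLEU. It is not NLTK's implementation. The names <code>corpus_bleu_sketch</code> and <code>sentence_bleu_sketch</code> are hypothetical, the sketch assumes uniform weights over 1-grams and 2-grams only (NLTK's default shown above uses four weights), and it applies no smoothing. It only illustrates the input nesting and the fact that the corpus-level score pools n-gram counts over all sentences, so with more than one sentence it is generally not the average of the sentence-level scores.

```python
from collections import Counter
from math import exp, log


def ngrams(tokens, n):
    """Return the n-grams of a token list as a list of tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def corpus_bleu_sketch(list_of_references, hypotheses, max_n=2):
    """Illustrative corpus-level BLEU (hypothetical helper, not NLTK's code).

    list_of_references: list(list(list(str))), one list of references per hypothesis
    hypotheses: list(list(str))
    """
    matches = Counter()  # clipped n-gram matches, pooled over the whole corpus
    totals = Counter()   # hypothesis n-gram counts, pooled over the whole corpus
    hyp_len = ref_len = 0

    for references, hypothesis in zip(list_of_references, hypotheses):
        hyp_len += len(hypothesis)
        # Use the reference length closest to the hypothesis length
        # (ties are broken in favour of the shorter reference).
        ref_len += min((abs(len(r) - len(hypothesis)), len(r)) for r in references)[1]
        for n in range(1, max_n + 1):
            hyp_counts = Counter(ngrams(hypothesis, n))
            max_ref_counts = Counter()
            for reference in references:
                max_ref_counts |= Counter(ngrams(reference, n))  # element-wise max
            # Clip each hypothesis n-gram count by its maximum reference count.
            matches[n] += sum((hyp_counts & max_ref_counts).values())
            totals[n] += sum(hyp_counts.values())

    if any(matches[n] == 0 for n in range(1, max_n + 1)):
        return 0.0  # assumption: no smoothing in this sketch
    brevity_penalty = 1.0 if hyp_len > ref_len else exp(1 - ref_len / hyp_len)
    mean_log_precision = sum(
        log(matches[n] / totals[n]) for n in range(1, max_n + 1)
    ) / max_n
    return brevity_penalty * exp(mean_log_precision)


def sentence_bleu_sketch(references, hypothesis, max_n=2):
    """Sentence-level BLEU as a one-sentence corpus, mirroring the wrapper above."""
    return corpus_bleu_sketch([references], [hypothesis], max_n)


references_1 = [["This", "is", "a", "cat"], ["This", "is", "a", "feline"]]
hypothesis_1 = ["This", "is", "cat"]
references_2 = [["The", "dog", "barks"]]
hypothesis_2 = ["The", "dog", "barks"]

score_1 = sentence_bleu_sketch(references_1, hypothesis_1)
score_2 = sentence_bleu_sketch(references_2, hypothesis_2)
corpus_score = corpus_bleu_sketch(
    [references_1, references_2], [hypothesis_1, hypothesis_2]
)

print("sentence scores:", score_1, score_2)
print("mean of sentence scores:", (score_1 + score_2) / 2)
print("corpus score:", corpus_score)
```

With a single hypothesis the two functions agree by construction. With the two-sentence corpus above, the pooled corpus score and the mean of the sentence scores differ, which is why the two numbers should not be used interchangeably when reporting results.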
