Computing a BLEU score from scratch in Python

After watching Andrew Ng's video on [Bleu score](https://www.youtube.com/watch?v=DejHQYAGb7Q), I wanted to implement it from scratch in Python. I wrote the complete code using Python and NumPy. Here it is:

```python
import numpy as np

def n_gram_generator(sentence,n= 2,n_gram= False):
    '''
    N-Gram generator with parameters sentence
    n is for number of n_grams
    The n_gram parameter removes repeating n_grams
    '''
    sentence = sentence.lower() # converting to lower case
    sent_arr = np.array(sentence.split()) # split to string arrays
    length = len(sent_arr)

    word_list = []
    for i in range(length+1):
        if i < n:
            continue
        word_range = list(range(i-n,i))
        s_list = sent_arr[word_range]
        string = ' '.join(s_list) # converting list to strings
        word_list.append(string) # append to word_list
    if n_gram:
        word_list = list(set(word_list))
    return word_list

def bleu_score(original,machine_translated):
    '''
    Bleu score function given a orginal and a machine translated sentences
    '''
    mt_length = len(machine_translated.split())
    o_length = len(original.split())

    # Brevity Penalty
    if mt_length>o_length:
        BP=1
    else:
        penality=1-(mt_length/o_length)
        BP=np.exp(penality)

    # calculating precision
    precision_score = []
    for i in range(mt_length):
        original_n_gram = n_gram_generator(original,i)
        machine_n_gram = n_gram_generator(machine_translated,i)
        n_gram_list = list(set(machine_n_gram)) # removes repeating strings

        # counting number of occurence
        machine_score = 0
        original_score = 0
        for j in n_gram_list:
            machine_count = machine_n_gram.count(j)
            original_count = original_n_gram.count(j)
            machine_score = machine_score+machine_count
            original_score = original_score+original_count

        precision = original_score/machine_score
        precision_score.append(precision)
    precisions_sum = np.array(precision_score).sum()
    avg_precisions_sum=precisions_sum/mt_length
    bleu=BP*np.exp(avg_precisions_sum)
    return bleu

if __name__ == "__main__":
    original = "this is a test"
    bs=bleu_score(original,original)
    print("Bleu Score Original",bs)
```

I tried checking my score against nltk.
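For comparison with `n_gram_generator`, here is a minimal pure-Python sketch of an n-gram extractor. The name `ngrams` is hypothetical, and it assumes the same lowercasing and whitespace tokenization as the code above. One detail worth checking: `bleu_score` calls `n_gram_generator(original, i)` with `i` taken from `range(mt_length)`, so for a four-word sentence it uses n = 0, 1, 2, 3 rather than 1 to 4, and n = 0 produces a list of empty strings rather than real n-grams. The sketch therefore rejects n < 1.

```python
from typing import List


def ngrams(sentence: str, n: int) -> List[str]:
    """Return the contiguous n-grams of a lowercased, whitespace-split sentence."""
    if n < 1:
        # A length-0 "n-gram" is meaningless; BLEU uses n = 1..4.
        raise ValueError("n must be at least 1")
    tokens = sentence.lower().split()
    # A sentence of L tokens has L - n + 1 n-grams; if n > L the range
    # below is empty, so no n-grams are returned.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, ngrams("this is a test", n))
```

Plain list slicing is used instead of NumPy fancy indexing; for short sentences it is simpler and avoids building index lists.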
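The problem is that my BLEU score is `2.718281`, while nltk's is `1`. What am I doing wrong?

Here are some possible reasons:

1. I computed the n-grams based on the length of the machine-translated sentence, from 1 to 4.
2. I wrote the `n_gram_generator` function myself and I'm not sure it is correct.
3. Somehow I used the wrong function or computed the BLEU score incorrectly.

Can someone check my code and tell me where I went wrong?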
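For reference, here is a minimal sketch of single-reference sentence BLEU under the standard definition: clipped ("modified") n-gram precisions p_1 to p_4, combined as a geometric mean exp((1/4) * sum of log p_n), multiplied by a brevity penalty BP = 1 if c > r, else exp(1 - r/c), where c is the candidate length and r the reference length. It also accounts for the `2.718281`: for identical sentences every precision is 1, and the code above returns BP * exp(mean of p_n) = 1 * exp(1) = e, whereas BLEU takes exp(mean of log p_n) = exp(0) = 1. (The brevity penalty above computes exp(1 - c/r) instead of exp(1 - r/c); that does not matter here only because c = r.) Assumptions: one reference, lowercasing plus whitespace tokenization as in the question, uniform weights, and no smoothing, so any zero precision yields 0.0. The names `bleu_from_scratch` and `ngram_counts` are hypothetical; this is not nltk's implementation and may disagree with it on edge cases such as sentences shorter than four words.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count contiguous n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_from_scratch(reference: str, candidate: str, max_n: int = 4) -> float:
    """Single-reference sentence BLEU, uniform weights, no smoothing."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    c, r = len(cand), len(ref)
    if c == 0:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):  # n-gram orders 1..max_n
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clipping: each candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        clipped = sum(min(cnt, ref_counts[g]) for g, cnt in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # log(0) is undefined; toolkits apply smoothing here
        log_precision_sum += math.log(clipped / total)

    # Brevity penalty: 1 if the candidate is longer than the reference,
    # otherwise exp(1 - r / c).
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the precisions = exp of the *average log* precision.
    return bp * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    print(bleu_from_scratch("this is a test", "this is a test"))
```

With the question's example, `bleu_from_scratch("this is a test", "this is a test")` returns `1.0`. `Counter` is used so that clipping is a single `min` per distinct n-gram, instead of calling `list.count` repeatedly.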

1 Answer

Anonymous, 1 day ago