obss/jury

Computing BLEU more than once

Closed this issue · 4 comments

Axe-- commented

Hey, why does computing the BLEU score more than once change the keys of the score dict?
e.g. 'bleu_1', 'bleu_1_1', 'bleu_1_1_1'

Overall I find the library quite user-friendly, but I'm unsure about this behavior.

Hey @Axe--, thank you for sharing your thoughts about the library, I appreciate the feedback 🙂. Can you please share a reproducible example/code snippet so I can look into the problem in detail? Also, can you please specify your versions of jury and datasets?

Axe-- commented

Sure! Here's the snippet

from jury import Jury
scorer = Jury()
predictions = [
    ["the cat is on the mat", "There is cat playing on the mat"],
    ["Look!    a wonderful day."]
]
references = [
    ["the cat is playing on the mat.", "The cat plays on the mat."],
    ["Today is a wonderful day", "The weather outside is wonderful."]
]

scores = scorer(predictions=predictions, references=references)
scores_ = scorer(predictions=predictions, references=references)

print(scores.keys())
print(scores_.keys())

Output:

dict_keys(['empty_predictions', 'total_items', 'bleu_1', 'bleu_2', 'bleu_3', 'bleu_4', 'meteor', 'rouge'])
dict_keys(['empty_predictions', 'total_items', 'bleu_1_1', 'bleu_2_2', 'bleu_3_3', 'bleu_4_4', 'meteor', 'rouge'])

Version Info:

import jury, datasets
print(jury.__version__, datasets.__version__)
>>> 2.0.0 1.12.1

Hope this helps!

@Axe-- Thank you for the snippet. I reproduced the behavior, and I think it happens because the previous BLEU implementation lacked specific control over the naming convention. However, this problem does not occur in the recent version of jury (2.1.0), where the same code produces:

>>> dict_keys(['empty_predictions', 'total_items', 'bleu_1', 'bleu_2', 'bleu_3', 'bleu_4', 'meteor', 'rouge'])
>>> dict_keys(['empty_predictions', 'total_items', 'bleu_1', 'bleu_2', 'bleu_3', 'bleu_4', 'meteor', 'rouge'])

Upgrading to the latest version would solve this issue.
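
If it helps, here is a minimal sketch for checking whether the installed version is at least 2.1.0 before relying on stable key names. The helpers `parse_release` and `is_at_least` are hypothetical names written for this example and are not part of jury; the sketch also assumes the package is installed from PyPI under the distribution name `jury` and only handles plain dotted numeric versions.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Turn '2.1.0' into (2, 1, 0); parsing stops at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(installed, minimum):
    """Compare two dotted versions, padding the shorter one with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    # Assumption: the distribution is registered as "jury".
    installed = version("jury")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("jury is not installed")
elif is_at_least(installed, "2.1.0"):
    print(f"jury {installed}: BLEU keys should stay the same across calls")
else:
    print(f"jury {installed}: consider upgrading, e.g. pip install -U jury")
```

On 2.0.0 this should take the last branch, which matches the version reported above.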

Axe-- commented

Awesome! And thank you for all the work!