
A quick question about BLEURT and tools

First of all, congratulations on your work being accepted by TACL!
I have some questions:

  1. Implementation of the BLEURT metric

I downloaded the evaluation model directly from the official BLEURT repository and installed the required packages. Using the example code, I evaluated a translation output as follows:

```python
from bleurt import score

def read(path):
    # Hypothetical helper: one segment per line, surrounding whitespace stripped
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]

references_list = read('wmt22.en-zh.zh')
candidates_list = read('wmt22.en-zh.zh.maps.0-seed.trans')
checkpoint = "bleurt/bleurt/test_checkpoint"
scorer = score.BleurtScorer(checkpoint)
scores = scorer.score(references=references_list, candidates=candidates_list)

average_score = sum(scores) / len(scores)
print("Average BLEURT score:", average_score)
```

However, the score was only 0.57. Is this evaluation process consistent with the one in your paper? Could there be something I've overlooked that resulted in this poor score?

  2. Graphical tools
    Additionally, I am curious which tools you used to create the charts in your paper.

Thank you!

  1. Here is a minimal script for BLEURT evaluation:

```python
from bleurt import score as bleurt_score

def readlines(file_path):
    # Read one segment per line; UTF-8 is needed for the Chinese text
    if not file_path:
        return []
    with open(file_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    return [line.strip() for line in lines]

references_list = readlines('data/raw/wmt22.en-zh.zh')
candidates_list = readlines('output/text-davinci-003/wmt22.en-zh.zh.maps.0-seed.trans')
checkpoint = "eval_ckpt/BLEURT-20"

# Length-based batching groups segments of similar length to speed up scoring
bleurt_model = bleurt_score.LengthBatchingBleurtScorer(checkpoint)
scores = bleurt_model.score(references=references_list, candidates=candidates_list, batch_size=2)
average_score = sum(scores) / len(scores)
print("Average BLEURT score:", average_score)
```

Output:

```
100%|██████████| 1019/1019 [08:42<00:00,  1.95it/s]
Average BLEURT score: 0.7258136263286359
```

  2. I only use Keynote.
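
The scoring script pairs each candidate with the reference on the same line, so the two files must stay aligned: a missing or extra line in either file shifts every pair after it, and an empty candidate (for example, a failed translation) is scored like any other segment. A quick pre-flight check can catch this before a long scoring run. The following is a minimal sketch assuming one segment per line, as in `readlines`; `check_parallel` is a hypothetical helper and not part of the BLEURT package.

```python
def check_parallel(references, candidates):
    """Return a list of human-readable problems that would break line-by-line pairing.

    Hypothetical helper (not part of BLEURT): it only inspects the inputs and
    never loads a checkpoint.
    """
    problems = []
    # Every candidate must have exactly one reference at the same index.
    if len(references) != len(candidates):
        problems.append(
            f"line count mismatch: {len(references)} references vs "
            f"{len(candidates)} candidates"
        )
    # Empty segments usually mean a failed translation or a stray blank line.
    # zip() stops at the shorter list, so only the overlapping lines are checked here.
    for i, (ref, cand) in enumerate(zip(references, candidates), start=1):
        if not ref.strip():
            problems.append(f"line {i}: empty reference")
        if not cand.strip():
            problems.append(f"line {i}: empty candidate")
    return problems


if __name__ == "__main__":
    # Toy example with made-up segments; the second candidate is empty.
    refs = ["The weather is nice today.", "I like reading."]
    cands = ["The weather is good today.", ""]
    for problem in check_parallel(refs, cands):
        print(problem)  # prints "line 2: empty candidate"
```

With the lists built by `readlines`, calling `check_parallel(references_list, candidates_list)` before `bleurt_model.score(...)` should return an empty list when the files are aligned.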

The problem has been solved, thank you very much for your answer!