zwhe99/MAPS-mt

A quick question about BLEURT and tools


First of all, congratulations on your work being accepted by TACL!
I have some questions:

  1. Implementation of the BLEURT metric

I directly downloaded the evaluation model from the official BLEURT repository and installed the corresponding packages. Using the example code, I evaluated a translation, as follows:

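```python
from bleurt import score

def read(path):
    # Minimal line reader (assumed; the original helper was not shown): one segment per line, UTF-8
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]

references_list = read('wmt22.en-zh.zh')
candidates_list = read('wmt22.en-zh.zh.maps.0-seed.trans')
checkpoint = "bleurt/bleurt/test_checkpoint"  # small test checkpoint bundled with the BLEURT repo
scorer = score.BleurtScorer(checkpoint)
scores = scorer.score(references=references_list, candidates=candidates_list)

average_score = sum(scores) / len(scores)
print("Average BLEURT score:", average_score)
```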

However, the score was only 0.57. Is this evaluation process consistent with the one in your paper? Could there be something I've overlooked that resulted in this poor score?

  2. Graphical tools

Additionally, I am very curious which tools you used to create the charts in your paper.

Thank you!

  1. Here is a minimal script for BLEURT evaluation:

```python
from bleurt import score as bleurt_score

def readlines(file_path):
    if not file_path:
        return []
    # Read as UTF-8 so the Chinese references and outputs decode correctly
    with open(file_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    return [l.strip() for l in lines]

references_list = readlines('data/raw/wmt22.en-zh.zh')
candidates_list = readlines('output/text-davinci-003/wmt22.en-zh.zh.maps.0-seed.trans')
checkpoint = "eval_ckpt/BLEURT-20"  # BLEURT-20 checkpoint, not the repo's test checkpoint

# Variant of BleurtScorer that batches inputs by length
bleurt_model = bleurt_score.LengthBatchingBleurtScorer(checkpoint)
scores = bleurt_model.score(references=references_list, candidates=candidates_list, batch_size=2)
average_score = sum(scores) / len(scores)
print("Average BLEURT score:", average_score)
```

Output:

```
2024-01-23 11:45:06.430392: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-01-23 11:45:06.471530: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-23 11:45:06.471604: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-23 11:45:06.471645: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-23 11:45:06.479669: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-01-23 11:45:06.479899: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-01-23 11:45:07.332096: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-01-23 11:45:09.753923: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2211] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
100%|██████████| 1019/1019 [08:42<00:00,  1.95it/s]
Average BLEURT score: 0.7258136263286359
```
  2. I only use Keynote.
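Both BLEURT scripts above average the per-segment scores with `sum(scores) / len(scores)`, which relies on the reference and candidate files being line-aligned. The sketch below is a minimal, hypothetical helper (the names `read_lines`, `corpus_average` and `exact_match` come from neither BLEURT nor MAPS) that makes the alignment check explicit and keeps the scorer pluggable. `exact_match` is only a toy stand-in so the example runs without a checkpoint; it is not BLEURT and will not reproduce any reported score.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer type: maps (references, candidates) to one score per segment.
ScoreFn = Callable[[Sequence[str], Sequence[str]], List[float]]


def read_lines(path: str) -> List[str]:
    """Read one segment per line as UTF-8, stripping surrounding whitespace."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f]


def corpus_average(references: Sequence[str], candidates: Sequence[str],
                   score_fn: ScoreFn) -> float:
    """Average segment-level scores after checking that inputs are aligned."""
    if len(references) != len(candidates):
        raise ValueError(
            f"{len(references)} references vs {len(candidates)} candidates; "
            "files must be line-aligned"
        )
    if not references:
        raise ValueError("no segments to score")
    scores = score_fn(references, candidates)
    if len(scores) != len(references):
        raise ValueError("scorer returned a different number of scores than segments")
    return sum(scores) / len(scores)


def exact_match(references: Sequence[str], candidates: Sequence[str]) -> List[float]:
    """Toy stand-in scorer (NOT BLEURT): 1.0 if strings are identical, else 0.0."""
    return [1.0 if r == c else 0.0 for r, c in zip(references, candidates)]


if __name__ == "__main__":
    refs = ["你好", "世界", "谢谢"]
    cands = ["你好", "世界!", "谢谢"]
    print("Average score:", corpus_average(refs, cands, exact_match))
```

To use it with BLEURT, `score_fn` could wrap the reply's scorer, e.g. `lambda r, c: bleurt_model.score(references=r, candidates=c, batch_size=2)`, assuming `bleurt_model` was built from the `BLEURT-20` checkpoint as in the script above.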

The problem has been solved. Thank you very much for your answer!