Error when using the 'evaluate' package with 'sacrebleu' to calculate a metric
TristanShao opened this issue · 1 comment
Original code URL:
When I run a simple test case (after adding some model files) with 'python3 local_evaluation.py', and a sample needs the BLEU metric, I encounter the following error:
File "xxx/amazon-kdd-cup-2024-starter-kit/local_evaluation.py", line 256, in <module> main() File "xxx/amazon-kdd-cup-2024-starter-kit/local_evaluation.py", line 241, in main per_task_metrics = evaluate_outputs(data_df, outputs) File "xxx/amazon-kdd-cup-2024-starter-kit/local_evaluation.py", line 99, in evaluate_outputs metric_score = eval_fn(model_output, ground_truth) File "xxx/amazon-kdd-cup-2024-starter-kit/local_evaluation.py", line 183, in <lambda> "jp-bleu": lambda generated_text, reference_text: metrics.calculate_bleu_score( File "xxx/amazon-kdd-cup-2024-starter-kit/metrics.py", line 254, in calculate_bleu_score sacrebleu = evaluate.load("sacrebleu") File "/home/xxx/.local/lib/python3.10/site-packages/evaluate/loading.py", line 751, in load evaluation_cls = import_main_class(evaluation_module.module_path) File "/home/xxx/.local/lib/python3.10/site-packages/evaluate/loading.py", line 76, in import_main_class module = importlib.import_module(module_path) File "/opt/miniconda3/lib/python3.10/importlib/__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "<frozen importlib._bootstrap>", line 1050, in _gcd_import File "<frozen importlib._bootstrap>", line 1027, in _find_and_load File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 688, in _load_unlocked File "<frozen importlib._bootstrap_external>", line 879, in exec_module File "<frozen importlib._bootstrap_external>", line 1017, in get_code File "<frozen importlib._bootstrap_external>", line 947, in source_to_code File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed File "/home/xxx/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--sacrebleu/009c8b5313309ea5b135d526433d5ee76508ba1554cbe88310a30f85bb57ec88/sacrebleu.py", line 16 } SyntaxError: closing parenthesis '}' does not match opening parenthesis '(' on line 14
It looks like sacrebleu.py has some problem. Maybe my evaluate (0.4.2) and sacrebleu (2.14) versions conflict? I don't know.
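Since the failing file is the cached copy under ~/.cache/huggingface/modules, I also wonder whether that cached metric script is simply corrupted. A sketch of a workaround I am considering (not sure it is the correct fix) is to force evaluate to re-download the module instead of reusing the cache:

import evaluate

# Bypass any possibly-corrupted copy of the sacrebleu metric script
# cached under ~/.cache/huggingface/modules and fetch it again.
sacrebleu_metric = evaluate.load("sacrebleu", download_mode="force_redownload")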
Here is the relevant code from the URL above:
import evaluate

# Module-level cache so the metric is only loaded once (assumed from the
# surrounding file, which the traceback shows defines `sacrebleu` globally)
sacrebleu = None


def calculate_bleu_score(generated_text: str, reference_text: str, is_japanese: bool = False) -> float:
    """
    Calculates the BLEU score for a generated text compared to a reference truth text. This function supports
    both general text and Japanese-specific evaluation by using the sacrebleu library.

    Parameters:
    - generated_text (str): The generated text to be evaluated.
    - reference_text (str): The reference truth text.
    - is_japanese (bool, optional): Flag to indicate whether the text is in Japanese, requiring special tokenization.

    Returns:
    - float: The BLEU score as a percentage (0 to 1 scale) for the generated text against the reference truth.
    """
    global sacrebleu
    if sacrebleu is None:
        sacrebleu = evaluate.load("sacrebleu")

    # Preprocess input texts: keep only the first non-empty line of the generation
    generated_text = generated_text.lstrip("\n").rstrip("\n").split("\n")[0]
    candidate = [generated_text]
    reference = [[reference_text]]

    # Compute BLEU score with or without Japanese-specific tokenization
    bleu_args = {"predictions": candidate, "references": reference, "lowercase": True}
    if is_japanese:
        bleu_args["tokenize"] = "ja-mecab"

    score = sacrebleu.compute(**bleu_args)["score"] / 100
    return score
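For reference, the failing call path reduces to a single call like the sketch below (my example inputs; is_japanese=True matches the jp-bleu lambda shown in the traceback):

# Hypothetical inputs; the first call triggers evaluate.load("sacrebleu"),
# which is exactly where the SyntaxError above is raised.
score = calculate_bleu_score(
    "これはペンです。",
    "これは鉛筆です。",
    is_japanese=True,  # selects the ja-mecab tokenizer
)
print(score)  # BLEU on a 0-1 scale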
It seems you are using amazon-kdd-cup-2024-starter-kit, which uses HuggingFace evaluate, which in turn uses sacrebleu.
If you want to report a bug in this sacrebleu repository, you should show a replicable minimal test case using the sacrebleu API directly (i.e. not via amazon-kdd-cup-2024-starter-kit and evaluate), as sketched below. Otherwise, you should report the bug in one of the above-mentioned frameworks.
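For example, a minimal direct test could look like this (my sketch; the example sentences are made up):

from sacrebleu.metrics import BLEU

# Call sacrebleu directly, without the evaluate wrapper, to isolate the bug.
bleu = BLEU(lowercase=True)
result = bleu.corpus_score(["John loves Mary."], [["John loves Mary too."]])
print(result.score)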
That said, I was not able to replicate this bug. Everything seems to work:
!pip install sacrebleu evaluate
import sacrebleu, evaluate
print(sacrebleu.__version__) # 2.4.2
print(evaluate.__version__) # 0.4.2
evaluate_sacrebleu = evaluate.load("sacrebleu")
result = evaluate_sacrebleu.compute(predictions=["John loves Mary."], references=[["John loves HugginFace."]])
print(result["score"]) # 35.35533905932737
I am closing this issue, but feel free to reopen if you identify any bugs in sacrebleu.