google-research/xtreme

Evaluation results of PANX task

sakuraimai opened this issue · 3 comments

Hi, thank you for sharing this amazing dataset.
I have a question about the evaluation results of the PANX task.
I used the en subset to train XLM-R, and all available language subsets for prediction and evaluation.
Although the model gets good results during training and the prediction files ('test_{lang}_prediction.txt') are produced without errors, the evaluation results on standard output show 'f1 = 0.0' for every language, even English, which I used for training.
Do you have any idea how to resolve this?

Evaluation results during training:

07/11/2022 16:56:11 - INFO - __main__ -   ***** Evaluation result best in en *****
07/11/2022 16:56:11 - INFO - __main__ -     f1 = 0.8004747277296844
07/11/2022 16:56:11 - INFO - __main__ -     loss = 0.27271114725369616
07/11/2022 16:56:11 - INFO - __main__ -     precision = 0.7906495655771618
07/11/2022 16:56:11 - INFO - __main__ -     recall = 0.8105471511381309

Evaluation results in en, fr:

07/11/2022 17:00:21 - INFO - __main__ -   ***** Evaluation result  in en *****
07/11/2022 17:00:21 - INFO - __main__ -     f1 = 0.0
07/11/2022 17:00:21 - INFO - __main__ -     loss = 3.371383772871365
07/11/2022 17:00:21 - INFO - __main__ -     precision = 0.0
07/11/2022 17:00:21 - INFO - __main__ -     recall = 0.0
07/11/2022 17:02:17 - INFO - __main__ -   ***** Evaluation result  in fr *****
07/11/2022 17:02:17 - INFO - __main__ -     f1 = 0.0
07/11/2022 17:02:17 - INFO - __main__ -     loss = 3.6016474927957067
07/11/2022 17:02:17 - INFO - __main__ -     precision = 0.0
07/11/2022 17:02:17 - INFO - __main__ -     recall = 0.0

Same question here.

Same issue here. Any update on this?

As described here, the labels in the test data are automatically removed during preprocessing to prevent accidental cheating, which is why evaluating on the test data reports 0 scores for all languages.
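If you want to confirm this on your own copy, a quick check like the sketch below works. The file path and the CoNLL-style token<TAB>label layout are assumptions, so adjust them to your local data directory.

# Minimal sketch: inspect the gold-label column of a preprocessed PANX test
# file. If preprocessing stripped the labels, every row carries the same
# placeholder (or no label column at all), which makes any entity-level
# precision/recall/f1 come out as 0.0. The path below is an assumption.
from collections import Counter

labels = Counter()
with open("download/panx/test-en.tsv", encoding="utf-8") as f:  # assumed path
    for line in f:
        line = line.rstrip("\n")
        if line:
            fields = line.split("\t")
            labels[fields[1] if len(fields) > 1 else "<missing>"] += 1

print(labels)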
We recommend evaluating only on the validation data, and uploading your test predictions via the submission form once you would like to submit your model's results.
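For local scoring on the validation split, a minimal sketch along the lines below should work; it recomputes the same kind of entity-level metrics shown in the training log. The file paths and the two-column CoNLL-style format are assumptions based on a typical run, so adjust them to your own output.

# Minimal sketch: score dev predictions locally with seqeval (entity-level
# precision/recall/F1). File names and the token<TAB>label format are
# assumptions; adjust them to your own run.
from seqeval.metrics import f1_score, precision_score, recall_score

def read_label_column(path, column=1):
    """Collect the label column of a CoNLL-style file into per-sentence lists."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:  # blank line marks a sentence boundary
                if current:
                    sentences.append(current)
                    current = []
            else:
                current.append(line.split("\t")[column])
    if current:
        sentences.append(current)
    return sentences

gold = read_label_column("download/panx/dev-en.tsv")        # assumed path
pred = read_label_column("outputs/dev_en_predictions.txt")  # assumed path

print("precision =", precision_score(gold, pred))
print("recall    =", recall_score(gold, pred))
print("f1        =", f1_score(gold, pred))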