DAMO-NLP-SG/TempReason

Question Regarding L3 Training Results


Hello! I'm very interested in your work.

I've run T5-SFT training on the L2 and L3 datasets for ReasonQA and OBQA. However, my reproduced L3 results on ReasonQA differ significantly from the results reported in the paper.
My results show an EM of 28.56 and an F1 score of 42.61, while the paper reports an EM of 78.2 and an F1 score of 83.0.
In my config file, I set the text setting to `fact_context` and configured both the training and test datasets for L3.
Could you please advise me on what I need to modify to align my results with the ones presented in the paper?
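
One quick thing to rule out when comparing numbers is whether predictions are scored the same way. Below is a minimal sketch of SQuAD-style EM/F1 (lowercasing, stripping punctuation and the articles "a/an/the", then token-overlap F1, taking the best score over multiple gold answers). It assumes the paper's EM/F1 follow this common convention, which has not been checked against the TempReason evaluation code; `normalize_answer`, `exact_match`, `token_f1`, and `corpus_scores` are hypothetical helper names, not functions from the repo.

```python
import re
import string
from collections import Counter
from typing import List, Sequence, Tuple

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty scores zero.
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_scores(
    predictions: Sequence[str], references: Sequence[List[str]]
) -> Tuple[float, float]:
    """Mean EM and F1 as percentages, using the best-matching gold answer per question."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    n = len(predictions)
    if n == 0:
        return 0.0, 0.0
    pairs = list(zip(predictions, references))
    em = sum(max(exact_match(p, r) for r in refs) for p, refs in pairs)
    f1 = sum(max(token_f1(p, r) for r in refs) for p, refs in pairs)
    return 100 * em / n, 100 * f1 / n


if __name__ == "__main__":
    preds = ["Real Madrid", "Chelsea"]
    golds = [["Real Madrid C.F.", "Real Madrid"], ["Arsenal"]]
    print(corpus_scores(preds, golds))  # prints (50.0, 50.0)
```

If the repo's own script normalizes answers differently (for example, without article stripping or with only one gold answer per question), the same predictions can score noticeably differently, so it may be worth running both on the saved predictions.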

@vanity1216 I ran into the same problem when trying to reproduce the results. I got results similar to yours (slightly better, but only negligibly). Any thoughts?