chenxran/Orion

Unable to reproduce Main Result in the paper

Closed this issue · 1 comments

Hello, I configured the environment according to your README.md. Following its instructions for evaluating Orion's performance on OpenRule155, I ran python evaluation.py --task openrule155 --inductor rule --mlm_training True --bart_training True --group_beam True.
I expected the results for BLEU-1, BLEU-2, BLEU-4, ROUGE-L, METEOR, and self-BLEU-2 to match the last row of Table 1 in your paper. However, all of them differ slightly from the paper's values, particularly METEOR and self-BLEU-2.

  1. Results of the code:
  • METEOR: 0.2903811252268602
  • self-BLEU-2: 0.87502350778747
  2. Results in the paper:
  • ROUGE-L: 0.4041
  • self-BLEU-2: 0.9094
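
To put a number on the gap, here is a minimal sketch that compares the two self-BLEU-2 values listed above (the only metric shown on both sides). It is not part of the Orion repository; the function name compare and the variable names are hypothetical, and the values are copied from this issue.

```python
# Values copied from the issue text; self-BLEU-2 is the only metric
# listed for both the code output and the paper.
reproduced_self_bleu2 = 0.87502350778747  # output of evaluation.py
reported_self_bleu2 = 0.9094              # last row of Table 1 in the paper


def compare(reproduced, reported):
    """Return (absolute gap, gap relative to the reported value)."""
    abs_gap = reported - reproduced
    rel_gap = abs_gap / reported
    return abs_gap, rel_gap


abs_gap, rel_gap = compare(reproduced_self_bleu2, reported_self_bleu2)
print(f"self-BLEU-2 absolute gap: {abs_gap:.4f}")
print(f"self-BLEU-2 relative gap: {rel_gap:.2%}")
```

The gap comes out at about 0.034 absolute, or roughly 3.8% of the reported value.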

There are many possible causes for this discrepancy. If possible, could we communicate via WeChat? My ID is KERO652.