cylnlp/dialogsum

Best result with BART-Large model

BinWang28 opened this issue · 4 comments

Many thanks for the recent release with the BART-Large model training.

I tried to reproduce the baseline model reported in the paper. So far I can get roughly the same result as your released baseline, with a ROUGE-1 score around 45.5 ~ 46.0.

According to the paper, BART-Large can achieve a ROUGE-1 score of 47.28, which I could not reach with BART-Large.
Any hints on that? (I use py-rouge for scoring, the same as reported.)
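
For reference, here is a minimal sketch of the py-rouge evaluation setup I am using. The exact scorer configuration behind the reported numbers is not stated in this thread, so the settings below are assumptions that mirror a common summarization setup:

```python
import rouge  # pip install py-rouge

# Assumed settings; the thread does not state the exact py-rouge configuration
# used for the reported scores.
evaluator = rouge.Rouge(
    metrics=["rouge-n", "rouge-l"],
    max_n=2,             # report ROUGE-1 and ROUGE-2
    limit_length=False,
    apply_avg=True,      # average scores over the whole test set
    stemming=True,
)

# Hypothetical example pair; in practice these are the model outputs and the
# gold DialogSum summaries.
hypotheses = ["person1 asks person2 about the project deadline ."]
references = ["#Person1# asks #Person2# when the project is due ."]

scores = evaluator.get_scores(hypotheses, references)
print(scores["rouge-1"]["f"])  # ROUGE-1 F1
```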

Hi @BinWang28, as I have learned from @chenllliang, the baseline model is just for a quick start, without any hyper-parameter search.
To reproduce results similar to those reported in our paper, you may try the hyper-parameters reported in the paper (fairseq version), or the following hyper-parameters (huggingface version).

[Screenshot: hyper-parameters for the huggingface version]
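
The screenshot values are not reproduced in this text version of the thread. As a rough sketch only, a Hugging Face configuration for fine-tuning BART-Large on DialogSum would typically be expressed through `Seq2SeqTrainingArguments`; every value below is an illustrative placeholder, not the actual setting from the screenshot:

```python
from transformers import Seq2SeqTrainingArguments

# All numbers are placeholders for illustration; substitute the values shared
# in the screenshot above.
training_args = Seq2SeqTrainingArguments(
    output_dir="bart-large-dialogsum",
    learning_rate=3e-5,                 # placeholder
    per_device_train_batch_size=8,      # placeholder
    gradient_accumulation_steps=8,      # placeholder; raises the effective batch size
    num_train_epochs=15,                # placeholder
    warmup_steps=500,                   # placeholder
    weight_decay=0.01,                  # placeholder
    label_smoothing_factor=0.1,         # placeholder
    predict_with_generate=True,
)
```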

Thanks a lot for sharing the detailed hyper-parameters. It looks like I am using a much smaller batch size.
I will try again with the suggested settings.
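
For context, a larger effective batch size can usually be matched without extra GPU memory via gradient accumulation. The numbers below are only an example, not the settings from the screenshot:

```python
# Effective batch size = per-device batch size x accumulation steps x number of GPUs.
# Example numbers only (assumptions, not the values used in this thread):
per_device_train_batch_size = 4
gradient_accumulation_steps = 16
num_gpus = 1

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 64 sequences per optimizer step
```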

You are most welcome!
Please let me know if you have any further questions.

Thanks, I am able to get close results now.