Question on the comparison between GPT and GPT2
mrzjy opened this issue · 4 comments
Hi, thanks for sharing the models! There's a detail that I'm curious about: is there a reason why CDialGPT2LCCC performs worse than CDialGPTLCCC? I understand that, compared with GPT, GPT2 uses pre-LayerNorm and also adds an additional LayerNorm after the final attention block, but I would not expect such a difference to result in much worse performance for CDialGPT2LCCC.
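For clarity, here is a minimal sketch of the two block layouts I mean, in PyTorch-style code (module names and sizes are illustrative, not the actual CDial-GPT implementation; the causal attention mask is omitted for brevity):

```python
import torch.nn as nn

class PostLNBlock(nn.Module):
    # GPT-style (post-LN): LayerNorm is applied *after* each residual sum.
    def __init__(self, d_model=768, n_head=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln1 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.ln1(x + self.attn(x, x, x, need_weights=False)[0])
        return self.ln2(x + self.mlp(x))

class PreLNBlock(nn.Module):
    # GPT2-style (pre-LN): LayerNorm is applied *before* each sublayer.
    # GPT2 also applies one extra LayerNorm (ln_f) after the last block,
    # a parameter that has no counterpart in GPT.
    def __init__(self, d_model=768, n_head=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.ln2(x))
```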
Besides, in the paper you mentioned that both CDialGPTLCCC and CDialGPT2LCCC are first pretrained on your Chinese novel dataset. Does this imply that there is also a GPT2_novel model that you did not release (based on which CDialGPT2LCCC is post-trained)?
Since we do not have a GPT2_novel model trained on the Chinese novel corpus, CDialGPT2 is initialized from GPT_novel.
The parameters in GPT2 that do not exist in GPT are initialized from scratch.
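In PyTorch terms, this amounts to non-strict state-dict loading: every parameter whose name and shape match is copied from the checkpoint, and the rest keep their random initialization. A minimal sketch with toy stand-in modules (illustrative only, not the actual CDial-GPT code):

```python
import torch.nn as nn

# Toy stand-ins for the two architectures, for illustration only.
gpt = nn.ModuleDict({
    "wte": nn.Embedding(100, 8),  # parameters shared by both models
    "block": nn.Linear(8, 8),
})
gpt2 = nn.ModuleDict({
    "wte": nn.Embedding(100, 8),
    "block": nn.Linear(8, 8),
    "ln_f": nn.LayerNorm(8),      # GPT2-only: the extra final LayerNorm
})

# strict=False copies every matching parameter; GPT2-only parameters
# keep their fresh random initialization ("from scratch").
missing, unexpected = gpt2.load_state_dict(gpt.state_dict(), strict=False)
print("initialized from scratch:", missing)    # ['ln_f.weight', 'ln_f.bias']
print("ignored from checkpoint:", unexpected)  # []
```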
Thanks for the reply, but how does it make sense to initialize GPT2 with a GPT checkpoint? Could this be part of the reason why your CDialGPT2LCCC performs worse than CDialGPTLCCC?
Q1: "but how does it make sense to initialize GPT2 with GPT checkpoint ?"
A: It is just a try. We just try to figure out would it degrade the performance.
Q2: " Could this be part of the reason why your CDialGPT2LCCC perform worse than CDialGPTLCCC"
A: Yes.
Okay~
Again, thanks for your work~