Generative model comparison: Full-Transformer vs. GPT2, both fine-tuned on ED
YuanEric88 opened this issue · 2 comments
Hi,
I have a question about the generative model. In the paper, you used the full-transformer model for pre-training and fine-tuning. I am wondering about another generative model, GPT2. Since you haven't released the fine-tuned full-transformer generative model and I don't have enough resources to replicate your results for comparison, I would like to ask:
From your perspective, if I use ED to fine-tune the GPT2 model, what performance would you expect (on both automated metrics and human ratings)? Would there be a sacrifice compared with the full-transformer model, given that GPT2 has no encoder, only a stack of decoder layers? Thanks
Hi! So I don't have tons of experience with GPT2, but I imagine that it'd get pretty good performance on ED, since it's so much larger than the generative models in our paper. There are other recent models that also have a really high decoder/encoder ratio, so I imagine that the lack of an encoder is not a blocker to using GPT2.
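To make the decoder-only setup concrete, here is a minimal sketch of how a dialogue context and its response could be packed into a single sequence for a model like GPT2, with the loss restricted to the response tokens. Everything in it is an assumption for illustration: the whitespace tokenizer stands in for GPT2's real BPE tokenizer, and `build_example`, `<sep>`, and `<eos>` are hypothetical names, not part of the ED code or the paper.

```python
from typing import Dict, List, Tuple

# -100 is the label value that PyTorch's CrossEntropyLoss ignores by default;
# using it here is an assumption about the training loop, not part of ED.
IGNORE_INDEX = -100
SEP = "<sep>"  # hypothetical turn separator
EOS = "<eos>"  # hypothetical end-of-response marker


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Toy whitespace tokenizer: assigns a new id to each unseen token."""
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]


def build_example(
    context_turns: List[str], response: str, vocab: Dict[str, int]
) -> Tuple[List[int], List[int]]:
    """Concatenate context and response into one sequence.

    With no encoder, the context is simply a prefix of the decoder input.
    Labels for the context positions are masked so that only the response
    tokens contribute to the loss.
    """
    context_ids: List[int] = []
    for turn in context_turns:
        context_ids += encode(turn, vocab) + encode(SEP, vocab)
    response_ids = encode(response, vocab) + encode(EOS, vocab)

    input_ids = context_ids + response_ids
    labels = [IGNORE_INDEX] * len(context_ids) + response_ids
    return input_ids, labels


vocab: Dict[str, int] = {}
input_ids, labels = build_example(
    ["I lost my job today", "Oh no , what happened ?"],
    "That sounds really stressful",
    vocab,
)
```

In this sketch the encoder's role is played by the context prefix: the decoder attends to it through ordinary causal self-attention instead of cross-attention.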
I have a question. As I understand it, ED uses BERT to embed the input and uses the embedding output as the encoder input, and the BERT encoder tries to minimize the negative log-likelihood of y* and y^. In this case, y^ is the ground-truth response for each input y and x, and y* is the response predicted by the BERT encoder model. Is that right?
The other phase is the generative one, which I think of as a BERT decoder: because BERT doesn't have a tokenizer decoder, we train a transformer as a decoder to get a sentence from the BERT encoder output. Is that correct?
I hope you can answer these questions.
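For reference, the negative log-likelihood mentioned above is usually computed per token for a generative model. The following is a minimal sketch, assuming the model assigns a probability to each ground-truth response token given the context and the previous tokens. The function name and the probability values are hypothetical and are not taken from the ED code.

```python
import math
from typing import List


def sequence_nll(token_probs: List[float]) -> float:
    """Sum of -log p(y_t | x, y_<t) over the ground-truth response tokens."""
    return -sum(math.log(p) for p in token_probs)


# Made-up probabilities that a model might assign to three response tokens.
probs = [0.5, 0.25, 0.5]
nll = sequence_nll(probs)

# Perplexity is the exponential of the average per-token NLL.
perplexity = math.exp(nll / len(probs))
```

Minimizing this quantity over the training set pushes the model to assign higher probability to the ground-truth responses.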