google-research/electra

Some confusion about the paper

leileilin opened this issue · 1 comment

Hey,
I've read your team's paper, and there are a few parts I don't quite understand.
In the paper, you mention that you don't back-propagate the discriminator loss through the generator (indeed, you can't, because of the sampling step), but you also say that the generator and discriminator are pre-trained at the same time. I don't understand how both can be true.
Thank you.

Same question here. In the training code, the generator and discriminator are trained together on the total loss.
Does anyone have any ideas?
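
One possible way to reconcile the two statements, sketched as a toy rather than taken from the repository: both losses can be summed and minimized in a single step, yet the discriminator term contributes no gradient to the generator, because the sampled token is a discrete choice. The example below uses plain Python and finite differences instead of a real autodiff framework. Every name in it (`mlm_loss`, `disc_loss`, `sample_token`, the one-position "model", the fixed noise vector) is a hypothetical stand-in, not the ELECTRA code. It also ignores the token embeddings that the paper shares between the two networks, which the discriminator loss would still update.

```python
import math

VOCAB = 5          # toy vocabulary size (assumption)
TRUE_TOKEN = 2     # original token at the single masked position (assumption)
LAMBDA = 50.0      # weight on the discriminator loss; the paper reports 50

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mlm_loss(gen_logits):
    # Generator's masked-LM loss: negative log-likelihood of the original token.
    return -math.log(softmax(gen_logits)[TRUE_TOKEN])

def sample_token(gen_logits, noise):
    # Sampling as argmax(logits + noise) with the noise held fixed.
    # The result is an integer token id, so it is piecewise constant in the logits.
    scores = [l + n for l, n in zip(gen_logits, noise)]
    return max(range(VOCAB), key=lambda i: scores[i])

def disc_loss(disc_weights, token):
    # Toy discriminator: one logit per token id, predicting "was replaced".
    is_replaced = 1.0 if token != TRUE_TOKEN else 0.0
    p = 1.0 / (1.0 + math.exp(-disc_weights[token]))
    return -(is_replaced * math.log(p) + (1.0 - is_replaced) * math.log(1.0 - p))

def total_loss(gen_logits, disc_weights, noise):
    # Combined objective: MLM loss plus weighted discriminator loss.
    token = sample_token(gen_logits, noise)
    return mlm_loss(gen_logits) + LAMBDA * disc_loss(disc_weights, token)

def finite_diff(f, params, eps=1e-6):
    # Central finite differences, standing in for back-propagation.
    grads = []
    for i in range(len(params)):
        up = list(params); up[i] += eps
        down = list(params); down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads

# Hand-picked toy values so the run is reproducible (assumptions).
gen_logits = [0.2, -0.5, 1.0, 0.3, -0.1]
disc_weights = [0.0, 0.5, -0.5, 0.25, 1.0]
noise = [0.1, 0.4, -0.3, 1.5, 0.0]

token = sample_token(gen_logits, noise)

# Gradient of the discriminator loss with respect to the generator: all zeros.
grad_gen_from_disc = finite_diff(
    lambda g: disc_loss(disc_weights, sample_token(g, noise)), gen_logits)
# Gradient of the MLM loss with respect to the generator: non-zero.
grad_gen_from_mlm = finite_diff(mlm_loss, gen_logits)
# Gradient of the discriminator loss with respect to the discriminator: non-zero.
grad_disc = finite_diff(lambda w: disc_loss(w, token), disc_weights)
# Gradient of the total loss with respect to the generator matches the MLM gradient.
grad_gen_total = finite_diff(
    lambda g: total_loss(g, disc_weights, noise), gen_logits)

print("sampled token:", token)
print("d disc_loss / d generator:", grad_gen_from_disc)
print("d mlm_loss  / d generator:", grad_gen_from_mlm)
print("d total     / d generator:", grad_gen_total)
print("d disc_loss / d discriminator:", grad_disc)
```

In this sketch, one update on the total loss trains both networks at once: the generator moves only along its MLM gradient, and the discriminator moves only along its own loss, which is one reading of "trained jointly, but without back-propagating the discriminator loss through the generator".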