LiyuanLucasLiu/Transformer-Clinic

Admin for 100L-100L model?

Vincent131499 opened this issue · 1 comments

The article mentions that the model was trained on 8 A100 GPUs. How long was it trained, and how many epochs did it reach? What is the final model's performance (BLEU)?

Thanks for asking :-)

I trained the model for 40 epochs and got a BLEU score of 29.5 (on WMT'14 En-De). I didn't finish the training due to the high cost, so I don't know whether the performance would improve with longer training (I suspect probably not, unless you train it for a really, really long time).

More details will be released shortly (featuring a new plug-and-play Admin implementation), stay tuned!