ChenRocks/UNITER

How to judge the convergence of the pre-training model?

How should the loss weights for the different pre-training tasks be determined? And which task's loss should be used to decide whether pre-training has converged?