Question about the lambda
Bikesuffer opened this issue · 2 comments
Hi there,
It's me again. I'm curious whether you tried different combinations of lambdas for feat_loss and out_loss, or maybe adding a lambda for the task_loss?
From my training runs, it seems that feat_loss contributes the largest part of the total loss.
Hi, thanks for your inquiry.
For our text-to-image experiments, we simply set the loss weights λ_Task, λ_OutKD, and λ_FeatKD to 1, which was effective in empirical validation without hyperparameter tuning and was used in the experiments of our paper.
In recent trials with BK-SDM-Small and batch size 64, varying λ_FeatKD over {0.25, 0.5, 1, 2, 4} did not affect the final generation scores. However, more extreme scales such as 0.01, 0.1, 10, or 100 haven't been explored.
It would be interesting to study the effect of different loss weightings.
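To make the weighting concrete, here is a minimal sketch of how the three weighted terms combine into the training objective. The function name `total_loss` and the argument names are hypothetical (not taken from the BK-SDM codebase); they just mirror the λ_Task, λ_OutKD, and λ_FeatKD weights discussed above, with the default of 1 for each.

```python
def total_loss(task_loss, out_kd_loss, feat_kd_loss,
               lambda_task=1.0, lambda_out_kd=1.0, lambda_feat_kd=1.0):
    """Weighted sum of the task loss and the two distillation losses.

    With all lambdas set to 1 (the setting used in the paper), this is
    just the plain sum of the three component losses.
    """
    return (lambda_task * task_loss
            + lambda_out_kd * out_kd_loss
            + lambda_feat_kd * feat_kd_loss)

# Example: feat_loss dominating the total, as observed in training.
print(total_loss(task_loss=0.1, out_kd_loss=0.2, feat_kd_loss=0.7))

# Rescaling only lambda_feat_kd changes its relative contribution:
print(total_loss(0.1, 0.2, 0.7, lambda_feat_kd=0.25))
```

In practice the component losses are tensors rather than floats, but the combination is the same elementwise weighted sum before backpropagation.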
Added: some experimental results were as follows:
Thanks for the information.