Issue training for male dataset
zshakeri opened this issue · 1 comment
zshakeri commented on Jan 7, 2021
I have been trying to train the model on a male-speaker dataset. I've tried both training from scratch and fine-tuning the provided checkpoint. I tried the default parameters (batch size 3, 8 GPUs), increasing the batch size to 32 on 8 GPUs, and varying the learning rate. In all cases, the loss saturates at about -5 somewhere between 5k and 20k steps and then either increases or blows up. Do you have any suggestions for what to do in this case? Have you trained the model on any dataset other than LJ?
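Examples of training loss curves:
[image: Screen Shot 2020-12-16 at 10 38 57 AM]
<https://user-images.githubusercontent.com/58200907/103936684-95bab300-50dc-11eb-98ce-8eece0745a58.png>
[image: Screen Shot 2021-01-07 at 11 47 22 AM]
<https://user-images.githubusercontent.com/58200907/103937794-204fe200-50de-11eb-81fe-5dba19ed0972.png>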
rafaelvalle commented
Try weight decay and clipping the norm of the gradients.
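For readers unfamiliar with the two techniques suggested above, here is a minimal pure-Python sketch of what they compute. It is an illustration only, not this repository's training code: the helper names `clip_grad_norm` and `sgd_step`, the plain SGD update, and all numeric values are assumptions made for the example. In PyTorch the equivalent knobs are typically the optimizer's `weight_decay` argument and `torch.nn.utils.clip_grad_norm_`, called after `loss.backward()` and before `optimizer.step()`.

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale gradients so their global L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

def sgd_step(params, grads, lr, weight_decay=0.0, max_norm=None):
    """One SGD update with optional gradient clipping and L2 weight decay."""
    if max_norm is not None:
        grads, _ = clip_grad_norm(grads, max_norm)
    # L2 weight decay adds weight_decay * p to each gradient,
    # which pulls every parameter toward zero.
    return [p - lr * (g + weight_decay * p) for p, g in zip(params, grads)]

# A gradient spike of the kind that can make a loss curve blow up.
params = [1.0, -2.0]
spiky_grads = [300.0, 400.0]  # global L2 norm is 500

clipped, norm_before = clip_grad_norm(spiky_grads, max_norm=1.0)
new_params = sgd_step(params, spiky_grads, lr=0.1,
                      weight_decay=0.01, max_norm=1.0)
```

Clipping keeps the direction of the update but caps its size, so a single bad batch cannot push the weights far. Logging the norm before clipping (`norm_before` here) is a cheap way to check whether the blow-ups coincide with gradient spikes.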