The training loss is not decreasing
100daggerz opened this issue · 1 comment
Hi,
Thank you for your code.
I am training an LDM model with the config file I have attached.
I have tried training with multiple datasets and settings. The training loss always stops decreasing meaningfully after a certain number of epochs, usually when the loss is somewhere around 0.1. The loss does keep going down, but very slowly.
Since I am using MSE loss, 0.1 is large for image generation.
I once continued training until 400 epochs; by that point the model had overfitted, but the loss reached a minimum of around 0.02.
Could you share your insights, or has anyone else faced this issue?
tuned_class_cond_bdd_1.zip
Hello @100daggerz, I see that in your diffusion parameters you have modified the timesteps, beta start, and beta end.
I would suggest using the parameters mentioned in the repo and not changing them (for class conditioning you can use the mnist_class_cond.config). One reason for the issue you are facing could be that with fewer timesteps, each timestep is now adding a larger amount of noise than in the 1000-timestep case.
I am assuming you did this so the model could train faster, but could you try with 1000 timesteps? While the model will take longer to train, I am guessing you would get better results than with the current 200 timesteps.
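To make the per-step noise point concrete, here is a minimal sketch assuming a standard DDPM-style linear beta schedule (the 200-step beta values below are purely hypothetical, since I don't know the exact numbers in your attached config). If a 200-step schedule is set up so the forward process still reaches near-pure noise at the final step, each individual step has to add roughly five times as much noise as in the 1000-step default:

```python
import torch

def linear_schedule(num_timesteps, beta_start, beta_end):
    # Standard DDPM-style linear beta schedule. alpha_bar[t] is the cumulative
    # product (1 - beta_1) * ... * (1 - beta_t), i.e. how much of the original
    # image signal is left after t noising steps.
    betas = torch.linspace(beta_start, beta_end, num_timesteps)
    alpha_bars = torch.cumprod(1.0 - betas, dim=0)
    return betas, alpha_bars

# Repo-style defaults: 1000 steps, betas from 1e-4 to 0.02.
betas_1000, abar_1000 = linear_schedule(1000, 1e-4, 0.02)

# Hypothetical compressed schedule: 200 steps with betas scaled up so the
# forward process still ends close to pure noise (illustrative values only,
# not taken from the attached config).
betas_200, abar_200 = linear_schedule(200, 5e-4, 0.1)

for name, betas, abar in [("1000 steps", betas_1000, abar_1000),
                          ("200 steps ", betas_200, abar_200)]:
    print(f"{name}: mean noise added per step (beta) = {betas.mean().item():.4f}, "
          f"signal left at final step (alpha_bar) = {abar[-1].item():.2e}")
```

Running this shows both schedules end with almost no signal left (alpha_bar close to zero), but the compressed schedule removes it in much larger chunks per step, which is one way to see why the denoising model's MSE could plateau at a higher value.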