Loss score becomes negative infinity and NaN
I'm using this model on a vehicle dataset to create image embeddings. During training, at some point the loss suddenly becomes negative infinity and eventually becomes NaN. Have you encountered this issue? What do you think could be causing it?
Negative losses are not intrinsically bad, but getting them implies the loss network is computing more than 100% mutual information. That is unlikely, and could be caused by sampling error. Check that the batches are randomized, and make sure you don't have duplicate images, as duplicates can produce very small losses that end up negative.
As for the NaN: it's generally caused by a division by zero, which produces a NaN somewhere in a calculation, which then makes the gradient NaN, which makes a weight NaN, and that chain reaction eventually blows up the network.
Things that can help:
- Normalize all your input values to be between 0 and 1.
- If that doesn't work, write a statement that checks the parameters for NaN (see the sketch after this list).
- Look around in the code: anywhere there is a log there should be a +eps, since log(0) is -inf and will turn the gradients into NaNs.
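For the NaN check, a minimal sketch (assuming a standard torch.nn.Module; the helper name is just for illustration):

import torch

def check_for_nan(model, step):
    # Scan every parameter (and its gradient, if any) for NaN/Inf and stop
    # at the first bad value instead of letting it propagate through training.
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            raise RuntimeError(f"non-finite values in parameter '{name}' at step {step}")
        if param.grad is not None and not torch.isfinite(param.grad).all():
            raise RuntimeError(f"non-finite values in gradient of '{name}' at step {step}")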
Yeah, in line 44 of train.py
term_a = torch.log(self.prior_d(prior)).mean()
might be better to be
eps = torch.finfo(torch.float32).eps
term_a = torch.log(self.prior_d(prior)+eps).mean()
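Presumably the adjacent term_b needs the same guard, since 1.0 - self.prior_d(y) can also reach zero:

term_b = torch.log(1.0 - self.prior_d(y) + eps).mean()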
The NaN issue was resolved, but I'm still getting huge negative loss scores during training. I'm running this model on a small subset of 3,000 vehicle images; I want to test its performance before I throw it at the big vehicle set. Is it possible that my dataset is too small, so it doesn't need many epochs to train?
Hi, yeah..
Looking at the math...
Ej = -F.softplus(-self.global_d(y, M)).mean()
Em = F.softplus(self.global_d(y, M_prime)).mean()
GLOBAL = (Em - Ej) * self.alpha
To get a negative loss, we would need Em < Ej.
But Em is always non-negative, since softplus is always non-negative, and Ej is always non-positive for the same reason (it's minus a softplus).
This means the loss could be rewritten as:
Ej = F.softplus(-self.global_d(y, M)).mean()
Em = F.softplus(self.global_d(y, M_prime)).mean()
GLOBAL = (Em + Ej) * self.alpha
So both GLOBAL and LOCAL should always be positive...
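A quick numerical sanity check of that claim (a standalone sketch; random scores stand in for the global_d outputs):

import torch
import torch.nn.functional as F

scores_joint = torch.randn(10000)        # stand-in for self.global_d(y, M)
scores_marginal = torch.randn(10000)     # stand-in for self.global_d(y, M_prime)
Ej = -F.softplus(-scores_joint).mean()   # <= 0, since softplus >= 0
Em = F.softplus(scores_marginal).mean()  # >= 0
print((Em - Ej).item())                  # never negative, for any scores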
term_a = torch.log(self.prior_d(prior)).mean()
term_b = torch.log(1.0 - self.prior_d(y)).mean()
PRIOR = - (term_a + term_b) * self.gamma
prior_d's output nonlinearity is a sigmoid, so 0 < prior_d < 1.0,
therefore log of prior_d must be negative, and so must log(1.0 - prior_d), since 1.0 - prior_d is also between 0 and 1.
So term_a and term_b should both be negative,
and -(term_a + term_b) must be positive, unless somehow we are getting prior_d outside (0, 1).
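And the same kind of check for the prior term (random values in (0, 1) stand in for the prior_d outputs):

import torch

p_prior = torch.rand(10000).clamp(1e-6, 1 - 1e-6)  # stand-in for self.prior_d(prior)
p_y = torch.rand(10000).clamp(1e-6, 1 - 1e-6)      # stand-in for self.prior_d(y)
term_a = torch.log(p_prior).mean()    # negative
term_b = torch.log(1.0 - p_y).mean()  # negative
print((-(term_a + term_b)).item())    # positive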
hmmm...
Can you check the values you are getting against this reasoning?
Print out the values of PRIOR/GLOBAL/LOCAL during your run. Figure out which one is going strongly negative, and we should be able to work it out.
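For example, something along these lines inside the loss computation, right after LOCAL, GLOBAL and PRIOR are formed (a sketch; the step counter and print interval are illustrative, not from the repo):

if step % 100 == 0:
    print(f"step {step}: LOCAL={LOCAL.item():.4f} GLOBAL={GLOBAL.item():.4f} PRIOR={PRIOR.item():.4f}")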