Confused about the NLL loss you proposed
taosean opened this issue · 6 comments
Hi, I'm confused about the loss function you proposed in equation (5) of your paper.
In my understanding, the goal of the NLL loss is to make the predicted mean value and the GT value as close as possible, e.g. mu and x^G, and to make the variance, e.g. sigma^2, close to 0 (which means high certainty).
However, in your equation, if the predicted mean value equals the GT value and the variance is close to 0, the NLL loss can become negative.
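As a concrete sketch (a simplification that drops the epsilon term in equation (5) and uses a plain 1-D Gaussian), the per-coordinate NLL is

$$-\log N(x^{G} \mid \mu, \sigma^{2}) = \frac{1}{2}\log(2\pi\sigma^{2}) + \frac{(x^{G}-\mu)^{2}}{2\sigma^{2}},$$

so when mu equals x^G the second term vanishes, and the remaining term is already negative once sigma < 1/sqrt(2*pi) and tends to negative infinity as sigma goes to 0.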
What do you think about this?
Thanks!
@jwchoi384
Hi, thank you for your answer.
I understand what equation (5) stands for, and I know what Maximum Likelihood Estimation is.
I understand that you are trying to estimate the parameters (mu and sigma) of a Gaussian distribution by MLE.
What I'm curious about is what your NLL loss becomes during training. Does it approach negative infinity as sigma approaches 0?
Besides, did you add the localization loss and the classification loss together to form an overall loss function and optimize that, or did you optimize them separately?
Looking forward to your response.
Thanks!
@taosean
Hi, I optimized them (localization loss, classification loss, and objectness loss) separately, like the conventional YOLOv3 algorithm. But they can affect each other during the training process.
For training, we need to calculate the gradients of the NLL loss function with respect to the mean and variance.
In "yolo_layer.c" and "gaussian_yolo_layer.c", you can see the delta[index + xxxx]
.
delta[index + xxxx]
means the gradient.
In TF or PyTorch, backpropagation is implemented easily, but in the C implementation we have to calculate the gradients by hand and code them.
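As a rough sketch of what such a hand-coded gradient looks like (illustrative only, not the actual code in "gaussian_yolo_layer.c"; the function and variable names are made up, and darknet's delta may use the opposite sign convention and extra scale factors), for the per-coordinate NLL L = log(sigma) + (x^G - mu)^2 / (2*sigma^2):

```c
/* Illustrative sketch only, not the actual gaussian_yolo_layer.c code:
 * analytic gradients of the per-coordinate Gaussian NLL
 *   L = log(sigma) + (x_gt - mu)^2 / (2 * sigma^2)
 * with respect to the predicted mean and sigma. In the layer, values like
 * these would be written into delta[index + xxxx]. */
static void gaussian_nll_grad(float mu, float sigma, float x_gt,
                              float *d_mu, float *d_sigma)
{
    float diff = x_gt - mu;
    float var  = sigma * sigma;

    *d_mu    = -diff / var;                                  /* dL/dmu    */
    *d_sigma = 1.0f / sigma - (diff * diff) / (var * sigma); /* dL/dsigma */
}
```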
Anyway, I didn't check the total NLL loss value during training, because I only need the gradient of the NLL loss for training.
I checked the gradient, mean, and variance values, and it works well.
During training, please check the mAP rather than the loss value. The loss value is just a value related to delta.
Hi, @jwchoi384. I have a few questions about delta_gaussian_yolo_box in "gaussian_yolo_layer.c".
In my opinion, variance refers to sigma^2, but in your code you calculate the gradient for sigma instead of sigma^2. Am I right?
What do the four variables temp_x, temp_y, temp_w, temp_h mean? With the (1./2.) factor in them, the gradients do not seem consistent with the results in http://jrmeyer.github.io/machinelearning/2017/08/18/mle.html.
@xuannianz
Hi, yes, I calculate the gradient for sigma.
temp_x, temp_y, temp_w, and temp_h are variables for calculating the gradients.
The proposed loss function is slightly different from the expression in the link.
I used sigma_constant and epsi for numerical stability and training.
If you calculate the gradient of the function below (the proposed loss function without the summation), you can understand why I use the temp_x, temp_y, temp_w, and temp_h variables.
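Roughly (this is a reconstruction from equation (5) of the paper, ignoring exactly where sigma_constant enters), the per-coordinate term is

$$-\log\big(N(x^{G} \mid \mu, \Sigma) + \epsilon\big), \qquad N(x^{G} \mid \mu, \Sigma) = \frac{1}{\sqrt{2\pi\Sigma}} \exp\!\left(-\frac{(x^{G}-\mu)^{2}}{2\Sigma}\right),$$

where epsi plays the role of epsilon. Because of the epsilon inside the log and the parameterization in terms of sigma, the gradients differ from the plain MLE expressions in the link, which would explain the extra factors in temp_x, temp_y, temp_w, temp_h.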
The scale/2 in "gaussian_yolo_layer.c" is a hyperparameter.
Got it, many thanks.
I have also implemented the loss function in TF in my Keras implementation (https://github.com/xuannianz/keras-GaussianYOLOv3/blob/b3c8fc7b5de67019b8c89302b141645c7cee1b8a/loss.py#L8-L13). The only difference is that I use 2 * (sigma + sigma_const)^2 as the denominator inside the exp() function. I will try your expression later.