Gradient of the DenseCRF loss
Closed this issue · 4 comments
Hi,
I'm very interested in your work and want to follow your ECCV 2018 paper. I notice that in the paper the DenseCRF loss is

$$\mathcal{L}_{CRF}(S) = \sum_k S^{k\top} W \left(\mathbf{1} - S^k\right),$$

while in the code it is

$$\mathcal{L}_{CRF}(S) = -\sum_k S^{k\top} W S^k.$$

In both the code and the paper, its gradient is computed as

$$\frac{\partial \mathcal{L}_{CRF}}{\partial S^k} = -2\, W S^k.$$

However, I think the gradient should be

$$\frac{\partial \mathcal{L}_{CRF}}{\partial S^k} = W \mathbf{1} - 2\, W S^k.$$

Why is there a difference between the implementation and the theory? Should the first term of the gradient be ignored?
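For concreteness, here is a minimal NumPy sketch of the two loss formulations and the two gradients above. The random `W` and `S` are my own stand-ins for the Gaussian kernel and the softmax output; this is not the repo's implementation:

```python
import numpy as np

np.random.seed(0)
N, K = 6, 3                      # N pixels, K classes

# Symmetric positive affinity matrix, standing in for the Gaussian kernel W
X = np.random.rand(N, N)
W = (X + X.T) / 2

# Soft segmentation S: softmax over classes, so each row sums to 1
logits = np.random.randn(N, K)
S = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

ones = np.ones(N)

# Paper loss: sum_k S_k^T W (1 - S_k)
loss_paper = sum(S[:, k] @ W @ (ones - S[:, k]) for k in range(K))

# Code loss: -sum_k S_k^T W S_k
loss_code = -sum(S[:, k] @ W @ S[:, k] for k in range(K))

# The two losses differ by 1^T W 1, which does not depend on S
print(loss_paper - loss_code, ones @ W @ ones)   # equal

# Gradient used in the code: dL/dS_k = -2 W S_k (drops the W 1 term)
grad_code = -2 * (W @ S)
# Full gradient of the paper loss (W symmetric): W 1 - 2 W S_k
grad_paper = (W @ ones)[:, None] - 2 * (W @ S)
```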
Same doubt. Did you figure it out?
When summing over $k$, the first term becomes a constant: since $\sum_k S^k = \mathbf{1}$, we have $\sum_k S^{k\top} W \mathbf{1} = \mathbf{1}^\top W \mathbf{1}$, no matter whether $S$ is discrete or continuous. So I chose to ignore the first term.
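This constancy argument can be checked numerically: the dropped $W\mathbf{1}$ term is identical for every class, and softmax rows always sum to one, so the two losses yield the same gradient with respect to the logits. A small PyTorch sketch (my own toy example, not code from this repo):

```python
import torch

torch.manual_seed(0)
N, K = 6, 3

X = torch.rand(N, N)
W = (X + X.T) / 2                 # symmetric affinity, stand-in for the Gaussian kernel

logits = torch.randn(N, K, requires_grad=True)
S = torch.softmax(logits, dim=1)  # rows sum to 1
ones = torch.ones(N)

# Paper loss: sum_k S_k^T W (1 - S_k); code loss: -sum_k S_k^T W S_k
loss_paper = sum(S[:, k] @ W @ (ones - S[:, k]) for k in range(K))
loss_code = -sum(S[:, k] @ W @ S[:, k] for k in range(K))

# Gradients w.r.t. the logits coincide: the W·1 term contributes the same
# value to every class, and softmax is invariant to such per-pixel shifts.
g_paper, = torch.autograd.grad(loss_paper, logits, retain_graph=True)
g_code, = torch.autograd.grad(loss_code, logits)
print(torch.allclose(g_paper, g_code, atol=1e-6))   # True
```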
@meng-tang Hi, I'm confused. Since $W$ is generated with a Gaussian kernel, every entry of $W$ is positive. $S^k$ is the softmax output, so $S^k$ is also positive. Then the gradient $-2WS^k$ is always negative, so how can gradient descent work?
The loss just keeps increasing...