D-X-Y/landmark-detection

Why don't you normalize the network outputs to (0, 1)?

Closed this issue · 6 comments

I have recently been studying your code, and I find the network outputs quite confusing.
In face detection, for example, the network output (the box coordinates) is normalized.
Why, in the regression model and the SBR model, is the output not passed through a normalizing activation, given that the real landmark and heatmap values lie in (0, 1)?

D-X-Y commented

Could you please indicate which line of code that you are referring to?

My description may not have been precise enough.
For example, in SBR/lib/models/cpm_vgg16.py you use batch_cpms to calculate the loss. Why don't you normalize batch_cpms to (0, 1), since the ground-truth heatmaps have values in (0, 1)?
The regression model is the same: its output is the final predicted keypoint position. Why not normalize it? The output may fall outside (0, 1), which could make the network harder to train.
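To make the question concrete, here is a minimal NumPy sketch (the function names are illustrative, not from the repo) comparing the L2 loss computed directly on a raw, unbounded prediction against the same loss after a sigmoid squashes the prediction into (0, 1):

```python
import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=1.5):
    """Ground-truth heatmap: a 2-D Gaussian peaked at (cx, cy); values lie in (0, 1]."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

target = gaussian_heatmap(8, 8, cx=4, cy=4)

# A stand-in for a raw network output: unbounded values that roughly track the target.
rng = np.random.default_rng(0)
raw = 1.2 * target + 0.1 * rng.standard_normal(target.shape)

loss_raw = mse(raw, target)                          # L2 on the unnormalized prediction
loss_sig = mse(1.0 / (1.0 + np.exp(-raw)), target)   # L2 after a sigmoid squash
```

Either quantity is a valid training signal; the question is whether the squashing step helps or hurts optimization.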

In SRT/lib/models/ProCPM.py, line 137, I see that you use sigmoid(cpm) to normalize the heatmaps.
I'm confused about whether it should be used or not. Reading other face-landmark repositories, I found that their regression models don't use it.
Is it important, or simply not needed?

D-X-Y commented

The sigmoid is a hyperparameter (https://github.com/D-X-Y/landmark-detection/blob/master/SRT/lib/models/ProCPM.py#L137), and we did not use it in our experiments.

We do not normalize the output values, following the Convolutional Pose Machines paper (https://arxiv.org/pdf/1602.00134.pdf); L2 loss on the unnormalized prediction works well.
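For reference, the CPM-style training objective the linked paper describes is a sum of per-stage L2 losses, each computed directly on the raw stage output. A minimal sketch (shapes and names are illustrative):

```python
import numpy as np

def stagewise_l2(stage_heatmaps, target):
    """Sum of per-stage L2 losses against the same ground-truth heatmaps,
    computed directly on the raw (unnormalized) stage outputs."""
    return sum(float(np.sum((s - target) ** 2)) for s in stage_heatmaps)

# Toy shapes: 3 stages, 5 keypoints, 8x8 heatmaps.
rng = np.random.default_rng(1)
target = rng.random((5, 8, 8))
stages = [target + 0.05 * rng.standard_normal(target.shape) for _ in range(3)]
loss = stagewise_l2(stages, target)
```

Because each stage is supervised separately, intermediate stages get a direct gradient signal even without any output squashing.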

Thank you very much for your answer. Can you explain why you don't use the sigmoid on the prediction? Is it because it has no effect on the results?

D-X-Y commented

My intuition is that this is a regression problem, which does not need a sigmoid.
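In code terms, the choice discussed above reduces to an optional toggle on the output head; a hedged sketch (the function name is hypothetical, only the `use_sigmoid` flag mirrors the hyperparameter at ProCPM.py#L137):

```python
import numpy as np

def output_head(cpm, use_sigmoid=False):
    """Optionally squash raw heatmap scores into (0, 1).
    The default (False) matches what the experiments reportedly used."""
    if use_sigmoid:
        return 1.0 / (1.0 + np.exp(-cpm))  # sigmoid normalization, unused in practice
    return cpm                             # raw regression output, trained with L2
```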