Is the update formula different from that in the TPR paper?
JonathanChen1002 opened this issue · 6 comments
From the source code for calculating the gradient:
https://github.com/cnclabs/codes.tpr.rec/blob/master/src/optimizer/triplet_optimizer.cpp#L28-L49
It seems that it uses the update below, where `prediction` is the triplet prediction score:

theta <- theta + learning_rate * sigmoid(-prediction + margin) * d(prediction)/d(theta)

But from formula (9) in the TPR paper, the update formula is as follows:

theta <- theta + learning_rate * sigmoid(-prediction) * d(prediction)/d(theta)

which is equal to:

theta <- theta + learning_rate * (1 - sigmoid(prediction)) * d(prediction)/d(theta)

I'm wondering if the implementation in this repo is different from the original TPR paper?
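To make the difference concrete, here is a minimal sketch of the two gradient coefficients as I read them; `prediction`, `margin`, and the function names are my own placeholders, not the identifiers used in triplet_optimizer.cpp:

```cpp
#include <cmath>

// Logistic sigmoid.
double sigmoid(double x) {
    return 1.0 / (1.0 + std::exp(-x));
}

// Coefficient applied to d(prediction)/d(theta) in this repo, as far as I can
// tell: the sigmoid is fed -prediction plus the margin.
double repo_coefficient(double prediction, double margin) {
    return sigmoid(-prediction + margin);
}

// Coefficient implied by formula (9) in the TPR paper (standard BPR-style
// update): sigmoid(-prediction), which equals 1 - sigmoid(prediction).
double paper_coefficient(double prediction) {
    return 1.0 - sigmoid(prediction);
}
```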
The current version is indeed different from the original TPR paper, as we modified some parts before the code release. You can try replacing the current one with the conventional BPR loss; I believe it is still able to reproduce the results we reported in the paper.
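For reference, a conventional BPR gradient step looks roughly like the sketch below. This is a generic matrix-factorization example with hypothetical names (`user`, `pos`, `neg`, `learning_rate`, `reg`), not the TPR model or this repo's code:

```cpp
#include <cmath>
#include <vector>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// One SGD update for a (user, positive item, negative item) triple.
void bpr_step(std::vector<double>& user,
              std::vector<double>& pos,
              std::vector<double>& neg,
              double learning_rate, double reg) {
    // prediction = x_uij = x_ui - x_uj
    double prediction = dot(user, pos) - dot(user, neg);
    // BPR maximizes log sigmoid(prediction); the gradient coefficient
    // is sigmoid(-prediction) = 1 - sigmoid(prediction).
    double coeff = sigmoid(-prediction);
    for (size_t k = 0; k < user.size(); ++k) {
        double gu = pos[k] - neg[k];  // d(prediction)/d(user[k])
        double gp = user[k];          // d(prediction)/d(pos[k])
        double gn = -user[k];         // d(prediction)/d(neg[k])
        user[k] += learning_rate * (coeff * gu - reg * user[k]);
        pos[k]  += learning_rate * (coeff * gp - reg * pos[k]);
        neg[k]  += learning_rate * (coeff * gn - reg * neg[k]);
    }
}
```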
Actually, I'm applying TPR to my own dataset, and I wish to monitor the loss curve.
I'm curious which loss function was adopted in this repo. Is there any reference for it?
Thanks for your reply.
Since BPR gets only positive pairs, the current loss would be: log sigmoid(prediction - margin).
Edit: this function is not exactly correct; see the comments below.
> Since BPR gets only positive pairs, the current loss would be: log sigmoid(prediction - margin).
Is the meaning of "prediction" the same as that in the source code?
https://github.com/cnclabs/codes.tpr.rec/blob/master/src/optimizer/triplet_optimizer.cpp#L28-L49
It seems that the input to the sigmoid function is -prediction (or -x + margin) rather than prediction - margin.
Besides, I think the gradient should be 1 - sigmoid(prediction).
Where did I get it wrong?
Let me explain it from this view:
The original loss is log sigmoid(prediction); I add a margin to its gradient (i.e., to sigmoid(-prediction)) so that learning is effective and efficient.
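In other words, something like the sketch below, assuming the margin is folded into the sigmoid's argument as discussed above; the names are hypothetical and not taken from the repo:

```cpp
#include <cmath>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Coefficient used for the parameter update: the margin enters only here.
double update_coefficient(double prediction, double margin) {
    return sigmoid(-prediction + margin);
}

// Quantity to monitor: the plain log-sigmoid objective (no margin).
// Its negative, -log sigmoid(prediction), gives a decreasing loss curve.
double monitored_objective(double prediction) {
    return std::log(sigmoid(prediction));
}
```

With this split, the curve you track reflects the original log-sigmoid objective, while the margin only shapes the step size of the update.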
I've implemented a monitor to calculate the batch loss (sigmoid(prediction)), and it looks fine.
Thanks a lot!