yzhao062/MetaOD

Different optimization criteria for matrix factorization in the code and in the paper

VConchello opened this issue · 4 comments

The paper states that MetaOD minimizes the sum of sDCG as the optimization criterion for factorising a matrix into latent factors (Section 3.4.1), but the code (core.py:91,156,166) uses the function ndcg_score from sklearn.metrics, which differs from sDCG in some aspects.
Then, for the gradient descent, it uses the gradient of sDCG to find an optimum.
Is there any rationale for these changes?
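For illustration, here is a minimal sketch of what those lines evaluate with sklearn; the performance values below are made up, not taken from MetaOD:

```python
import numpy as np
from sklearn.metrics import ndcg_score

# Hypothetical example: true vs. predicted performances of four candidate
# outlier detectors on one dataset (values are made up for illustration).
true_perf = np.asarray([[0.9, 0.2, 0.6, 0.1]])
pred_perf = np.asarray([[0.8, 0.3, 0.5, 0.2]])

# What the code evaluates: NDCG = DCG / IDCG, a ranking-quality score
# normalised to [0, 1] by the DCG of the ideal (true) ordering.
print(ndcg_score(true_perf, pred_perf))
```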

The reason is numerical stability. You can replace it with DCG and the results should be almost the same (though numerical stability may suffer).
[image: NDCG formula, NDCG = DCG / IDCG]

So, to my understanding, IDCG is more of a scaling factor and does not change the resulting ranking.
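To make the scaling argument concrete, a small sketch with made-up numbers: IDCG depends only on the true relevances, so for a fixed dataset it is the same constant for every candidate scoring, and ranking candidates by DCG or by NDCG gives the same order.

```python
import numpy as np
from sklearn.metrics import dcg_score, ndcg_score

# Hypothetical relevances for one dataset; IDCG depends only on these,
# so it is a fixed constant here.
y_true = np.asarray([[3.0, 1.0, 2.0, 0.0]])

# Two candidate score vectors (e.g. from two factorization iterates).
scores_a = np.asarray([[0.7, 0.1, 0.5, 0.0]])
scores_b = np.asarray([[0.2, 0.9, 0.4, 0.1]])

# NDCG = DCG / IDCG: both candidates are divided by the same constant,
# so the better candidate under DCG is also the better one under NDCG.
for s in (scores_a, scores_b):
    print(dcg_score(y_true, s), ndcg_score(y_true, s))
```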

Thank you for your answer. The scaling part is clear, I think, but looking at the formula:
[image: the paper's sDCG formula]
First, this formula uses the sigmoid, and the numerator is also quite different.
I'm not sure whether these changes have a significant impact when evaluating this metric.

That is because DCG is not differentiable: specifically, in log2(i+1), the rank position i is computed via an indicator function and has no derivative. We use a sigmoid to approximate it.

[image: derivation replacing the rank indicator with a sigmoid approximation]
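For intuition, here is a generic smoothed-DCG sketch in the spirit of that approximation (not necessarily the exact formula used in MetaOD): the integer rank inside the log is replaced by a soft rank built from sigmoids over pairwise score gaps, which makes the whole objective differentiable in the scores.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def smooth_dcg(relevance, scores):
    """Differentiable DCG surrogate: replace each item's integer rank
    with a soft rank, 1 + a sum of sigmoids over pairwise score gaps.
    `relevance` and `scores` are 1-D arrays over the candidate models."""
    gaps = scores[None, :] - scores[:, None]            # gaps[j, k] = s_k - s_j
    # soft_rank[j] ~ 1 + number of items scored above item j;
    # subtract sigmoid(0) = 0.5 to remove the k == j self-comparison.
    soft_rank = 1.0 + sigmoid(gaps).sum(axis=1) - 0.5
    return np.sum((2.0 ** relevance - 1.0) / np.log2(soft_rank + 1.0))

# Made-up numbers: scoring the high-relevance item highest yields a
# larger smoothed DCG than scoring it low.
rel = np.array([3.0, 1.0, 2.0])
print(smooth_dcg(rel, np.array([2.0, 0.1, 1.0])))  # good ordering -> larger
print(smooth_dcg(rel, np.array([0.1, 2.0, 1.0])))  # bad ordering -> smaller
```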

Okay, now I understand that the approximation is used both for the function to be minimised and for its gradient, not just to compute the gradient.
Thank you for the answer.