allegro/allRank

What's the expected input to the losses and metrics?

lneukom opened this issue · 1 comment

E.g. for approxNDCGLoss, what is y_pred and y_true?

  • what is the range of the inputs?
  • does a slate have to be in order? e.g. best ranked first?

From what I gathered:

  • values should be in [0, ...)
  • higher values mean better rank
  • the order does not matter

Is that correct? And are all losses and metrics following this API?
Thanks!

mhsyno commented

Hello,

Both y_pred and y_true are of type torch.Tensor with the same shape: [batch_size, slate_length].

  • y_true values are the relevance labels for each slate, in the original order. The higher the value, the more relevant the item is within the context of its slate. Relevance typically takes integer values (e.g. from 0 to 4 in MSLR-WEB30K) or binary values.
  • y_pred values are real-valued scores from the model, which are used to produce a new ordering of y_true by sorting the scores in descending order. They are in the same order as y_true.
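To make the convention concrete, here is a pure-Python sketch (not allRank's actual implementation) of how a metric like NDCG consumes one slate: y_true holds labels in the original document order, y_pred holds model scores in that same order, and the metric internally sorts the labels by the scores before discounting. The function names are hypothetical.

```python
import math

def dcg_at_k(y_true, y_pred, k):
    """Sort the true labels by predicted score (descending),
    then accumulate the standard DCG formula over the top-k positions."""
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i], reverse=True)
    ranked = [y_true[i] for i in order]
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(ranked[:k]))

def ndcg_at_k(y_true, y_pred, k):
    """Normalize by the DCG of the ideal ordering (y_true sorted by itself)."""
    ideal = dcg_at_k(y_true, y_true, k)
    return dcg_at_k(y_true, y_pred, k) / ideal if ideal > 0 else 0.0

# One slate: relevance labels in original document order, model scores alongside.
y_true = [2, 0, 3, 1]
y_pred = [0.9, 0.1, 0.4, 0.3]  # imperfect ranking: puts label 2 above label 3
print(round(ndcg_at_k(y_true, y_pred, 4), 4))
```

Note that permuting the items of a slate (applying the same permutation to both y_true and y_pred) leaves the metric unchanged, which is why the input order does not matter.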

As for the question:

are all losses and metrics following this API?

In short: yes. For all metrics and losses, y_pred and y_true are mandatory arguments. Some also take additional arguments (most with default values specified), e.g. ats for metrics, which specifies the top-n positions of a slate taken into account when calculating the metric.
For some of the losses there are other mandatory arguments: when using the ordinal loss you need to pass the number of ordinal values, and for pointwise RMSE the number of unique ground-truth values is required.
Note that some arguments of the loss functions can drastically change the shape of the function, e.g. weighing_scheme for [lambdaLoss](https://github.com/allegro/allRank/blob/master/allrank/models/losses/lambdaLoss.py).
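To illustrate how much a weighing scheme can matter, here is a simplified sketch (not allRank's implementation) of a LambdaRank-style pairwise weight: under an NDCG-based scheme, the weight of a pair is proportional to the NDCG change from swapping the two items, whereas a uniform (RankNet-style) scheme weighs every pair equally. Positions are assumed to be in the order induced by the predicted scores.

```python
import math

def delta_ndcg_weight(y_true_ranked, i, j):
    """|ΔDCG| from swapping the items at positions i and j of an
    already-ranked slate: (gain_i - gain_j) * (discount_i - discount_j)."""
    gain = lambda rel: 2 ** rel - 1
    disc = lambda pos: 1.0 / math.log2(pos + 2)
    return abs((gain(y_true_ranked[i]) - gain(y_true_ranked[j]))
               * (disc(i) - disc(j)))

def uniform_weight(y_true_ranked, i, j):
    """RankNet-style scheme: every pair contributes equally."""
    return 1.0

ranked_labels = [3, 2, 0, 1]  # labels in predicted-score order
print(delta_ndcg_weight(ranked_labels, 0, 2))  # top-vs-deep swap
print(uniform_weight(ranked_labels, 0, 2))
```

The NDCG-based scheme concentrates the gradient on swaps involving highly relevant items near the top, so switching schemes effectively optimizes a different objective.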

If in doubt, I encourage you to take a look at our real-life configs (which correspond to the experiments from our paper) in the reproducibility guide: https://github.com/allegro/allRank/tree/master/reproducibility/configs

Best,
Mikołaj