What's the expected input to the losses and metrics?
lneukom opened this issue · 1 comment
E.g. for approxNDCGLoss, what are y_pred and y_true?
- what is the range of the inputs?
- does a slate have to be ordered, e.g. best-ranked first?
From what I gathered:
- values should be in [0, ...)
- higher values mean a better rank
- the order does not matter
Is that correct? And are all losses and metrics following this API?
Thanks!
Hello,
Both y_pred and y_true are of type torch.Tensor with the same shape: [batch_size, slate_length].
- y_true values are the relevance labels of the items in each slate, in the original order. The higher the value, the more relevant the item is within the context of its slate. Relevance typically takes integer values (e.g. from 0 to 4 in MSLR-WEB30K) or binary values.
- y_pred values are real-valued scores from the model, which are used to produce a new order of y_true by sorting the items by these scores in descending order. They are in the same item order as y_true.
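To make the shape and ordering conventions concrete, here is a minimal PyTorch sketch with a made-up batch of two slates. It does not call allRank; it only illustrates that y_true and y_pred are aligned item by item, and how sorting by y_pred (descending) reorders y_true.

```python
import torch

# Made-up batch: 2 slates of 4 items each -> shape [batch_size, slate_length].
y_true = torch.tensor([[0., 2., 4., 1.],
                       [1., 0., 0., 3.]])        # relevance labels, original item order
y_pred = torch.tensor([[0.1, 0.7, 2.3, -0.5],
                       [0.4, 1.2, -1.0, 0.9]])   # model scores, same item order

assert y_true.shape == y_pred.shape  # both [batch_size, slate_length]

# Rank items by descending predicted score, then read off their labels in that order.
_, indices = y_pred.sort(dim=1, descending=True)
y_true_ranked = torch.gather(y_true, dim=1, index=indices)
print(y_true_ranked.tolist())  # [[4.0, 2.0, 0.0, 1.0], [0.0, 3.0, 1.0, 0.0]]
```

Because only the pairing between a label and its score matters, shuffling the items of a slate (applying the same permutation to both tensors) yields the same ranked labels.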
As for the question "are all losses and metrics following this API?":
In short: yes. For all metrics and losses, y_pred and y_true are mandatory arguments. Some of them also take additional arguments (most of them with default values), e.g. ats for metrics, which specifies the top n items of a slate taken into account when calculating the metric.
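As an illustration of what a cutoff like ats means, here is a minimal NDCG@k sketch in plain PyTorch. It is not allRank's metric implementation: the function name ndcg_at_k is hypothetical, and the 2^rel - 1 gain, log2 discount and "0 for slates with no relevant items" convention are assumptions made for this sketch. It expects float tensors of shape [batch_size, slate_length].

```python
import torch

def ndcg_at_k(y_pred: torch.Tensor, y_true: torch.Tensor, k: int) -> torch.Tensor:
    """Hypothetical NDCG@k sketch (not allRank's implementation).

    y_pred, y_true: float tensors of shape [batch_size, slate_length].
    Returns a tensor of shape [batch_size] with one NDCG@k value per slate.
    """
    k = min(k, y_true.shape[1])
    # Labels of the top-k items when ranked by predicted score (descending).
    _, pred_idx = y_pred.sort(dim=1, descending=True)
    ranked_true = torch.gather(y_true, 1, pred_idx)[:, :k]
    # Ideal ranking: labels sorted by themselves, top-k kept.
    ideal_true, _ = y_true.sort(dim=1, descending=True)
    ideal_true = ideal_true[:, :k]
    # Positional discount 1 / log2(rank + 1) for ranks 1..k.
    ranks = torch.arange(1, k + 1, dtype=y_true.dtype)
    discounts = 1.0 / torch.log2(ranks + 1.0)
    # Exponential gain 2^rel - 1 (an assumption of this sketch).
    dcg = ((2.0 ** ranked_true - 1.0) * discounts).sum(dim=1)
    idcg = ((2.0 ** ideal_true - 1.0) * discounts).sum(dim=1)
    # Slates without relevant items get 0 here; avoid dividing by zero.
    safe_idcg = torch.where(idcg > 0, idcg, torch.ones_like(idcg))
    return torch.where(idcg > 0, dcg / safe_idcg, torch.zeros_like(dcg))
```

Only the first k positions of the predicted ranking contribute, so mistakes below the cutoff do not affect the value.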
Some losses have other mandatory arguments as well: when using the ordinal loss you need to pass the number of ordinal values, and for pointwise RMSE the number of unique ground-truth values is required.
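One way to see why an ordinal-style loss needs to know the number of levels: a common approach encodes each integer label as a vector of cumulative "label > threshold" targets, and the length of that vector depends on how many levels exist. The sketch below is a hypothetical illustration of such an encoding (the function ordinal_targets and its n convention are assumptions); the exact convention allRank uses for its argument should be checked in the loss's source.

```python
import torch

def ordinal_targets(y_true: torch.Tensor, n: int) -> torch.Tensor:
    """Hypothetical cumulative ("ordinal") encoding of integer relevance labels.

    y_true: [batch_size, slate_length] with labels in {0, ..., n}.
    Returns [batch_size, slate_length, n] where position i is 1.0 if label > i.
    """
    thresholds = torch.arange(n, dtype=y_true.dtype)  # 0, 1, ..., n - 1
    # Broadcast [B, L, 1] against [n] -> [B, L, n].
    return (y_true.unsqueeze(-1) > thresholds).float()

# Labels 0..4 (as in MSLR-WEB30K) need 4 thresholds under this convention.
print(ordinal_targets(torch.tensor([[0., 2., 4., 1.]]), 4)[0].tolist())
# [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]]
```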
Note that some arguments of the loss functions can drastically change the form of the function, e.g. weighing_scheme for [lambdaLoss](https://github.com/allegro/allRank/blob/master/allrank/models/losses/lambdaLoss.py).
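How much a single weighting argument can change a loss is easiest to see on a simplified pairwise loss. The sketch below is not allRank's lambdaLoss, and its weighting values "uniform" and "delta_ndcg" are hypothetical names. It applies the same logistic loss to every pair whose labels differ and changes only the per-pair weight: uniform (RankNet-style) or the |ΔNDCG| of swapping the two items under the current predicted ranking (LambdaRank-style).

```python
import torch
import torch.nn.functional as F

def pairwise_logistic_loss(y_pred: torch.Tensor, y_true: torch.Tensor,
                           weighting: str = "uniform") -> torch.Tensor:
    """Hypothetical sketch, not allRank's lambdaLoss.

    y_pred, y_true: float tensors of shape [batch_size, slate_length].
    """
    # Pair (i, j) counts when item i has a strictly higher label than item j.
    pair_mask = (y_true.unsqueeze(2) - y_true.unsqueeze(1) > 0).float()   # [B, L, L]
    score_diff = y_pred.unsqueeze(2) - y_pred.unsqueeze(1)                # s_i - s_j
    pair_loss = F.softplus(-score_diff)                                   # log(1 + exp(-(s_i - s_j)))

    if weighting == "uniform":
        weights = torch.ones_like(pair_loss)
    elif weighting == "delta_ndcg":
        # 1-based rank of each item under the current predicted ordering.
        order = y_pred.argsort(dim=1, descending=True)
        ranks = order.argsort(dim=1).to(y_pred.dtype) + 1.0
        discounts = 1.0 / torch.log2(ranks + 1.0)
        gains = 2.0 ** y_true - 1.0
        # Ideal DCG normalizes the swap cost per slate.
        ideal, _ = y_true.sort(dim=1, descending=True)
        positions = torch.arange(1, y_true.shape[1] + 1, dtype=y_pred.dtype)
        idcg = ((2.0 ** ideal - 1.0) / torch.log2(positions + 1.0)).sum(dim=1).clamp(min=1e-10)
        # |delta NDCG| of swapping items i and j; treated as a constant weight.
        weights = ((gains.unsqueeze(2) - gains.unsqueeze(1)).abs()
                   * (discounts.unsqueeze(2) - discounts.unsqueeze(1)).abs()
                   / idcg[:, None, None]).detach()
    else:
        raise ValueError(f"unknown weighting: {weighting}")

    # Average over valid pairs only.
    return (weights * pair_loss * pair_mask).sum() / pair_mask.sum().clamp(min=1.0)
```

With "uniform" every misordered pair costs the same; with "delta_ndcg" mistakes near the top of the ranking between items with very different labels dominate, so the two settings optimize noticeably different objectives.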
If in doubt, I encourage you to take a look at our real-life configs (which correspond to the experiments from our paper) in the reproducibility guide: https://github.com/allegro/allRank/tree/master/reproducibility/configs
Best,
Mikołaj