About the evaluation on different datasets

Question

Hi,

Thanks for your great work. In your paper, you use a pairwise learning-to-rank training strategy to train one model on different datasets. In that case the predicted MOS is a relative score, so how do you obtain absolute scores on the different datasets? If the predictions are not aligned properly, the PLCC would be very low. Thanks!

Answer 1 · 2023-09-13T09:11:34.000Z

Hi, thanks for your interest in our work!

We use the pairwise learning-to-rank method to train a single model on multiple datasets simultaneously, so the quality scores predicted by the resulting model are dataset-agnostic. In other words, the model attempts to unify the perceptual scales of the different datasets into a common space, where images from different datasets can be compared by their predicted scores. Therefore, the predicted MOS is not a relative score.

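To compute the PLCC results, we follow the common practice of fitting a 4-parameter logistic nonlinear mapping function for each dataset, which maps the predicted scores onto that dataset's MOS scale before the correlation is computed (see the compute_metrics function in BIQA_benchmark.py).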
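
For readers who want to see what such a mapping looks like, here is a minimal sketch. It is not the code from BIQA_benchmark.py: the function names logistic4 and plcc_after_mapping are hypothetical, and the particular logistic form and initial guess are assumptions based on what is commonly used in IQA evaluation. It relies on NumPy and SciPy.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr


def logistic4(x, b1, b2, b3, b4):
    """4-parameter logistic (assumed form): b1/b2 are the two asymptotes,
    b3 is the midpoint, |b4| is the slope scale."""
    return (b1 - b2) / (1.0 + np.exp(-(x - b3) / (np.abs(b4) + 1e-8))) + b2


def plcc_after_mapping(pred, mos):
    """Fit the logistic on one dataset, then return PLCC(mapped pred, MOS)."""
    pred = np.asarray(pred, dtype=float)
    mos = np.asarray(mos, dtype=float)
    # Initial guess (an assumption): asymptotes from the MOS range,
    # midpoint and scale from the prediction statistics.
    p0 = [mos.max(), mos.min(), pred.mean(), pred.std() + 1e-8]
    try:
        params, _ = curve_fit(logistic4, pred, mos, p0=p0, maxfev=10000)
        mapped = logistic4(pred, *params)
    except RuntimeError:
        # If the fit does not converge, fall back to the raw predictions.
        mapped = pred
    return pearsonr(mapped, mos)[0]
```

The mapping is fitted separately for each dataset, so each call takes the predictions and MOS values of a single dataset. Because the logistic is monotonic, it leaves rank-based metrics such as SRCC unchanged and only affects PLCC.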