zwx8981/LIQE

About the evaluation on different datasets

Closed this issue · 1 comments

Hi,

Thanks for your great work. In your paper, you use the pairwise learning-to-rank training strategy to train one model on different dataset. In this case, the predicted MOS is a relative score, how do you obtain the absolute score on different datasets? If not aligned properly, the PLCC metric shall be very low. Thanks!

Hi, thanks for your interest in our work!

We use the pairwise learning-to-rank method to train a model on multiple datasets simultaneously. As such, the quality scores predicted by the resultant model are dataset-agnostic. In other words, the resultant model is actually attempting to unify the perceptual scales of different datasets into a common space, where the images from different datasets can be compared according to the predicted scores. Therefore, the predicted MOS is not a relative score.

To compute the PLCC results, we follow the common practice to learn a 4-parameter logistic nonlinear mapping function for each dataset (see the compute_metrics function in BIQA_benchmark.py).