DXOMARK-Research/PIQ2023

Evaluation scripts

Closed this issue · 13 comments

Hello, thanks for your work.
Could you publish your JOD evaluation code, or a simple method to calculate these scores as in Table 1 of the paper?

Hello,

We will be publishing the results very soon! To calculate the rank correlation (SROCC), compute the correlation per scene, then average the SROCC over all scenes. Please use the official splits (device split and test split) that we shared in this repo. Be aware that the results in the paper are not exactly reproducible on this version of PIQ23, since PIQ23 was still being annotated when the paper was written.
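For concreteness, here is a minimal sketch of that protocol. It assumes hypothetical column names "scene", "score" (model prediction), and "jod" (ground truth); the actual PIQ23 CSV headers may differ:

```python
import pandas as pd
from scipy.stats import spearmanr

def per_scene_srocc(df: pd.DataFrame) -> float:
    """Compute SROCC within each scene, then average over all scenes."""
    sroccs = [
        spearmanr(group["score"], group["jod"]).correlation
        for _, group in df.groupby("scene")  # each scene needs >= 2 images
    ]
    return sum(sroccs) / len(sroccs)
```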

Best,

Nicolas

@nicolasch96 thanks for your quick response.
One more question:

Two images are 1 JOD apart if 75% of observers choose one as better than the other.

So how do you convert JOD into a float like in the CSV? Divide by the number of comparisons?

Hello,

The JOD scores are already provided in the CSV. They are generated using the TrueSkill implementation presented in this repo: https://github.com/gfxdisp/asap. If you want the full statistical analysis, you will have to use our algorithm, which will also be shared in the near future.
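For illustration only (this is not the authors' pipeline; the asap repo implements the actual scaling), the idea of turning pairwise votes into continuous scores can be sketched with the `trueskill` PyPI package:

```python
import trueskill

images = ["img_a", "img_b", "img_c"]                 # hypothetical image ids
votes = [("img_a", "img_b"), ("img_a", "img_c"),
         ("img_b", "img_c")]                          # (winner, loser) pairs

ratings = {name: trueskill.Rating() for name in images}
for winner, loser in votes:
    ratings[winner], ratings[loser] = trueskill.rate_1vs1(
        ratings[winner], ratings[loser])

# The posterior means are continuous, which is why the scores
# in the CSV are floats rather than integers.
scores = {name: r.mu for name, r in ratings.items()}
```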

Please do not hesitate to ask other questions that you may have.

Nicolas

@nicolasch96 What I mean is: given that "Two images are 1 JOD apart if 75% of observers choose one as better than the other.", shouldn't JOD be an integer, not a float as in the CSV?

JOD is a continuous scale: the 75% preference rule only defines the unit distance, while the probabilistic scaling produces real-valued scores. Please refer to this paper for more clarification about JOD: https://arxiv.org/abs/1712.03686
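As a concrete illustration of why the scores are floats: under the Gaussian observer model used for JOD scaling (see the paper above), a JOD difference d maps to a preference probability Φ(d/σ), with σ = 1/Φ⁻¹(0.75) ≈ 1.4826 chosen so that d = 1 gives exactly 75%. A minimal sketch:

```python
from scipy.stats import norm

SIGMA = 1 / norm.ppf(0.75)   # ≈ 1.4826, fixes the JOD unit

def preference_probability(jod_diff: float) -> float:
    """P(an observer prefers A over B) for JOD(A) - JOD(B) = jod_diff."""
    return norm.cdf(jod_diff / SIGMA)

print(preference_probability(1.0))  # ~0.75, i.e. the 1-JOD rule
```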

@nicolasch96 How do you deal with multiple crop sizes (672, 448, and 224)? In HyperIQA, the Local Distortion Aware module uses an AvgPooling layer followed by a Linear layer of fixed size, so it will not work when the input size changes.

Excellent question, and again I apologize for the delay in publishing SEM-HyperIQA. One solution we have is to load only the pretrained ResNet50 backbone, and then create the LDA according to the input size. Those layers will NOT be loaded from the pretrained model, except for the 224 patches I suppose, since that was the size used by HyperIQA, if I am not mistaken. This is unfortunate, since we cannot benefit from the whole pretrained HyperIQA model.
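A hedged sketch of that workaround (layer dimensions are illustrative, loosely following HyperIQA's first LDA branch; this is not the released SEM-HyperIQA code):

```python
import torch.nn as nn
from torchvision.models import resnet50

def build_lda(patch_size: int) -> nn.Sequential:
    """Build an LDA-style branch whose Linear size depends on patch_size.

    Assumes layer1 features: 256 channels at patch_size // 4 resolution.
    The channel counts and pooling kernel are illustrative, not the
    released configuration.
    """
    fmap = patch_size // 4              # spatial side of layer1 features
    pooled = fmap // 7                  # side after a 7x7 AvgPool, stride 7
    return nn.Sequential(
        nn.Conv2d(256, 16, kernel_size=1),
        nn.AvgPool2d(7, stride=7),
        nn.Flatten(),
        nn.Linear(16 * pooled * pooled, 16),
    )

# Only the backbone comes pretrained; the LDA heads start from scratch.
backbone = resnet50(weights="IMAGENET1K_V1")
lda_heads = {s: build_lda(s) for s in (224, 448, 672)}
```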

@nichahine so did you use 3 LDAs and 3 Content Understanding hyper networks?

We used exactly the same architecture as this repo: https://github.com/SSL92/hyperIQA, adapted however to accept multiple input sizes. I will try to speed up the code release and push it ASAP!

@nicolasch96 Did you modify or remove the last Linear layer of the LDA?

You can either use GAP (global average pooling) instead of AvgPool2d and not worry about the layer sizes, or introduce patchSize as an input and adjust the layer sizes accordingly, as sketched below.
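A minimal sketch of the first option, with hypothetical channel sizes (not the released code): with GAP the Linear input no longer depends on the crop size.

```python
import torch
import torch.nn as nn

class GAPHead(nn.Module):
    """LDA-style head with global average pooling instead of a fixed AvgPool2d."""
    def __init__(self, in_channels: int = 256, out_features: int = 16):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)   # 1x1 output for any input size
        self.fc = nn.Linear(in_channels, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.gap(x).flatten(1))

head = GAPHead()
for side in (56, 112, 168):                  # layer1 feature sides for 224/448/672
    out = head(torch.randn(1, 256, side, side))
    assert out.shape == (1, 16)              # one head works for every crop size
```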

@nichahine thank you so much. I'm trying to implement your paper, so this helps a lot. Sorry about my bad English; I will ask other questions soon.

If this answers the question, I will close the issue. Please open a new issue for further questions; that makes it easier to index for people with similar problems. Good luck with the implementation, and thank you!