Different image inferences with same result
arvisioncode opened this issue · 4 comments
Hi,
I'm running different tests with demo/clipiqa_single_image_demo.py and the attribute_list = ['Quality', 'Brightness', 'Sharpness', 'Noisiness', 'Colorfulness', 'Contrast'].
First, I've noticed that choosing a suitable size for the input image is essential, because in some cases the result is NaN. Is there a fixed size the input image should be resized to?
Second, in my tests the result is always the same regardless of the input image and of how I resize it. Do you know what might be causing this?
Example:
MSI@DESKTOP-FEG9P7H MINGW64 /e/2. Projects/Image Quality/CLIP-IQA (v2-3.8)
$ python demo/clipiqa_single_image_demo.py --config configs/clipiqa/clipiqa_attribute_test.py --file_path dataset/good_1.jpg
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
['Quality', 'Brightness', 'Sharpness', 'Noisiness', 'Colorfulness', 'Contrast', 'Quality']
[0.9892578 0.98876953 0.99853516 0.06512451 0.74316406 0.66796875]
(clipiqa)
MSI@DESKTOP-FEG9P7H MINGW64 /e/2. Projects/Image Quality/CLIP-IQA (v2-3.8)
$ python demo/clipiqa_single_image_demo.py --config configs/clipiqa/clipiqa_attribute_test.py --file_path dataset/bad_3.jpeg
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
['Quality', 'Brightness', 'Sharpness', 'Noisiness', 'Colorfulness', 'Contrast', 'Quality']
[0.9892578 0.98876953 0.99853516 0.06512451 0.74316406 0.66796875]
(clipiqa)
MSI@DESKTOP-FEG9P7H MINGW64 /e/2. Projects/Image Quality/CLIP-IQA (v2-3.8)
$ python demo/clipiqa_single_image_demo.py --config configs/clipiqa/clipiqa_attribute_test.py --file_path dataset/blur_2.tif
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
['Quality', 'Brightness', 'Sharpness', 'Noisiness', 'Colorfulness', 'Contrast', 'Quality']
[0.9892578 0.98876953 0.99853516 0.06512451 0.74316406 0.66796875]
(clipiqa)
I was testing in a Jupyter notebook and encountered a similar issue before. I found that it was because my GPU did not have enough CUDA memory. Note that my images were about 1024p; when I used larger images, around 4032p, it gave me results that were all 0.5.
I did not test on images beyond 2K, but I guess you need to resize the input to avoid too large a resolution, e.g., beyond 2K. The main reason, I guess, is that CLIP is only trained on 224x224 images, and very large resolutions lead to diverse receptive fields for the network, which should affect its performance.
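In case it helps, here is a minimal pre-processing sketch (not part of the official demo; the 2048-pixel cap, the resize_for_clipiqa name, and the file paths are just assumptions for illustration) that downscales any image whose longer side is too large and saves a copy you can pass to the demo via --file_path:

```python
# Hypothetical pre-processing step: shrink over-large images before running
# demo/clipiqa_single_image_demo.py. The 2048-pixel cap is an assumption based
# on the "beyond 2K" advice above, not a value taken from the CLIP-IQA code.
from PIL import Image

def resize_for_clipiqa(src_path, dst_path, max_side=2048):
    """Downscale the image so its longer side is at most max_side, keeping aspect ratio."""
    img = Image.open(src_path).convert('RGB')
    # thumbnail() only ever shrinks, so images already below the cap are left untouched
    img.thumbnail((max_side, max_side), Image.BICUBIC)
    img.save(dst_path)
    return dst_path

# e.g. resize_for_clipiqa('dataset/good_1.jpg', 'dataset/good_1_2k.jpg')
# then: python demo/clipiqa_single_image_demo.py --config configs/clipiqa/clipiqa_attribute_test.py --file_path dataset/good_1_2k.jpg
```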
Hi IceClear,
I found that the SPAQ dataset has many images beyond 2K, such as 5488x4112, 4032x3024, 4000x3000, etc., and in your experiments the SROCC/PLCC on SPAQ was very high. Does this mean the pretraining image size was not the limitation?
We resized the images in SPAQ; you can find the details in our paper. See #10.