VOC results

Hi, this is excellent work. Thanks for sharing.

I have a question about Table 2 in your paper. When I run the code to generate Initial CAM, Initial + CAA, and Initial + CAA + dCRF, I get 49.12, 70.7, and 71.1 mIoU respectively. Is this because the different settings use different thresholds?
Or maybe the difference comes from how the VOC2012 dataset is prepared. Could you please share how you generated the VOC2012 dataset?
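Because the question is about reproducing mIoU numbers, it may help to recall how mIoU is commonly computed for semantic segmentation on VOC: a single confusion matrix is accumulated over every image in the evaluated split (21 classes including background, with label 255 ignored), and per-class IoU is taken from that dataset-level matrix. This is a minimal, generic sketch, not the CLIP-ES evaluation code; the function names `confusion_matrix` and `mean_iou` are hypothetical. It illustrates why the reported number depends on exactly which images are in the split.

```python
import numpy as np


def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    """Count (GT class, predicted class) pairs; rows are GT, columns are predictions."""
    valid = gt != ignore_index  # drop void/boundary pixels
    idx = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Mean of per-class IoU = TP / (TP + FP + FN), skipping classes never seen."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    iou = np.divide(tp, union, out=np.full_like(tp, np.nan), where=union > 0)
    return np.nanmean(iou)


# Toy example with 2 classes; one pixel carries the ignore label 255.
gt = np.array([[0, 0], [1, 255]])
pred = np.array([[0, 1], [1, 1]])
conf = confusion_matrix(pred, gt, num_classes=2)
print(round(float(mean_iou(conf)), 4))  # prints 0.5
```

For a full split you would sum the per-image confusion matrices first and call `mean_iou` once at the end, so evaluating on train_aug instead of train changes the result even with identical CAMs.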

Thanks for reading. Your work is interesting!

Hi, thanks for your attention to our work.

The results in Table 2 are computed on the train set (1464 images in total), not the train_aug set (10582 images in total). Did you set this correctly via --split_file? Also, you don't need to set any thresholds manually, and the links for downloading the VOC images and GT masks are provided in the README. If you still can't reproduce the results, please share more details, such as the command you ran, any changes you made to the code, and your environment.
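One quick way to rule out a wrong split is to count the image IDs in the file passed to --split_file and compare against the two sizes given above (1464 for train, 10582 for train_aug). The sketch below assumes one image ID per line as the first whitespace-separated token, which matches VOC's ImageSets/Segmentation/train.txt; the helpers `parse_split`, `which_split`, and `check_split_file` are hypothetical and not part of CLIP-ES.

```python
# Split sizes stated in this thread.
EXPECTED_SIZES = {"train": 1464, "train_aug": 10582}


def parse_split(lines):
    """Return image IDs, assuming the ID is the first token of each non-empty line."""
    return [line.split()[0] for line in lines if line.strip()]


def which_split(ids):
    """Return the split name whose size matches the number of unique IDs, else None."""
    n = len(set(ids))
    for name, size in EXPECTED_SIZES.items():
        if n == size:
            return name
    return None


def check_split_file(path):
    """Read a split file and report (number of IDs, matching split name or None)."""
    with open(path) as f:
        ids = parse_split(f)
    return len(ids), which_split(ids)
```

For example, `check_split_file("train.txt")` should report `(1464, "train")` for the split used in Table 2; a result of `None` suggests the file does not correspond to either split.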

Hi, thanks for your reply.

I reinstalled the pydensecrf package and downloaded the dataset again, and it works! I now get the same mIoU as reported in your paper.
By the way, is there an implementation of the Initial + MHSA setting in Table 2?

Thank you very much!

To implement Initial + MHSA, you can simply comment out or delete the following line:

In CLIP-ES/generate_cams_voc12.py, line 188 (commit 9094660):

    trans_mat = trans_mat * aff_mask
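Rather than editing the line by hand, the two ablation settings can be toggled with a flag. The sketch below is a simplified illustration, not the repository's code: it assumes `trans_mat` is an (N, N) attention-derived transition matrix over N patch tokens, `aff_mask` is a binary (N, N) mask of the same shape, and the (optionally masked) matrix is then applied to the flattened CAM of one class by a matrix product. The function `refine_cam` and the `use_aff_mask` flag are hypothetical.

```python
import numpy as np


def refine_cam(cam, trans_mat, aff_mask, use_aff_mask=True):
    """Propagate a flattened CAM through a token-to-token transition matrix.

    cam:       (N,) CAM scores for N patch tokens of one class.
    trans_mat: (N, N) attention-derived transition matrix.
    aff_mask:  (N, N) binary mask limiting which token pairs exchange activation.
    use_aff_mask=False mimics deleting the masking line (the MHSA-only ablation).
    """
    if use_aff_mask:
        # Same role as `trans_mat = trans_mat * aff_mask`: element-wise masking.
        trans_mat = trans_mat * aff_mask
    return trans_mat @ cam
```

With a uniform `trans_mat` and an identity `aff_mask`, activation spreads to every token when the mask is disabled but stays on the original token when it is enabled. This shows how the mask restricts propagation. Keeping both paths behind one flag also makes it easy to run both ablations without maintaining two copies of the script.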

Thanks for your comment!