About the demo
Opened this issue · 6 comments
Hello, I share the same confusion. I am testing on VOC21, and the overall test metrics align with the 59% reported in the paper. However, the results on individual images seem generally poor. This is my first time working with the CLIP model. May I ask what the "single image demo" refers to? Is it not the testing method currently provided?
It is a demo where the input is a single image and the output is the segmented image.
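As a rough illustration of what such a demo does under the hood, here is a minimal sketch: each image patch is assigned the class whose text embedding has the highest (normalized) inner product with the patch feature, and the per-patch labels are reshaped into a segmentation map. The function name and random stand-in features are my own; a real demo would take `patch_feats` and `text_embeds` from the CLIP image and text encoders.

```python
import numpy as np

def segment_single_image(patch_feats, text_embeds, hw):
    """Single-image demo sketch (hypothetical helper, not the repo's API).

    patch_feats: (N, D) patch features from the image encoder
    text_embeds: (C, D) class text embeddings
    hw: (H, W) patch-grid shape, with H * W == N
    """
    # Cosine similarity = inner product of L2-normalized vectors
    pf = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    te = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    sim = pf @ te.T                # (N, C) patch-to-class similarities
    labels = sim.argmax(axis=1)   # per-patch class index
    return labels.reshape(hw)     # (H, W) segmentation map

# Toy usage with random stand-in features
rng = np.random.default_rng(0)
seg = segment_single_image(rng.normal(size=(16, 8)),
                           rng.normal(size=(3, 8)), (4, 4))
print(seg.shape)  # (4, 4)
```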
I found that better optimization results can be obtained through PAMR. However, the visualization of the inner product results obtained directly from the image encoder and text embedding is not very satisfactory.
Did you write any code to use PAMR? I have adjusted slide_steps and other hyperparameters, but the results remained unsatisfactory.
Please read the paper associated with this code carefully: it points out that comparing against other methods while using PAMR is unfair, so PAMR is not used by default. However, you can obtain PAMR-processed results by setting the PAMR hyperparameters when calling SCLIP.
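For intuition about what PAMR-style refinement does, here is a minimal numpy sketch of the idea (not the repo's implementation, which operates on torch tensors with learned dilations): class probabilities are repeatedly averaged over a small neighbourhood, weighted by how similar neighbouring pixels look, so mask boundaries snap to image edges while noisy interior pixels get smoothed out. The function name, the 4-neighbourhood, and the Gaussian affinity on grayscale intensity are my simplifying assumptions.

```python
import numpy as np

def pamr_refine(image, probs, num_iter=10, sigma=0.1):
    """PAMR-style refinement sketch (hypothetical, simplified).

    image: (H, W) grayscale image with values in [0, 1]
    probs: (C, H, W) per-class probabilities to be refined
    """
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]  # centre + 4-neighbours
    for _ in range(num_iter):
        weights, shifted = [], []
        for dy, dx in offsets:
            img_s = np.roll(np.roll(image, dy, axis=0), dx, axis=1)
            # Affinity is high when the neighbour pixel resembles the centre,
            # so probabilities do not leak across strong image edges.
            w = np.exp(-((image - img_s) ** 2) / (2 * sigma ** 2))
            weights.append(w)
            shifted.append(np.roll(np.roll(probs, dy, axis=1), dx, axis=2))
        norm = np.sum(weights, axis=0)
        probs = sum(w * p for w, p in zip(weights, shifted)) / norm
    return probs
```

On a toy two-region image with one mislabelled interior pixel, a few iterations are enough to flip that pixel back to its neighbours' class while leaving the region boundary (which coincides with the image edge) intact.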