Sam1224/SCCAN

some questions

Codeganef opened this issue · 11 comments

Hello, and thank you for sharing this excellent work. While carefully reviewing the code, I noticed that the handling of support patches doesn't seem to align with what the paper describes as self-attention. Instead, it appears to resemble self-calibrated cross-attention, similar to how query patches are treated. Did I misunderstand?
At the same time, I would like to ask whether you could share the training time required for each stage of the project under your training environment. Thank you!

Sorry, I made a mistake in the first question; the code does correspond to the paper, so there is no problem there. Please just answer the second question.

Thanks for your interest in our work!

What do you mean by "each stage"? Our model does not have multiple stages. For example, if you want to train a 1-shot model on fold 0 of PASCAL-5i with ResNet50 as the backbone, you just need to execute the following command:

python train_sccan.py --config=config/pascal/pascal_split0_resnet50.yaml

We use one 32G V100 card to train each model on the PASCAL-5i dataset, and four 24G 3090 cards for the COCO-20i dataset.
Sorry that we did not save the training logs, so the exact training times were not recorded.
If I remember correctly, under the 1-shot, ResNet50 setting, the training time of each fold is about 1 day for PASCAL-5i (1 V100) and 2-3 days for COCO-20i (4 3090s).

Hello, I still have some questions. When reproducing your work in the same environment, I found that the results differ across repetitions. Could this be caused by the random seed?

Hello, I don't think the random seed is the reason, since it is fixed in our experiments ("manual_seed" in the .yaml files). Does your result differ much? It would be reasonable if they are slightly different: we also used an A100 card to re-run the 1-shot, ResNet50, PASCAL-5i experiments, and the mIoU of the 4 folds was about 0.1% lower than the reported values.
I'm not a hundred percent sure, but some possible reasons might be different versions of libraries, different hardware, or other system-level factors.
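For context, fixing the seed in a typical PyTorch training script looks roughly like the sketch below. This is a generic illustration of standard seeding calls, not a copy of the repo's code; the function name and seed value are made up for the example.

import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Fix the RNGs that affect data sampling and weight initialization.

    Generic PyTorch seeding sketch; in this repo the value would come from
    the ``manual_seed`` field of the .yaml config.
    """
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG (e.g., data sampling)
    torch.manual_seed(seed)           # CPU RNG
    torch.cuda.manual_seed_all(seed)  # RNGs of all visible GPUs


set_seed(321)  # illustrative value only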


So far I have tried three times: the first time the result is about the same as the paper, the second time is 1% lower, and the third time is 2% lower. The random seeds used in these three experiments were the same. At the same time, I noticed that in the test stage you also fix the random seed and conduct only one run. Have you ever tried using other random seeds in the test stage, conducting 5 runs, and averaging the 5 results?

(1) I have no idea about this phenomenon; we have never encountered it before. I have just started training a model (fold 0, 1-shot, ResNet50, PASCAL-5i) on another server that had not been used for studying FSS before. I will get back to you once the training is done.
(2) It is true that using different random seeds during testing would make the results more robust (as BAM does). However, most existing FSS works (e.g., PFENet, CyCTR) only report the results of one random seed in their papers, and we simply follow them. Besides, we provided experimental results on COCO-20i with 4,000 testing episodes (in Table 2) and 20,000 testing episodes (in Appendix A.4).
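For reference, the multi-seed protocol simply averages the per-seed results. A toy sketch with placeholder numbers (NOT real SCCAN scores) is:

import statistics

# Placeholder per-seed mIoU values from 5 hypothetical test runs -- these are
# NOT real SCCAN results; only the averaging protocol itself is illustrated.
per_seed_miou = {321: 67.4, 322: 67.9, 323: 67.1, 324: 68.0, 325: 67.6}

scores = list(per_seed_miou.values())
print(f"mIoU over {len(scores)} seeds: "
      f"{statistics.mean(scores):.2f} +/- {statistics.stdev(scores):.2f}")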


Thanks for your reply; I am trying to repeat the experiment myself. At the same time, I have a question: I tested with the trained model but changed the random seed during testing, and got different results, some of which were even 1% better than the original ones. What causes the difference in results? Which random operations does this seed control?

The data loader would be affected. Recall that:
1 test episode = 1 query image + k support images + k support masks (k is from k-shot)

When you change the random seed, the following things change (a toy sketch of this episode sampling is given after the list):

  • The selected query images (you can refer to Lines 208-211 of test_sccan.py for the details of selecting test samples).
    • For PASCAL-5i, the total number of test images (per fold) is less than 1,000 (Line 187). Following existing works, we repeatedly iterate over the data loader (Line 208) until 1,000 episodes have been tested. Thus, some query images might be tested multiple times (but with different randomly selected support pairs).
    • For COCO-20i, the total number of test images (per fold) is more than 4,000 (Line 190). Therefore, the random seed determines which 4,000 images are randomly selected for testing.
  • The support samples randomly selected for segmenting each query image also change.
    • Intuitively, when a query image and its support pairs look similar (e.g., two cats with similar appearance), the metrics would be good; otherwise, the metrics would be relatively poor.
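To make the effect of the seed concrete, here is a toy sketch of seeded episode sampling. The image names, pool size, and function are purely illustrative; this is not the actual SCCAN data loader.

import random

# Toy pool of test images for one class -- purely illustrative.
test_pool = [f"img_{i:03d}" for i in range(12)]
K_SHOT = 1


def sample_episode(rng: random.Random):
    """One episode = 1 query image + K support images (with their masks)."""
    query = rng.choice(test_pool)
    supports = rng.sample([img for img in test_pool if img != query], K_SHOT)
    return query, supports


# Different seeds yield different query/support pairings, which is why the
# test-time mIoU shifts slightly when the seed is changed.
for seed in (321, 123):
    rng = random.Random(seed)
    print(seed, [sample_episode(rng) for _ in range(3)])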

Thank you for your reply. Here are some of my reproduced results: the mIoU values of split0/split1/split2 under PASCAL 1-shot were 67.46/72.48/65.46, respectively. The first two results were similar to those in the paper, but the split2 result was quite different. Therefore, I conducted three more experiments under the split2 setting. Only one of them reached the result in the paper; the remaining two were about 65.5. Looking forward to your results!

Hello, in the random-seed part of the code, I found the parameter setting args.seed_deterministic=False. If it is changed to True, would the results be more stable? Why not set it to True?

Thanks for pointing this out. Our project was built upon BAM, from which this part was taken directly. We agree it would be better to set cudnn.deterministic to True, but we overlooked this part when conducting experiments (BAM also sets args.seed_deterministic to False).
This parameter might be the reason for the unstable training results you mentioned (across multiple attempts). You may try setting it to True and see whether the learned models (across multiple attempts) become consistent.
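For reference, the difference between the two settings boils down to something like the sketch below. This is a generic illustration of the cuDNN flags, not the exact BAM/SCCAN code, and the else-branch settings are an assumption about the typical fast (non-deterministic) configuration.

import torch.backends.cudnn as cudnn


def configure_cudnn(seed_deterministic: bool) -> None:
    """Toggle cuDNN between reproducible and fast auto-tuned modes (sketch)."""
    if seed_deterministic:
        # Reproducible: always use deterministic kernels, at some speed cost.
        cudnn.benchmark = False
        cudnn.deterministic = True
    else:
        # Faster: cuDNN may auto-tune and pick non-deterministic kernels,
        # so repeated runs can differ slightly even with a fixed seed.
        cudnn.benchmark = True
        cudnn.deterministic = False


configure_cudnn(seed_deterministic=True)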