dingjiansw101/ZegFormer

Failed to reproduce the results on the COCO dataset.

Harry-zzh opened this issue · 13 comments

Thank you for your excellent work.

However, I failed to reproduce the results on the COCO dataset reported in your paper. I followed your experimental settings precisely, yet my results are significantly different:
seen_IoU: 30.3 unseen_IoU: 32.8

I am therefore inquiring as to the possible reasons for this discrepancy. Thank you in advance for your assistance!

Hi, can you reproduce the results with our provided model weights?

Which config do you use?

Yes, I have tried your provided zegformer_R101_bs32_60k_vit16_coco-stuff.pth, and the results are:
seen_IoU: 36.6 unseen_IoU: 36.2
where the unseen IoU is even a bit higher than the numbers reported in your paper.

My command for training is:
python ./train_net.py --num-gpus 8 --config-file configs/coco-stuff/zegformer_R101_bs32_60k_vit16_coco-stuff.yaml

and for evaluation:
python ./train_net.py --config-file configs/coco-stuff/zegformer_R101_bs32_60k_vit16_coco-stuff_gzss_eval.yaml --eval-only MODEL.WEIGHTS outputs/model_final.pth

I wonder whether this discrepancy could be attributed to training or evaluating on different types of machines. I have shared my final model checkpoint here. I would greatly appreciate it if you could take the time to evaluate this checkpoint and see whether our evaluation results differ.

I do not think the machines will make a difference. Maybe I changed the code; I will check it later. Can you train it multiple times first? There may be some variation between runs.

Yes, I have trained it twice, and the evaluation results of the second run are:
seen_IoU: 30.5 unseen_IoU: 29.5

It seems there is a small bug in the released code: the PROMPT_ENSEMBLE_TYPE in the training config is "imagenet_select", while in the testing config it is "single". You can try making them consistent. Please let me know if you get the correct results after fixing this bug.
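For reference, the fix amounts to aligning the value across the two YAML files. A minimal sketch, assuming PROMPT_ENSEMBLE_TYPE sits under MODEL.SEM_SEG_HEAD (the exact key path in the released configs may differ, so check the actual files):

```yaml
# zegformer_R101_bs32_60k_vit16_coco-stuff.yaml (training config)
MODEL:
  SEM_SEG_HEAD:                              # hypothetical key path
    PROMPT_ENSEMBLE_TYPE: "imagenet_select"

# zegformer_R101_bs32_60k_vit16_coco-stuff_gzss_eval.yaml (testing config)
MODEL:
  SEM_SEG_HEAD:                              # keep identical to the training config
    PROMPT_ENSEMBLE_TYPE: "imagenet_select"
```

Whichever value is chosen, the point is that training and evaluation must use the same one.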

I will also check it later, but I do not have time in the next two weeks.

When I set PROMPT_ENSEMBLE_TYPE to "imagenet_select" in both the training and testing configs, I got this evaluation result:
seen_IoU: 34.5 unseen_IoU: 33.6

The result looks more normal, although the mIoU of the seen classes is still 2 points lower than yours. Thank you again for your assistance!

You can also try setting both training and testing to "single". The released checkpoint was trained with "single".

Hi, I trained with "single," but the results are still not good:

seen: 34.70, unseen: 32.03

May I check if there is any follow-up on this issue?

  1. There may be some variance between runs. Have you tried training it multiple times?
  2. You can also try training with "single" and testing with "imagenet_select".
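Since the scripts here follow the detectron2 convention of accepting KEY VALUE overrides on the command line (as the MODEL.WEIGHTS override above already does), the two suggestions can be tried without editing the YAML files. A sketch, assuming the hypothetical key path MODEL.SEM_SEG_HEAD.PROMPT_ENSEMBLE_TYPE (verify where the option actually lives in the config):

```shell
# Train with "single" (key path is an assumption; adjust to the real config)
python ./train_net.py --num-gpus 8 \
  --config-file configs/coco-stuff/zegformer_R101_bs32_60k_vit16_coco-stuff.yaml \
  MODEL.SEM_SEG_HEAD.PROMPT_ENSEMBLE_TYPE "single"

# Evaluate the same checkpoint with "imagenet_select"
python ./train_net.py --eval-only \
  --config-file configs/coco-stuff/zegformer_R101_bs32_60k_vit16_coco-stuff_gzss_eval.yaml \
  MODEL.WEIGHTS outputs/model_final.pth \
  MODEL.SEM_SEG_HEAD.PROMPT_ENSEMBLE_TYPE "imagenet_select"
```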

I am very busy these days; I will check it myself when I am free.

Hello, what batch size do you use?
@wangkaihong

@wangkaihong @Harry-zzh
Sorry for the late reply; I just found time for this issue. I tried it myself and got these results:
seen: 36.7655, unseen: 34.6907, harmonic_mean:35.6980

The results are slightly higher than the results reported in the paper.

I use 4 GPUs with a batch size of 32.

I set PROMPT_ENSEMBLE_TYPE: "single" during both training and testing.

The following are my commands:
python train_net.py --config-file configs/coco-stuff/zegformer_R101_bs32_60k_vit16_coco-stuff.yaml --num-gpus 4 OUTPUT_DIR work_dirs/zegformer_R101_bs32_60k_vit16_coco-stuff

python train_net.py --config-file configs/coco-stuff/zegformer_R101_bs32_60k_vit16_coco-stuff_gzss_eval.yaml --num-gpus 4 --eval-only MODEL.WEIGHTS work_dirs/zegformer_R101_bs32_60k_vit16_coco-stuff/model_final.pth

@wangkaihong @Harry-zzh
I tried again with 8 GPUs and a batch size of 32; other settings are the same.
I got the results:
seen: 36.0161, unseen: 33.894, harmonic_mean: 34.923

These results are also very close to those reported in the paper.

Could you give me more details about your experiments, so that I can help you debug?