dingjiansw101/ZegFormer

Failed to reproduce the results on the COCO dataset.

Harry-zzh opened this issue · 13 comments

Thank you for your excellent work.

However, I failed to reproduce the results on the COCO dataset reported in your paper. I followed your experimental settings precisely, yet my results are significantly different:
seen_IoU: 30.3 unseen_IoU: 32.8

I am therefore inquiring as to the possible reasons for this discrepancy. Thank you in advance for your assistance!

Hi, can you reproduce the results with our provided model weights?

Which config do you use?

Yes, I have tried your provided zegformer_R101_bs32_60k_vit16_coco-stuff.pth, and the results are:
seen_IoU: 36.6 unseen_IoU: 36.2
where the unseen IoU is even a bit higher than the numbers reported in your paper.

My command for training is:
python ./train_net.py --num-gpus 8 --config-file configs/coco-stuff/zegformer_R101_bs32_60k_vit16_coco-stuff.yaml

and for evaluation:
python ./train_net.py --config-file configs/coco-stuff/zegformer_R101_bs32_60k_vit16_coco-stuff_gzss_eval.yaml --eval-only MODEL.WEIGHTS outputs/model_final.pth

I wonder whether this discrepancy could be attributed to training or evaluating on different types of machines. I have shared my final model checkpoint here. I would greatly appreciate it if you could take the time to evaluate this checkpoint and see whether our evaluation results differ.

I do not think the machines will make a difference. Maybe I changed the code; I will check it later. Can you train it multiple times first? There may be some variation between runs.

Yes, I have trained it twice, and the evaluation results of the second run are:
seen_IoU: 30.5 unseen_IoU: 29.5

It seems there is a small bug in the released code: the PROMPT_ENSEMBLE_TYPE in the training config is "imagenet_select", while in the testing config it is "single". You can try making them consistent. Please let me know if you get the correct results after fixing this bug.
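For reference, the fix amounts to aligning the value across the two YAML files. A minimal sketch, assuming PROMPT_ENSEMBLE_TYPE sits under MODEL.SEM_SEG_HEAD (the exact key path in the released configs may differ, so check the actual files):

```yaml
# zegformer_R101_bs32_60k_vit16_coco-stuff.yaml (training config)
MODEL:
  SEM_SEG_HEAD:                              # hypothetical key path
    PROMPT_ENSEMBLE_TYPE: "imagenet_select"

# zegformer_R101_bs32_60k_vit16_coco-stuff_gzss_eval.yaml (testing config)
MODEL:
  SEM_SEG_HEAD:                              # keep identical to the training config
    PROMPT_ENSEMBLE_TYPE: "imagenet_select"
```

Whichever value is chosen, the point is that training and evaluation must use the same one.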

I will also check it later, but I do not have time in the next two weeks.

When I set PROMPT_ENSEMBLE_TYPE to "imagenet_select" in both the training and testing configs, I got this evaluation result:
seen_IoU: 34.5 unseen_IoU: 33.6

The result looks more normal, although the mIoU of the seen classes is still 2 points lower than yours. Thank you again for your assistance!

You can also try setting both training and testing to "single". The released checkpoint was trained with "single".

Hi, I trained with "single," but the results are still not good:

seen: 34.70, unseen: 32.03

May I check if there is any follow-up on this issue?

  1. There may be some variance between runs. Have you tried training it multiple times?
  2. You can also try training with "single" and testing with "imagenet_select".
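Since the scripts here follow the detectron2 convention of accepting KEY VALUE overrides on the command line (as the MODEL.WEIGHTS override above already does), the two suggestions can be tried without editing the YAML files. A sketch, assuming the hypothetical key path MODEL.SEM_SEG_HEAD.PROMPT_ENSEMBLE_TYPE (verify where the option actually lives in the config):

```shell
# Train with "single" (key path is an assumption; adjust to the real config)
python ./train_net.py --num-gpus 8 \
  --config-file configs/coco-stuff/zegformer_R101_bs32_60k_vit16_coco-stuff.yaml \
  MODEL.SEM_SEG_HEAD.PROMPT_ENSEMBLE_TYPE "single"

# Evaluate the same checkpoint with "imagenet_select"
python ./train_net.py --eval-only \
  --config-file configs/coco-stuff/zegformer_R101_bs32_60k_vit16_coco-stuff_gzss_eval.yaml \
  MODEL.WEIGHTS outputs/model_final.pth \
  MODEL.SEM_SEG_HEAD.PROMPT_ENSEMBLE_TYPE "imagenet_select"
```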

I am very busy these days; I will check it myself when I am free.

Hello, what batch size do you use?
@wangkaihong

@wangkaihong @Harry-zzh
Sorry for the late reply; I just found time for this issue. I tried it myself and got these results:
seen: 36.7655, unseen: 34.6907, harmonic_mean:35.6980

The results are slightly higher than the results reported in the paper.

I use 4 GPUs with a batch size of 32.

I set PROMPT_ENSEMBLE_TYPE: "single" during both training and testing.

The following are my commands:
python train_net.py --config-file configs/coco-stuff/zegformer_R101_bs32_60k_vit16_coco-stuff.yaml --num-gpus 4 OUTPUT_DIR work_dirs/zegformer_R101_bs32_60k_vit16_coco-stuff

python train_net.py --config-file configs/coco-stuff/zegformer_R101_bs32_60k_vit16_coco-stuff_gzss_eval.yaml --num-gpus 4 --eval-only MODEL.WEIGHTS work_dirs/zegformer_R101_bs32_60k_vit16_coco-stuff/model_final.pth

@wangkaihong @Harry-zzh
I tried again with 8 GPUs and a batch size of 32; other settings are the same.
I got the results:
seen: 36.0161, unseen: 33.894, harmonic_mean: 34.923

These results are also very close to those reported in the paper.

Could you give me more details about your experiments, so that I can help you debug?