dvlab-research/PFENet

Problem with reproduced accuracy


Hi, thanks for sharing the code! I have trained the model without any modification, but the results are always about 1% worse than the reported accuracy.
Here are my reproduced results on the PASCAL-5i dataset, with the reported results in parentheses: Fold-0: 60.8 (61.7), Fold-1: 68.5 (69.5), Fold-2: 53.9 (55.4).
So I wonder whether I am missing some tricks needed to reach the reported results. Do I need to keep fine-tuning the model?

@Reagan1311 Hi, thanks for your interest. Extra fine-tuning is not needed; we followed the provided config to obtain the pre-trained models. The detailed training/test logs for split-0 of Pascal-5i are linked below. Note that, for validation efficiency, we evaluated only 2000 pairs during training, but the model was still tested on 5000 pairs.

Have you been able to reproduce the results with the pre-trained models? Some users have told me that discrepancies can come from different dataset pre-processing, e.g., the boundary regions of PASCAL VOC objects should be labeled as 255 so that they are ignored during evaluation. Different torch/CUDA versions may also lead to different training results. I hope this helps.
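
As a rough illustration (not the evaluation code used in this repository), pixels labeled 255 are typically masked out of the binary IoU computation like this; the helper `binary_iou` is hypothetical:

```python
import numpy as np

IGNORE_LABEL = 255  # PASCAL VOC marks object-boundary pixels as 255

def binary_iou(pred, target, ignore_label=IGNORE_LABEL):
    """Foreground IoU over valid pixels only.

    pred, target: (H, W) arrays with values {0, 1}; target may also
    contain ignore_label on object boundaries.
    """
    valid = target != ignore_label          # drop boundary pixels from the metric
    pred, target = pred[valid], target[valid]
    intersection = np.logical_and(pred == 1, target == 1).sum()
    union = np.logical_or(pred == 1, target == 1).sum()
    return intersection / union if union > 0 else 0.0
```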

Thank you.

training log: https://mycuhk-my.sharepoint.com/:u:/g/personal/1155122171_link_cuhk_edu_hk/EctF_PWtmKVHjtNFWTtM4s4BjaElf4ZREEkve0jfPQTT-A

testing log: https://mycuhk-my.sharepoint.com/:u:/g/personal/1155122171_link_cuhk_edu_hk/EU-IJWvzAS9DiFJP_n-MKZkBUKcNxZ1R7mqvNoT4V5D5cQ

@tianzhuotao Thanks for your reply. I can get the reported results with the pre-trained models, so the dataset pre-processing should be fine. I guess my CUDA version differs from your setting (I use CUDA 10.1 and torch 1.4.0).
Besides, I want to ask: would multi-GPU training with sync_bn further boost performance?

@Reagan1311 We followed CANet in building our base model, whose BatchNorm layers are removed. I have tried adding BN layers to the decoder, but no significant improvement was obtained, possibly because the decoder is shallow and contains only a few learnable layers.
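
For reference, sync_bn in PyTorch works by converting existing BatchNorm layers into SyncBatchNorm so that statistics are aggregated across GPUs. A minimal sketch, with a hypothetical decoder standing in for the real one (this is not PFENet's released code):

```python
import torch.nn as nn

# Hypothetical decoder with BN layers, for illustration only.
decoder = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=1),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 2, kernel_size=1),
)

# Replace every BatchNorm layer with SyncBatchNorm; the synchronized
# statistics only take effect under torch.distributed training, so the
# process group must be initialized before the model is wrapped in
# DistributedDataParallel.
decoder = nn.SyncBatchNorm.convert_sync_batchnorm(decoder)
```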

Reproducibility is also something we care about. My experiments use CUDA 10.0 and torch 1.4.0, which are close to your setup. I have re-run the models on all splits of Pascal-5i, and the results will be updated as soon as possible.

Thank you.

@Reagan1311

We have reproduced the results with CUDA 10.0 and Torch 1.4.0. The training/test logs and weights can be found at OneDrive.
We obtained results close to those reported in the paper (5000 evaluation pairs): split-0: 61.67; split-1: 69.63; split-2: 56.26; split-3: 56.92.

We also provide the list of dependencies in python3_env.txt for reference.

We have run the split-0 experiment three times to analyze the training variance. Even though we used the same random seed (manual_seed: 321), some training variance is inevitable. Specifically, the best class mIoU results we got during training (2000 pairs for quick evaluation) are 61.44 (the model provided at the link above), 60.76, and 61.53 respectively.
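
For reference, the seeding corresponds roughly to the following sketch (the exact calls in the training script may differ); even with these settings, some CUDA kernels remain nondeterministic, which is why run-to-run variance cannot be fully eliminated:

```python
import random
import numpy as np
import torch
import torch.backends.cudnn as cudnn

manual_seed = 321

# Seed every RNG involved in training.
random.seed(manual_seed)
np.random.seed(manual_seed)
torch.manual_seed(manual_seed)
torch.cuda.manual_seed_all(manual_seed)

# Favor determinism over speed; note that some CUDA ops (e.g. atomic
# adds in certain backward kernels) are still nondeterministic.
cudnn.benchmark = False
cudnn.deterministic = True
```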

Thank you, and I hope this reply is helpful for your research.

Thanks for the reply, it's very helpful :)