
Cannot reproduce the main table from the paper; results are consistently off, especially on ImageNet-A


I'm trying to reproduce this part of the main table.
However, the results always seem to be off, especially the ImageNet-A top-1 score, which always lies around 20-22.

Here are the results from 3 seeds, using the same versions of the given dependencies (Python 3.8):

| experiment | val_top1 | val_top5 | imagenet-a_top1 | imagenet-a_top5 | imagenet-r_top1 | imagenet-r_top5 | sketch_top1 | sketch_top5 | imagenetv2-matched-frequency-format-val_top1 | imagenetv2-matched-frequency-format-val_top5 | imagenet-style_top1 | imagenet-style_top5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vitb_robustvit_environment_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_1 | 81.69 | 96.078 | 20.787 | 43.987 | 35.233 | 50.2 | 35.788 | 57.684 | 71.17 | 90.49 | 17.842 | 31.726 |
| vitb_robustvit_environment_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_27 | 81.586 | 96.066 | 21.147 | 44.227 | 35.053 | 49.967 | 35.56 | 57.399 | 71.28 | 90.45 | 17.656 | 31.644 |
| vitb_robustvit_environment_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_42 | 81.598 | 96.088 | 20.653 | 43.933 | 35.26 | 49.923 | 35.825 | 57.682 | 71.41 | 90.29 | 17.78 | 31.634 |

Here are the results from the same 3 seeds, using different versions of the dependencies (similar to the results above):

| experiment | val_top1 | val_top5 | imagenet-a_top1 | imagenet-a_top5 | imagenet-r_top1 | imagenet-r_top5 | sketch_top1 | sketch_top5 | imagenetv2-matched-frequency-format-val_top1 | imagenetv2-matched-frequency-format-val_top5 | imagenet-style_top1 | imagenet-style_top5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vitb_robustvit_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_1 | 81.676 | 96.13 | 18.36 | 41.08 | 34.863 | 49.9 | 35.803 | 57.893 | 71.31 | 90.37 | 17.388 | 31.048 |
| vitb_robustvit_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_27 | 81.63 | 96.108 | 20.56 | 43.587 | 35.21 | 50.053 | 35.827 | 57.661 | 71.35 | 90.35 | 17.676 | 31.65 |
| vitb_robustvit_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_42 | 81.66 | 96.108 | 20.013 | 42.84 | 35.27 | 49.93 | 35.837 | 57.832 | 71.2 | 90.29 | 17.708 | 31.476 |
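
To make the seed-to-seed spread on ImageNet-A easier to compare against a reported number, the three imagenet-a_top1 values from each run can be summarized as a mean and a sample standard deviation. This is only a minimal sketch: the lists are copied by hand from the results above, and the names `same_deps`, `other_deps`, and `summarize` are made up for illustration and are not part of the repository.

```python
from statistics import mean, stdev

# imagenet-a_top1 for seeds 1, 27, 42, copied from the results above.
same_deps = [20.787, 21.147, 20.653]   # same dependency versions
other_deps = [18.36, 20.56, 20.013]    # different dependency versions


def summarize(scores):
    """Return (mean, sample standard deviation), rounded to 3 decimals."""
    return round(mean(scores), 3), round(stdev(scores), 3)


for label, scores in [("same deps", same_deps), ("other deps", other_deps)]:
    avg, spread = summarize(scores)
    print(f"{label}: mean={avg:.3f} std={spread:.3f}")
```

If the gap to the paper is much larger than the spread across seeds, the difference is unlikely to be explained by seed variance alone.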

Here are the settings I used:
"data": "Dataset/CV/imagenet/train",
"seg_data": "work/data/general/imagenet-s/ImageNetS919/train-semi-segmentation",
"workers": 4,
"epochs": 50,
"start_epoch": 0,
"batch_size": 8,
"lr": 3e-06,
"momentum": 0.9,
"weight_decay": 0.0001,
"print_freq": 10,
"resume": "",
"evaluate": false,
"pretrained": false,
"world_size": -1,
"rank": -1,
"dist_url": "tcp://",
"dist_backend": "nccl",
"gpu": 1,
"save_interval": 20,
"num_samples": 3,
"multiprocessing_distributed": false,
"lambda_seg": 0.8,
"lambda_acc": 0.2,
"experiment_folder": "experiment/vitb_robustvit_environment_seed/lr_3e-06_seg_0.8_acc_0.2_bckg_2.0_fgd_0.3_num_epochs_50_seed_1",
"dilation": 0,
"lambda_background": 2.0,
"lambda_foreground": 0.3,
"num_classes": 500,
"temperature": 1.0,
"class_seed": 1, # or 27, 42
"folder_name": "vitb_robustvit_environment_seed"

I used model_best.pth.tar for the evaluation. Is there anything I should do or try to bring the results closer to those in the paper?

Hi @wanburana, thanks for your interest!
Were you able to reproduce the results on the original, unmanipulated model?
Perhaps it’s an issue with the dataset version you’re using?
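
One quick way to rule out a dataset-version mismatch is to count the classes and images on disk and compare them with the numbers documented for the release. The sketch below assumes a folder-per-class layout (as used by torchvision's `ImageFolder`); the function names and the path in the usage note are hypothetical, and the expected counts should be taken from the dataset's own documentation.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Map each class sub-folder under `root` to its number of image files."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def check_dataset(root, expected_classes, expected_images):
    """Return True if the class count and total image count both match."""
    counts = count_images_per_class(root)
    return (
        len(counts) == expected_classes
        and sum(counts.values()) == expected_images
    )
```

For example, the public ImageNet-A release is documented as 7,500 images over 200 classes, so `check_dataset("path/to/imagenet-a", 200, 7500)` should return `True` for an intact copy (the path is a placeholder).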

Hi @hila-chefer, here are the results from the original pretrained ViT-B model

compared to the reported results

It could be that the dataset or PyTorch version is different. Thank you.
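
To compare environments, it may help to record the exact package versions next to each set of results. The following is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the package list `torch`, `torchvision`, `timm` is an assumption about what the repository depends on, and `installed_versions` is a name made up for illustration.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    # Assumed package list; adjust to the repository's requirements file.
    for name, version in installed_versions(["torch", "torchvision", "timm"]).items():
        print(name, version or "not installed")
```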