Cannot reproduce the main table from the paper: results are consistently off, especially on ImageNet-A
I'm trying to reproduce this part of the main table.
However, the results always seem to be off, especially the ImageNet-A score, which always lies around 20-22.
Here are the results from 3 seeds, using the same dependency versions as given (Python 3.8):
run | val_top1 | val_top5 | imagenet-a_top1 | imagenet-a_top5 | imagenet-r_top1 | imagenet-r_top5 | sketch_top1 | sketch_top5 | imagenetv2-matched-frequency-format-val_top1 | imagenetv2-matched-frequency-format-val_top5 | imagenet-style_top1 | imagenet-style_top5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
vitb_robustvit_environment_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_1 | 81.69 | 96.078 | 20.787 | 43.987 | 35.233 | 50.2 | 35.788 | 57.684 | 71.17 | 90.49 | 17.842 | 31.726 |
vitb_robustvit_environment_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_27 | 81.586 | 96.066 | 21.147 | 44.227 | 35.053 | 49.967 | 35.56 | 57.399 | 71.28 | 90.45 | 17.656 | 31.644 |
vitb_robustvit_environment_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_42 | 81.598 | 96.088 | 20.653 | 43.933 | 35.26 | 49.923 | 35.825 | 57.682 | 71.41 | 90.29 | 17.78 | 31.634 |
Here are the results from the same 3 seeds, using different dependency versions (similar to the results above):
run | val_top1 | val_top5 | imagenet-a_top1 | imagenet-a_top5 | imagenet-r_top1 | imagenet-r_top5 | sketch_top1 | sketch_top5 | imagenetv2-matched-frequency-format-val_top1 | imagenetv2-matched-frequency-format-val_top5 | imagenet-style_top1 | imagenet-style_top5 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
vitb_robustvit_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_1 | 81.676 | 96.13 | 18.36 | 41.08 | 34.863 | 49.9 | 35.803 | 57.893 | 71.31 | 90.37 | 17.388 | 31.048 |
vitb_robustvit_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_27 | 81.63 | 96.108 | 20.56 | 43.587 | 35.21 | 50.053 | 35.827 | 57.661 | 71.35 | 90.35 | 17.676 | 31.65 |
vitb_robustvit_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_42 | 81.66 | 96.108 | 20.013 | 42.84 | 35.27 | 49.93 | 35.837 | 57.832 | 71.2 | 90.29 | 17.708 | 31.476 |
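
To make the seed-to-seed spread easier to compare against the paper, here is a minimal sketch that averages the ImageNet-A top-1 numbers from the two tables above. The values are copied from those tables; the group labels and the `summarize` helper are just names picked for this example.

```python
from statistics import mean, stdev

# ImageNet-A top-1 accuracy (%) per seed, copied from the two tables above.
# The group labels are illustrative names, not identifiers from the repo.
imagenet_a_top1 = {
    "same dependencies": {1: 20.787, 27: 21.147, 42: 20.653},
    "different dependencies": {1: 18.36, 27: 20.56, 42: 20.013},
}


def summarize(scores):
    """Return (mean, sample standard deviation) over the per-seed scores."""
    values = list(scores.values())
    return mean(values), stdev(values)


for name, scores in imagenet_a_top1.items():
    avg, spread = summarize(scores)
    print(f"{name}: mean={avg:.2f}, std={spread:.2f} over {len(scores)} seeds")
```

If the gap to the reported number is much larger than the standard deviation across seeds, seed noise alone is unlikely to explain it.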
Here are the settings I used:
{
"data": "Dataset/CV/imagenet/train",
"seg_data": "work/data/general/imagenet-s/ImageNetS919/train-semi-segmentation",
"workers": 4,
"epochs": 50,
"start_epoch": 0,
"batch_size": 8,
"lr": 3e-06,
"momentum": 0.9,
"weight_decay": 0.0001,
"print_freq": 10,
"resume": "",
"evaluate": false,
"pretrained": false,
"world_size": -1,
"rank": -1,
"dist_url": "tcp://224.66.41.62:23456",
"dist_backend": "nccl",
"gpu": 1,
"save_interval": 20,
"num_samples": 3,
"multiprocessing_distributed": false,
"lambda_seg": 0.8,
"lambda_acc": 0.2,
"experiment_folder": "experiment/vitb_robustvit_environment_seed/lr_3e-06_seg_0.8_acc_0.2_bckg_2.0_fgd_0.3_num_epochs_50_seed_1",
"dilation": 0,
"lambda_background": 2.0,
"lambda_foreground": 0.3,
"num_classes": 500,
"temperature": 1.0,
"class_seed": 1, # or 27, 42
"folder_name": "vitb_robustvit_environment_seed"
}
I used model_best.pth.tar for the evaluation. Is there anything I should do or try to bring the results closer to the paper?
Hi @wanburana, thanks for your interest!
Were you able to reproduce the results on the original, unmanipulated model?
Perhaps it’s an issue with the dataset version you’re using?
Hi @hila-chefer, here is the result from the original pretrained ViT-B model
compared to the reported results.
It could be that the dataset or PyTorch version is different. Thank you.
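
One quick way to rule out a dataset mismatch is to count the class folders and images in the local copy and compare them with what the dataset's own README states. As far as I know, the public ImageNet-A release has 7,500 images in 200 class folders, but please verify that against the download. The sketch below assumes a `root/<class_folder>/<image>` layout; the function names and the extension list are assumptions made for this example, not part of the repo.

```python
import sys
from pathlib import Path

# Assumption: images use one of these extensions; adjust for your copy.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Return {class_folder_name: image_count} for a root/class/image layout."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files such as a README at the top level
        counts[class_dir.name] = sum(
            1
            for p in class_dir.rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


def dataset_summary(root):
    """Return (number of class folders, total number of images)."""
    counts = count_images_per_class(root)
    return len(counts), sum(counts.values())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_dataset.py /path/to/imagenet-a  (path is hypothetical)
    num_classes, num_images = dataset_summary(sys.argv[1])
    print(f"{num_classes} class folders, {num_images} images")
```

Recording the installed PyTorch and torchvision versions next to these counts makes it easier to tell which of the two differs from the authors' setup.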