Trying to reproduce the main table from the paper, but the results are always off, especially on ImageNet-A

Question

Closed this issue a year ago · 2 comments

I'm trying to reproduce this part of the main table.

However, the results always seem to be off, especially the ImageNet-A score (top-1), which always lies around 20-22.

Here are the results from 3 seeds, using the same versions of the given dependencies (Python 3.8):

	val_top1	val_top5	imagenet-a_top1	imagenet-a_top5	imagenet-r_top1	imagenet-r_top5	sketch_top1	sketch_top5	imagenetv2-matched-frequency-format-val_top1	imagenetv2-matched-frequency-format-val_top5	imagenet-style_top1	imagenet-style_top5
vitb_robustvit_environment_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_1	81.69	96.078	20.787	43.987	35.233	50.2	35.788	57.684	71.17	90.49	17.842	31.726
vitb_robustvit_environment_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_27	81.586	96.066	21.147	44.227	35.053	49.967	35.56	57.399	71.28	90.45	17.656	31.644
vitb_robustvit_environment_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_42	81.598	96.088	20.653	43.933	35.26	49.923	35.825	57.682	71.41	90.29	17.78	31.634

Here are the results from the same 3 seeds, using different versions of the dependencies (similar to the results above):

	val_top1	val_top5	imagenet-a_top1	imagenet-a_top5	imagenet-r_top1	imagenet-r_top5	sketch_top1	sketch_top5	imagenetv2-matched-frequency-format-val_top1	imagenetv2-matched-frequency-format-val_top5	imagenet-style_top1	imagenet-style_top5
vitb_robustvit_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_1	81.676	96.13	18.36	41.08	34.863	49.9	35.803	57.893	71.31	90.37	17.388	31.048
vitb_robustvit_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_27	81.63	96.108	20.56	43.587	35.21	50.053	35.827	57.661	71.35	90.35	17.676	31.65
vitb_robustvit_seed_bckg_2.0_fgd_0.3_num_epochs_50_seed_42	81.66	96.108	20.013	42.84	35.27	49.93	35.837	57.832	71.2	90.29	17.708	31.476
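
To make the seed-to-seed spread easier to read, the ImageNet-A top-1 column can be summarized per environment. This is only a minimal sketch: the numbers are copied from the two tables above (seed order 1, 27, 42), while the dictionary keys and the `summarize` helper are illustrative names and not part of the repository.

```python
from statistics import mean, stdev

# ImageNet-A top-1 accuracy (%) copied from the two tables above,
# in seed order 1, 27, 42. The keys are just descriptive labels.
imagenet_a_top1 = {
    "same dependency versions": [20.787, 21.147, 20.653],
    "different dependency versions": [18.36, 20.56, 20.013],
}


def summarize(scores):
    """Return (mean, sample standard deviation) of a list of scores."""
    return mean(scores), stdev(scores)


for env, scores in imagenet_a_top1.items():
    avg, sd = summarize(scores)
    print(f"{env}: mean={avg:.2f}, std={sd:.2f} over {len(scores)} seeds")
```

If the spread across seeds is small compared with the gap to the paper, the gap is unlikely to be explained by seed variance alone.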

Here are the settings I used:
{
"data": "Dataset/CV/imagenet/train",
"seg_data": "work/data/general/imagenet-s/ImageNetS919/train-semi-segmentation",
"workers": 4,
"epochs": 50,
"start_epoch": 0,
"batch_size": 8,
"lr": 3e-06,
"momentum": 0.9,
"weight_decay": 0.0001,
"print_freq": 10,
"resume": "",
"evaluate": false,
"pretrained": false,
"world_size": -1,
"rank": -1,
"dist_url": "tcp://224.66.41.62:23456",
"dist_backend": "nccl",
"gpu": 1,
"save_interval": 20,
"num_samples": 3,
"multiprocessing_distributed": false,
"lambda_seg": 0.8,
"lambda_acc": 0.2,
"experiment_folder": "experiment/vitb_robustvit_environment_seed/lr_3e-06_seg_0.8_acc_0.2_bckg_2.0_fgd_0.3_num_epochs_50_seed_1",
"dilation": 0,
"lambda_background": 2.0,
"lambda_foreground": 0.3,
"num_classes": 500,
"temperature": 1.0,
"class_seed": 1, # or 27, 42
"folder_name": "vitb_robustvit_environment_seed"
}

I used model_best.pth.tar for the evaluation. Is there anything I should do or try to bring the results closer to the paper?

Answer 1 · 2024-01-14T08:13:16.000Z

Hi @wanburana, thanks for your interest!
Were you able to reproduce the results on the original, unmanipulated model?
Perhaps it’s an issue with the dataset version you’re using?
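
One quick way to check a local dataset copy is to count its class folders and image files and compare them with the published sizes (ImageNet-A: 7,500 images over 200 classes; ImageNet-R: 30,000 images over 200 classes). The following is a rough sketch, assuming the usual one-folder-per-class layout. The `count_dataset` helper and the path are hypothetical and not part of the repository.

```python
from pathlib import Path

# File extensions treated as images; adjust if the local copy differs.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_dataset(root):
    """Return (number of class folders, number of image files) under root."""
    root = Path(root)
    # Only sub-directories count as classes; stray files such as a README are ignored.
    class_dirs = [d for d in root.iterdir() if d.is_dir()]
    n_images = sum(
        1
        for d in class_dirs
        for f in d.rglob("*")
        if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
    )
    return len(class_dirs), n_images


if __name__ == "__main__":
    # Hypothetical path; replace with the local ImageNet-A directory.
    path = Path("data/imagenet-a")
    if path.is_dir():
        print(count_dataset(path))
```

A mismatch in either count would point to a different or incomplete dataset version. Matching counts do not rule out differences in preprocessing or label mapping.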

Answer 2 · 2024-01-17T10:00:09.000Z

Hi @hila-chefer, here is the result from the original pretrained ViT-B model

compared to the reported results

It could be that the dataset or PyTorch version is different. Thank you.
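
To narrow down a possible version mismatch, the installed package versions can be dumped in both environments and compared side by side. This is a minimal sketch: the package list is an assumption about what the evaluation depends on and should be adjusted to the repository's requirements, and `collect_versions` is an illustrative helper.

```python
import platform
from importlib import metadata

# Assumed list of relevant packages; adjust to the repository's requirements file.
PACKAGES = ["torch", "torchvision", "timm", "numpy", "Pillow"]


def collect_versions(packages=PACKAGES):
    """Return a dict mapping 'python' and each package name to its installed version."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}=={version}")
```

Running this in each environment and diffing the output shows which dependency versions differ between the two sets of results.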