How to reproduce the accuracy in Tab.5?
LiuDongyang6 opened this issue · 1 comment
Thank you for this work!
I am trying to reproduce the experiment results in Tab.5 and have run into some trouble.
With 100-epoch pre-training and 100-epoch fine-tuning, the small(student)-base(teacher) experiment only reaches 75.27% top-1 accuracy, which is clearly lower than the reported 79.3%. In fact, we cannot reproduce the MAE baselines either. For the tiny(student)-base(teacher) experiment, after excluding the feature distillation loss from the pre-training loss function (i.e., only the MAE reconstruction loss is retained), we only get 60.88% top-1 accuracy, while the reported value is 66.6%. Could you help us solve this problem?
The following are the scripts for our small(student)-base(teacher) experiment.
pretrain:
python -m torch.distributed.launch --nproc_per_node=8 \
--use_env main_distill.py \
--output_dir ./outputs/"$exp_name"/ckpt \
--log_dir ./outputs/"$exp_name"/log \
--batch_size 256 \
--accum_iter 2 \
--model mae_vit_small_patch16_dec512d8b \
--model_teacher mae_vit_base_patch16_dec512d8b \
--mask_ratio 0.75 \
--epochs 100 \
--blr 1.5e-4 --weight_decay 0.05 \
--data_path ${IMAGENET_DIR} \
--teacher_model_path 'mae_visualize_vit_base.pth' \
--student_reconstruction_target 'original_img' \
--aligned_blks_indices 8 \
--teacher_aligned_blks_indices 8 \
--embedding_distillation_func L1 \
--aligned_feature_projection_dim 384 768
fine-tune:
python -m torch.distributed.launch --nproc_per_node=8 main_finetune.py \
--batch_size 128 \
--model vit_small_patch16 \
--finetune ./outputs/"$exp_name"/pretrain/ckpt/checkpoint-99.pth \
--epochs 100 \
--output_dir ./outputs/"$exp_name"/"$finetune_name"/ --log_dir ./outputs/"$exp_name"/"$finetune_name"/ \
--blr 5e-4 \
--weight_decay 0.05 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--dist_eval --data_path /dev/shm/imagenet \
&>outputs/"$exp_name"/"$finetune_name"/output.log
Note that for fine-tuning we changed the warmup epochs from the repo's default of 20 to 5, but we think this is unlikely to account for such a large difference.
Hi, thanks for your interest in this paper and repo. The original MAE paper does not offer specialized recipes for ViT-Small and ViT-Tiny. However, we found that the default recipe for the smaller models includes strong regularization and augmentation techniques that can over-regularize the smaller ViTs. To address this, we experimented with a modified recipe that uses weaker augmentation and regularization. We will also update the repo with the checkpoints and logs for this version later. Thanks!
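Until the updated recipes and logs are posted, a fine-tuning command with weaker augmentation and regularization might look like the sketch below. This is only an illustration, assuming main_finetune.py exposes the same augmentation flags as the original MAE fine-tuning script; the specific values (--drop_path, --reprob, --mixup, --cutmix, --aa) are guesses at what "weaker" could mean, not the official recipe, and ${IMAGENET_DIR} is a placeholder for your dataset path.
# Illustrative "weaker augmentation/regularization" settings -- assumed values, not the released recipe
python -m torch.distributed.launch --nproc_per_node=8 main_finetune.py \
--batch_size 128 \
--model vit_small_patch16 \
--finetune ./outputs/"$exp_name"/pretrain/ckpt/checkpoint-99.pth \
--epochs 100 \
--blr 5e-4 --weight_decay 0.05 \
--drop_path 0.0 \
--reprob 0.0 \
--mixup 0.2 --cutmix 0.0 \
--aa rand-m6-mstd0.5-inc1 \
--dist_eval --data_path ${IMAGENET_DIR}
Once the official checkpoints and logs are released, please replace these values with the ones actually used in the paper.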