UCSC-VLAA/DMAE

How to reproduce the accuracy in Tab.5?

LiuDongyang6 opened this issue · 1 comment

Thank you for this work!

I am trying to reproduce the experimental results in Tab. 5 and have run into some trouble.

With 100-epoch pretraining and 100-epoch finetuning, the small(student)-base(teacher) experiment only reaches 75.27% top-1 accuracy, which is clearly lower than the reported 79.3%. In fact, we cannot reproduce the MAE baselines either. For the tiny(student)-base(teacher) experiment, after excluding the feature distillation loss from the pretraining objective (i.e., keeping only the MAE reconstruction loss), we only get 60.88% top-1 accuracy, while the reported accuracy is 66.6%. Could you help us figure out what is going wrong?

The following are the scripts for our small(student)-base(teacher) experiment.
pretrain:

python -m torch.distributed.launch --nproc_per_node=8 \
    --use_env main_distill.py \
    --output_dir ./outputs/"$exp_name"/ckpt \
    --log_dir ./outputs/"$exp_name"/log \
    --batch_size 256 \
    --accum_iter 2 \
    --model mae_vit_small_patch16_dec512d8b \
    --model_teacher mae_vit_base_patch16_dec512d8b \
    --mask_ratio 0.75 \
    --epochs 100 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --data_path ${IMAGENET_DIR} \
    --teacher_model_path 'mae_visualize_vit_base.pth' \
    --student_reconstruction_target 'original_img' \
    --aligned_blks_indices 8 \
    --teacher_aligned_blks_indices 8 \
    --embedding_distillation_func L1 \
    --aligned_feature_projection_dim 384 768
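
For reference, our understanding of what the alignment flags above do is: the student's block-8 features are linearly projected from 384 to 768 dimensions and matched to the frozen teacher's block-8 features with an L1 loss, which is added to the usual MAE reconstruction loss. Below is a minimal sketch of that idea; the distill_weight factor and the exact hook points are our assumptions, not the repo's actual code.

import torch.nn as nn
import torch.nn.functional as F

class FeatureDistillLoss(nn.Module):
    """Sketch of the distillation objective implied by the flags above."""
    def __init__(self, student_dim=384, teacher_dim=768, distill_weight=1.0):
        super().__init__()
        # --aligned_feature_projection_dim 384 768: project student features
        # into the teacher's embedding dimension before comparing them.
        self.proj = nn.Linear(student_dim, teacher_dim)
        self.distill_weight = distill_weight  # assumed weighting, not a repo flag

    def forward(self, mae_loss, student_feat, teacher_feat):
        # student_feat: block-8 tokens from the student (--aligned_blks_indices 8)
        # teacher_feat: block-8 tokens from the teacher (--teacher_aligned_blks_indices 8)
        distill = F.l1_loss(self.proj(student_feat), teacher_feat)  # --embedding_distillation_func L1
        return mae_loss + self.distill_weight * distill

The tiny-base ablation mentioned above drops the distillation term and keeps only mae_loss.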

fine-tune:

python -m torch.distributed.launch --nproc_per_node=8 main_finetune.py \
    --batch_size 128 \
    --model vit_small_patch16 \
    --finetune ./outputs/"$exp_name"/pretrain/ckpt/checkpoint-99.pth \
    --epochs 100 \
    --output_dir ./outputs/"$exp_name"/"$finetune_name"/ --log_dir ./outputs/"$exp_name"/"$finetune_name"/ \
    --blr 5e-4 \
    --weight_decay 0.05 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --dist_eval --data_path /dev/shm/imagenet \
    &>outputs/"$exp_name"/"$finetune_name"/output.log
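
One thing we double-checked is the effective learning rate. Assuming this repo follows MAE's linear scaling rule (lr = blr * effective batch size / 256), our two commands give:

def effective_lr(blr, batch_size_per_gpu, num_gpus, accum_iter=1):
    # MAE-style linear scaling: lr = blr * eff_batch_size / 256
    eff_batch = batch_size_per_gpu * num_gpus * accum_iter
    return blr * eff_batch / 256, eff_batch

# pretrain command above: --batch_size 256, 8 GPUs, --accum_iter 2, --blr 1.5e-4
print(effective_lr(1.5e-4, 256, 8, 2))   # ~(0.0024, 4096)

# finetune command above: --batch_size 128, 8 GPUs, --blr 5e-4
print(effective_lr(5e-4, 128, 8))        # ~(0.002, 1024)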

Note that for finetuning we changed the warmup epochs from 20 (the repo default) to 5, but we do not think that is likely to cause such a large difference.
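
For context, this is the warmup-plus-cosine schedule as we understand it from MAE's util/lr_sched.py; the function below is a simplified re-implementation for illustration, not a copy of the repo's code.

import math

def adjust_learning_rate(epoch, lr, min_lr, warmup_epochs, total_epochs):
    # Linear warmup followed by half-cycle cosine decay (MAE-style).
    if epoch < warmup_epochs:
        return lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + (lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))

# With --epochs 100, moving warmup from 20 to 5 only reshapes the first fifth
# of training, which is why we doubt it explains a ~4-point accuracy gap.
for warmup in (5, 20):
    print(warmup, [round(adjust_learning_rate(e, 2e-3, 1e-6, warmup, 100), 5)
                   for e in (0, 5, 20, 60, 99)])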

Hi, thanks for your interest in this paper and repo. The original MAE paper does not offer specialized recipes for ViT-Small and ViT-Tiny. However, we found that the recipe used for these smaller models includes strong regularization and augmentation techniques that might lead to over-regularization for the smaller ViTs. To address this, we experimented with a modified recipe with weaker augmentation and regularization; a rough illustration of the knobs involved is sketched below. We will also upload the checkpoints and logs for this version to the repo later. Thanks!
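
To give a sense of the direction of the change (the values below are illustrative placeholders only, NOT our released recipe; please refer to the updated logs once they are up), these are the finetuning flags that control augmentation/regularization strength:

# Illustrative placeholders only -- not the released hyper-parameters.
# These are the flags one would typically weaken for ViT-Small/Tiny finetuning.
weaker_finetune_overrides = {
    "reprob": 0.0,     # random erasing (0.25 in the command above)
    "mixup": 0.2,      # mixup alpha (0.8 above)
    "cutmix": 0.0,     # cutmix alpha (1.0 above)
    "drop_path": 0.0,  # stochastic depth rate
}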