MCG-NJU/VideoMAE

Reproducing Camera-Ready Improved Numbers

dfan opened this issue · 4 comments

dfan commented

The NeurIPS camera-ready version (v3 on arXiv) reports significantly higher results than the previous version of the paper (v2 on arXiv). For example, for ViT-B pretrained on K400 for 1600 epochs, accuracy on K400 goes from 80.9% to 81.5%. For ViT-B pretrained on SSv2 for 2400 epochs, accuracy on SSv2 goes from 70.6% to 70.8%. Could the authors share the updated finetuning code and configs? I am unable to reproduce the new results; my results are close to those reported in v2 of the paper.

We have fixed one issue in this commit, and with the fix the performance on Kinetics-400 improves by about 0.5%.

The results on Kinetics-400 can be reproduced successfully by MMAction2.

dfan commented

Hm, with the latest code I am getting 70.3% on SSv2 and 81.3% on K400, using the 2400-epoch and 1600-epoch off-the-shelf pretrained weights provided in this repo, respectively. I am using this script and this script, except that I changed the batch size to meet my memory constraints. I am using 64 GPUs with these settings:

--batch_size 2
--update_freq 3
--num_sample 2
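
For reference, here is a rough sketch of the effective batch size these flags imply. It assumes that --batch_size is the per-GPU value and that --update_freq is the number of gradient-accumulation steps per optimizer update; `effective_batch_size` is a hypothetical helper for illustration, not a function from the repo. Any extra clips produced by --num_sample are deliberately not counted.

```python
def effective_batch_size(batch_size: int, update_freq: int, num_gpus: int) -> int:
    """Videos contributing to one optimizer update.

    Assumes batch_size is per GPU and update_freq is the number of
    accumulation steps (both are assumptions about the flags' meaning).
    """
    return batch_size * update_freq * num_gpus


# Settings from this comment: 2 videos per GPU, 3 accumulation steps, 64 GPUs.
videos_per_update = effective_batch_size(batch_size=2, update_freq=3, num_gpus=64)
print(videos_per_update)  # 384
```

If the learning rate in the finetuning script is scaled by the total batch size, this number is the one to compare against the original configs.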

Hi @dfan! I think --batch_size 2 is too small to get good results. --update_freq 3 can serve as a trick to enlarge the effective batch size, but I am not sure the reported performance can be reproduced that way.
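
To illustrate what accumulating over several small batches does in the idealized case, here is a minimal sketch on a toy 1-D linear model with a mean-squared-error loss. It is not the VideoMAE training loop: the model, the data, and the helper names `grad_mse` and `accumulated_grad` are all made up for illustration. It only shows that averaging the gradients of 3 micro-batches of 2 samples gives the same gradient as one batch of 6; it does not model anything batch-dependent in the real pipeline, which is one place where results could still differ.

```python
def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error for the model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch, update_freq):
    """Average the per-micro-batch gradients over update_freq steps."""
    total = 0.0
    for step in range(update_freq):
        lo = step * micro_batch
        hi = lo + micro_batch
        # Dividing by update_freq mirrors scaling the loss before accumulating.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / update_freq
    return total


# Toy data: 6 samples, split into 3 micro-batches of 2.
xs = [0.5, -1.0, 2.0, 1.5, -0.5, 3.0]
ys = [1.0, -2.5, 4.0, 2.0, 0.0, 6.5]
w = 0.7

full = grad_mse(w, xs, ys)
acc = accumulated_grad(w, xs, ys, micro_batch=2, update_freq=3)
print(abs(full - acc) < 1e-9)  # True
```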