Reproducing Camera-Ready Improved Numbers

Question

dfan opened this issue 2 years ago · 4 comments

The NeurIPS camera-ready version (v3 on arXiv) has some significantly higher results than the previous version of the paper (v2 on arXiv). For example, for ViT-B pretrained on K400 for 1600 epochs, performance on K400 jumps from 80.9% to 81.5%. For ViT-B pretrained on SSv2 for 2400 epochs, performance on SSv2 jumps from 70.6% to 70.8%. Could the authors share the updated finetuning code and configs? I am unable to reproduce the new results; my results are close to those reported in v2 of the paper.

Answer 1 · 2022-11-30T17:00:11.000Z

We have fixed one issue in this commit, and with the fix the performance on Kinetics-400 improves by about 0.5%.

Answer 2 · 2022-11-30T17:01:20.000Z

The results on Kinetics-400 can be reproduced successfully with MMAction2.

Answer 3 · 2022-12-02T15:56:31.000Z

Hm, with the latest code I am getting 70.3% on SSv2 and 81.3% on K400 with the 2400- and 1600-epoch off-the-shelf pretrained weights provided in this repo. I am using this script and this script, except that I changed the batch size to fit my memory constraints. I am using 64 GPUs with the following settings:

--batch_size 2
--update_freq 3
--num_sample 2

Answer 4 · 2022-12-06T07:00:10.000Z

Hi @dfan! I think --batch_size 2 is too small to get favorable results. --update_freq 3 might work as a trick to enlarge the effective batch size, but I am not sure the performance can be reproduced that way.
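
For reference, the effective batch size implied by the settings in Answer 3 can be worked out with a short sketch. It assumes --update_freq means gradient accumulation steps and that the learning rate follows a linear scaling rule against a reference batch of 256; both are assumptions made for illustration, not confirmed in this thread. The function names and the base learning rate of 1e-3 are hypothetical.

```python
def effective_batch_size(per_gpu_batch: int, update_freq: int, num_gpus: int) -> int:
    """Videos contributing to one optimizer step.

    Assumes update_freq is the number of gradient accumulation steps.
    """
    return per_gpu_batch * update_freq * num_gpus


def linearly_scaled_lr(base_lr: float, total_batch: int, reference_batch: int = 256) -> float:
    """Linear scaling rule (an assumption here): lr grows in proportion to batch size."""
    return base_lr * total_batch / reference_batch


# Settings quoted in Answer 3: --batch_size 2, --update_freq 3, 64 GPUs.
total = effective_batch_size(per_gpu_batch=2, update_freq=3, num_gpus=64)
print(total)  # 384

# Hypothetical base learning rate, used only to show the scaling.
print(linearly_scaled_lr(base_lr=1e-3, total_batch=total))
```

Even when the effective batch size matches the reference configuration, each forward pass still sees only 2 videos per GPU, so any batch-dependent behavior can differ from a run with a larger per-GPU batch. That may be part of why accumulation does not recover the reported numbers exactly.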