taoyang1122/adapt-image-models

Training log available?

Closed this issue · 5 comments

Hi, thanks for the great work!

I wonder if there are any training logs available for the CLIP-pretrained models?

Hi @simonJJJ , thanks for your interest in our work. I may not be able to provide the training logs because I no longer have access to them.

The vitclip_large_k400 config is not consistent with the paper, e.g. the training num_frames, training frame_interval, ColorJitter, backbone lr_mult, warmup epochs, etc.

I simply ran the vitclip_large_k400 config in your repo but got top-1 acc = 85.69. So I want to know the exactly correct config.

Thanks.

Hi, sorry, we missed some implementation details in the paper. For ViT-L on K400, we use ColorJitter and a 0.1x backbone learning rate to alleviate overfitting. I have updated the config; you may try it again. The configs assume 8 GPUs with a total batch size of 64. Another possible reason for the performance gap is that the K400 videos themselves may differ.
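For readers trying to mirror these settings, the overrides described above might look like the following mmaction-style config fragment. This is only a sketch: the exact keys, base learning rate, and transform arguments are assumptions and should be checked against the repo's actual vitclip_large_k400 config.

```python
# Sketch of the overrides mentioned above, in mmaction-style config syntax.
# Key names and values are illustrative, not copied from the repo.

# Training pipeline: ColorJitter added to reduce overfitting on ViT-L.
train_pipeline = [
    dict(type='SampleFrames', clip_len=32, frame_interval=3, num_clips=1),
    dict(type='RawFrameDecode'),
    dict(type='RandomResizedCrop'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='ColorJitter'),  # added for ViT-L on K400
    dict(type='Flip', flip_ratio=0.5),
    dict(type='FormatShape', input_format='NCTHW'),
]

# Optimizer: 0.1x learning rate on the backbone relative to the head.
optimizer = dict(
    type='AdamW',
    lr=3e-4,  # assumed base lr for 8 GPUs x 8 videos/GPU = batch size 64
    paramwise_cfg=dict(custom_keys={'backbone': dict(lr_mult=0.1)}),
)
```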

Hi,

I directly evaluated your pretrained ViT-L/14 32x3x1 model on K400 using the updated config that you fixed.

However, I get top-1 acc = 86.23. Adding ThreeCrop at inference, top-1 acc = 86.69. The result is still far from the paper-reported top-1 acc = 87.5.
My validation set has 19877 valid videos.
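As a side note, the top-1 accuracy being compared here is just the fraction of videos whose highest-scoring class matches the ground-truth label. A minimal sketch (variable names are hypothetical, not from the repo):

```python
import numpy as np

def top1_accuracy(scores: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose argmax prediction equals the label.

    scores: (num_videos, num_classes) classification scores.
    labels: (num_videos,) ground-truth class indices.
    """
    preds = scores.argmax(axis=1)
    return float((preds == labels).mean())

# Tiny example: 3 of 4 predictions are correct.
scores = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
labels = np.array([0, 1, 1, 1])
print(top1_accuracy(scores, labels))  # 0.75
```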

After email discussions with the co-authors, it turns out the cause is a different validation set. With the K400 val set from link, I can reproduce the ViT-CLIP-L result with top-1 acc = 87.3 and 87.61 (w/ ThreeCrop).
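For reference, ThreeCrop testing in an mmaction-style pipeline typically replaces the single center crop with three spatial crops along the longer side, and the scores are averaged. A sketch of such a test pipeline (crop size and sampling settings are assumptions, matching the 32x3x1 protocol mentioned above):

```python
# Illustrative ThreeCrop test pipeline; check the repo's config for exact keys.
test_pipeline = [
    dict(type='SampleFrames', clip_len=32, frame_interval=3, num_clips=1,
         test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 224)),          # resize shorter side to 224
    dict(type='ThreeCrop', crop_size=224),         # 3 crops instead of CenterCrop
    dict(type='FormatShape', input_format='NCTHW'),
]
```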

Hope it's helpful to others.