R(2+1)D training time on Kinetics
elv-zhiyun opened this issue · 1 comments
Hi,
thank you for your work and pre-trained models!
I'm trying to train R(2+1)D on a custom dataset, and training appears to take a long time. c2/tutorials/kinetics_train.md says "Training this model may take a few days with 8 P100 GPUs". Is this model the R(2+1)D-18 8-frame one specified in c2/scripts/train_r2plus1d_kinetics.sh?
Could you share more detailed information about this? How much time/computing resources does it take to train one epoch on Kinetics, and what about deeper models like R(2+1)D-34, R(2+1)D-152, or 32-frame models?
Hi @elv-zhiyun, the training time depends on many factors, such as the compute you have available, the type of interconnects, the data reading system, etc.
I found that training a model on the Kinetics dataset with 64 V100 GPUs (connected via InfiniBand) and NFS-attached storage takes about 24 hours. That is for 45 epochs with an epoch multiplier of 5, using PyTorch DDP.
I was able to train R(2+1)D-18 on an 8-GPU machine with slow hard-drive access and basic interconnects in about 6 days, but your mileage may vary.
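
To make the two data points above easier to compare, here is a rough back-of-the-envelope sketch that converts them into GPU-hours and wall-clock time per epoch. The helper names (`gpu_hours`, `minutes_per_epoch`) are made up for illustration, and reading "epoch multiplier of 5" as five passes over the dataset per nominal epoch is an assumption, not something confirmed in this thread. The epoch count for the 8-GPU run is not stated, so only its GPU-hours are computed.

```python
def gpu_hours(num_gpus, wall_clock_hours):
    """Total GPU-hours consumed by a run (GPUs x wall-clock hours)."""
    return num_gpus * wall_clock_hours


def minutes_per_epoch(wall_clock_hours, num_epochs):
    """Average wall-clock minutes spent per epoch."""
    return wall_clock_hours * 60.0 / num_epochs


# Figures quoted in the reply above.
big_run_gpu_hours = gpu_hours(num_gpus=64, wall_clock_hours=24)        # 64 V100s, ~24 h
small_run_gpu_hours = gpu_hours(num_gpus=8, wall_clock_hours=6 * 24)   # 8 GPUs, ~6 days

# 45 nominal epochs in ~24 h on the 64-GPU setup.
nominal_epoch_minutes = minutes_per_epoch(wall_clock_hours=24, num_epochs=45)

# ASSUMPTION: an epoch multiplier of 5 means each nominal epoch covers
# the dataset 5 times, so one pass takes a fifth of a nominal epoch.
EPOCH_MULTIPLIER = 5
single_pass_minutes = nominal_epoch_minutes / EPOCH_MULTIPLIER

print(f"64-GPU run: {big_run_gpu_hours} GPU-hours")
print(f"8-GPU run:  {small_run_gpu_hours} GPU-hours")
print(f"64-GPU run: ~{nominal_epoch_minutes:.0f} min per nominal epoch")
print(f"64-GPU run: ~{single_pass_minutes:.1f} min per dataset pass (assumed)")
```

The two runs used different hardware, storage, and possibly different models and schedules, so these numbers are only a coarse guide for sizing a run on a custom dataset.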