About training time and hardware
WangZhouTao opened this issue · 4 comments
Hi~
My machine has a 7700K CPU and a 1080 Ti GPU. One training epoch took about 15 minutes, and the full training run took more than 90 hours. Could you tell me your training time and which GPU your machine uses?
Hi, thank you for using our code. I am using a V100 GPU and an Intel(R) Xeon(R) E5-2698 CPU. With a batch size of 8, one epoch usually takes me about 4~5 minutes, and it takes about 16 hours for training to complete or the model to converge (on ScanNet). I believe there is a matching step in the training code that uses a lot of CPU resources, so training does take some time. I usually train my models on a GPU server. Would you mind telling me what batch size you are using? On smaller GPUs, I usually use multiple GPUs to maintain a batch size of 8. With a smaller batch size, training takes longer.
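As a rough sanity check on the numbers above, the per-epoch time and the total time can be turned into an implied epoch count. This is only a back-of-the-envelope sketch: the helper name `implied_epochs` is hypothetical (it is not part of the repository), and it assumes the per-epoch time stays constant over the whole run.

```python
def implied_epochs(total_hours: float, minutes_per_epoch: float) -> float:
    """Epochs that fit into total_hours, assuming a constant per-epoch time."""
    if minutes_per_epoch <= 0:
        raise ValueError("minutes_per_epoch must be positive")
    return total_hours * 60.0 / minutes_per_epoch


# Figures quoted in this thread: about 16 hours total at 4~5 minutes per epoch.
low = implied_epochs(16, 5)   # slower epochs, so fewer fit into 16 hours
high = implied_epochs(16, 4)  # faster epochs, so more fit into 16 hours
print(f"roughly {low:.0f} to {high:.0f} epochs")  # roughly 192 to 240 epochs
```

The same helper can be applied to other timings by plugging in a measured per-epoch time and total duration.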
Hi, thank you for your reply.
The batch size on my machine is set to 2 (single 1080 Ti GPU). I will try running this code on another machine. Thank you.
No problem. I am closing this thread for now. Feel free to re-open it.
Sorry, I rechecked the training time. H3DNet takes about 8 minutes to train one epoch on my machine. I am correcting that here.