Some questions about training process
littlewater3 opened this issue · 1 comments
littlewater3 commented
Hi, I'd like to ask some questions about the training process:
- Do you train on a single GPU?
- How long does one epoch take to train?
- In configs/train/3Dmatch.yaml, you set batch_size = 8 and num_worker = 16. I am using a single 3090. With batch_size = 2 and num_worker = 4, I get a GPU out-of-memory error, so I can only train with batch_size = 1.
- Also, max_epoch is set to 1500. Do I really need to train for 1500 epochs?
Thank you very much for your help.
rabbityl commented
We use a single A100 80G for training. If you do not have enough GPU memory, just set batch_size to 1. We actually stop training after around 15-20 epochs, which takes 1-2 days on the A100.
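
For the per-epoch question, the figures above allow a rough back-of-the-envelope estimate. The sketch below only divides the quoted wall-clock time by the quoted epoch count; `hours_per_epoch` is a hypothetical helper written for illustration, not something from the repository. The bounds apply to the A100 setup described here, so a 3090 running batch_size = 1 will likely differ.

```python
def hours_per_epoch(total_days, epochs):
    """Average wall-clock hours per epoch for a run lasting `total_days` over `epochs` epochs."""
    return total_days * 24.0 / epochs


# Bounds taken from the reply above: 15-20 epochs in 1-2 days on an A100.
# Pairing the extremes gives the widest plausible range (an assumption, not a measurement).
fastest = hours_per_epoch(total_days=1, epochs=20)
slowest = hours_per_epoch(total_days=2, epochs=15)

print(f"roughly {fastest:.1f} to {slowest:.1f} hours per epoch")
```

So one epoch is probably on the order of one to three hours on that hardware, and the max_epoch value of 1500 acts as an upper limit rather than a target.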