rabbityl/lepard

Some questions about training process

littlewater3 opened this issue · 1 comment

Hi, I'd like to ask some questions about the training process:

  1. Do you use only one video card during training?
  2. How much time does it take to train for an epoch?
  3. In configs/train/3Dmatch.yaml you set batch_size = 8 and num_worker = 16. I am using a 3090, and even with batch_size = 2 and num_worker = 4 I get an out-of-memory error on the GPU, so I can only train with batch_size = 1.
  4. Also, your max_epoch is 1500. Do you actually need to train for all 1500 epochs?

Thank you very much for your help.

We use a single A100 80G for training. If you do not have enough GPU memory, just set batch_size to 1. We actually stop training after around 15-20 epochs, which takes 1-2 days on the A100.
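
The reply does not give a per-epoch time directly, but a rough range can be worked out from its figures. This is only a sketch: it assumes the quoted 1-2 days covers the whole 15-20 epoch run and that epochs take roughly equal time. `hours_per_epoch` is a hypothetical helper for the arithmetic, not something from the repository.

```python
# Rough per-epoch time implied by the reply: 15-20 epochs in 1-2 days on an A100.
HOURS_PER_DAY = 24


def hours_per_epoch(total_days: float, epochs: int) -> float:
    """Average wall-clock hours per epoch for a run of `epochs` epochs."""
    return total_days * HOURS_PER_DAY / epochs


# Lower bound: the shortest run (1 day) spread over the most epochs (20).
fastest = hours_per_epoch(total_days=1, epochs=20)
# Upper bound: the longest run (2 days) spread over the fewest epochs (15).
slowest = hours_per_epoch(total_days=2, epochs=15)

print(f"{fastest:.1f} to {slowest:.1f} hours per epoch")
```

So an epoch works out to somewhere between about 1.2 and 3.2 hours on the A100 80G. These numbers describe that setup only; a 3090 running with batch_size = 1 may well be slower.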