wyhsirius/LIA

Training problems


Hey @wyhsirius,
I was training the model on 4 GPUs. Have you run into the following problems:

  1. When I train directly from scratch, I can use batch_size=32 to train the model without any problem:
    [screenshot]

  2. However, when I resume training with --resume_ckpt, it fails as shown below, and I can only use a very small batch size to avoid the out-of-memory error:
    [screenshot of the error]

I would appreciate any suggestions for solving this problem.

Best,

Hey guys, if you have the same problem, just change

 ckpt = torch.load(resume_ckpt)

to

 ckpt = torch.load(resume_ckpt, map_location='cpu')

in the trainer.py file. By default, torch.load restores each tensor to the device it was saved from (typically cuda:0), so when resuming on multiple GPUs every process materializes the whole checkpoint on GPU 0; mapping everything to CPU first avoids that memory spike.
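
For reference, here is a minimal sketch of the resume pattern (the checkpoint key names 'model', 'optimizer', and 'epoch' are illustrative; trainer.py may use different ones):

    import torch

    def resume(model, optimizer, resume_ckpt):
        # map_location='cpu' keeps every tensor in host memory instead of
        # restoring it onto the GPU it was saved from (usually cuda:0).
        ckpt = torch.load(resume_ckpt, map_location='cpu')

        # load_state_dict copies the CPU tensors into the model's existing
        # parameters, which already live on each process's own GPU, so
        # nothing extra gets allocated on GPU 0.
        model.load_state_dict(ckpt['model'])          # illustrative key
        optimizer.load_state_dict(ckpt['optimizer'])  # illustrative key
        return ckpt.get('epoch', 0)                   # illustrative key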