wyhsirius/LIA

Training problems


Hey @wyhsirius,
I was training the model on 4 GPUs. Have you run into the following problems:

  1. When I train directly from scratch, I can use batch_size=32 to train the model without any problem:
    [screenshot]

  2. However, when I resume training with --resume_ckpt, it fails as shown below, and I can only use a very small batch size to avoid the out-of-memory error:
    [screenshot of the error]

I would appreciate any suggestions for solving this problem.

Best,

Hey guys, if you have the same problem, just change

 ckpt = torch.load(resume_ckpt)

to

 ckpt = torch.load(resume_ckpt, map_location='cpu')

in the trainer.py file. By default, torch.load restores each tensor to the device it was saved from (typically cuda:0), so when resuming on multiple GPUs every process materializes the whole checkpoint on GPU 0; mapping everything to CPU first avoids that memory spike.
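
For reference, here is a minimal sketch of the resume pattern (the checkpoint key names 'model', 'optimizer', and 'epoch' are illustrative; trainer.py may use different ones):

    import torch

    def resume(model, optimizer, resume_ckpt):
        # map_location='cpu' keeps every tensor in host memory instead of
        # restoring it onto the GPU it was saved from (usually cuda:0).
        ckpt = torch.load(resume_ckpt, map_location='cpu')

        # load_state_dict copies the CPU tensors into the model's existing
        # parameters, which already live on each process's own GPU, so
        # nothing extra gets allocated on GPU 0.
        model.load_state_dict(ckpt['model'])          # illustrative key
        optimizer.load_state_dict(ckpt['optimizer'])  # illustrative key
        return ckpt.get('epoch', 0)                   # illustrative key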