Megvii-BaseDetection/cvpods

Problem about fp16

YunhaoLee opened this issue · 1 comment

Hi, when fp16 is disabled, the program runs normally. When I set ENABLED to true for fp16, the following error appears:

Traceback (most recent call last):
  File "/home/cvpods/tools/train_net.py", line 109, in <module>
    args=(args,),
  File "/home/cvpods/cvpods/engine/launch.py", line 56, in launch
    main_func(*args)
  File "/home/cvpods/tools/train_net.py", line 95, in main
    runner.train()
  File "/home/cvpods/cvpods/engine/runner.py", line 271, in train
    super().train(self.start_iter, self.start_epoch, self.max_iter)
  File "/home/cvpods/cvpods/engine/base_runner.py", line 85, in train
    self.after_step()
  File "/home/cvpods/cvpods/engine/base_runner.py", line 115, in after_step
    h.after_step()
  File "/home/cvpods/cvpods/engine/hooks.py", line 148, in after_step
    with amp.scale_loss(losses, self.trainer.optimizer) as scaled_loss:
  File "/home/anaconda3/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/anaconda3/lib/python3.7/site-packages/apex/amp/handle.py", line 82, in scale_loss
    raise RuntimeError("Invoked 'with amp.scale_loss, but internal Amp state has not been initialized. "
RuntimeError: Invoked 'with amp.scale_loss, but internal Amp state has not been initialized. model, optimizer = amp.initialize(model, optimizer, opt_level=...) must be called before with amp.scale_loss.

Could you please give me some advice?

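OK, I've solved it. In the runner, amp.initialize is only called when a condition on the number of cards (GPUs) is met. Removing that condition, so that amp.initialize always runs when fp16 is enabled, fixes the error.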
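
For anyone hitting the same error, here is a rough sketch of the idea, not the actual cvpods code. The names setup_amp and fake_initialize are made up for illustration, and I am assuming the runner originally guarded the call with a GPU-count check. The Amp initializer is passed in as an argument so the snippet runs without apex installed; in a real run it would be apex's amp.initialize.

```python
def setup_amp(model, optimizer, fp16_enabled, amp_initialize, opt_level="O1"):
    """Initialize Amp whenever fp16 is enabled, regardless of GPU count.

    amp_initialize is expected to behave like apex.amp.initialize. It is a
    parameter here (an assumption of this sketch) so the example runs
    without apex installed.
    """
    if not fp16_enabled:
        # fp16 is off: leave the model and optimizer untouched.
        return model, optimizer
    # Deliberately no GPU-count / world-size check here. The hook calls
    # amp.scale_loss on every fp16 run, so Amp state must always exist.
    return amp_initialize(model, optimizer, opt_level=opt_level)


def fake_initialize(model, optimizer, opt_level):
    """Hypothetical stand-in for apex.amp.initialize, for demonstration only."""
    return "amp:" + model, "amp:" + optimizer


# Strings stand in for the real model and optimizer objects.
model, optimizer = setup_amp("model", "optimizer", True, fake_initialize)
```

With apex installed, the call would look like setup_amp(model, optimizer, True, amp.initialize). The key point is only that the initialization must not be skipped on single-GPU runs while the hook still uses amp.scale_loss.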