RuntimeError: Found param embeddings.list_embedding_0.model.embeddings.word_embeddings.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
gly99999 opened this issue · 1 comments
gly99999 commented
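Hi, sorry to bother you with another question. I enabled mixed precision for the model by adding the two parameters use_amp and amp_opt_level in train.py, and running it then fails with the error below. Judging from the error, the embeddings need to be loaded onto the GPU. I later changed the embeddings_storage_mode argument of the train function in ReinforcementTrainer to gpu, but after reading the code I don't think that is where the problem lies, and with that setting my GPU does not have enough memory anyway.
I also stepped through with the debugger and looked at the model shown in the screenshot below, and found that some of its embeddings are not on the GPU. I'm not sure whether that is the cause. If all the embeddings had to be loaded onto the GPU, I think memory would run out as well. Sorry to trouble you again.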
Defaults for this optimization level are:
enabled : True
opt_level : O2
cast_model_type : torch.float16
patch_torch_functions : False
keep_batchnorm_fp32 : True
master_weights : True
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O2
cast_model_type : torch.float16
patch_torch_functions : False
keep_batchnorm_fp32 : True
master_weights : True
loss_scale : 128.0
Traceback (most recent call last):
File "/home/gly21/python_workspace/ACE/flair/trainers/reinforcement_trainer.py", line 547, in train
self.model, optimizer, opt_level=amp_opt_level, loss_scale=128.0
File "/home/gly21/.conda/envs/gly_ace/lib/python3.7/site-packages/apex/amp/frontend.py", line 358, in initialize
return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
File "/home/gly21/.conda/envs/gly_ace/lib/python3.7/site-packages/apex/amp/_initialize.py", line 171, in _initialize
check_params_fp32(models)
File "/home/gly21/.conda/envs/gly_ace/lib/python3.7/site-packages/apex/amp/_initialize.py", line 93, in check_params_fp32
name, param.type()))
File "/home/gly21/.conda/envs/gly_ace/lib/python3.7/site-packages/apex/amp/_amp_state.py", line 33, in warn_or_err
python-BaseException
raise RuntimeError(msg)
RuntimeError: Found param embeddings.list_embedding_0.model.embeddings.word_embeddings.weight with type torch.FloatTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.
Process finished with exit code 1
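The check described above (finding which parameters are not on the GPU) can be done without the debugger. This is a minimal sketch, not code from the repository: the helper name params_not_on_cuda is hypothetical, and it only assumes the model exposes named_parameters() the way a torch.nn.Module does.

```python
def params_not_on_cuda(model):
    """Return (name, tensor type) for every parameter that is not on a CUDA device.

    `model` is assumed to behave like a torch.nn.Module: named_parameters()
    yields (name, parameter) pairs, and each parameter has `.is_cuda` and `.type()`.
    """
    return [
        (name, param.type())
        for name, param in model.named_parameters()
        if not param.is_cuda
    ]


# Hypothetical usage, just before the amp.initialize call in the trainer:
# for name, tensor_type in params_not_on_cuda(self.model):
#     print(name, tensor_type)
```

Any name it reports is a parameter that amp's check_params_fp32 would reject with the error shown in the traceback.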
wangxinyu0922 commented
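I haven't looked into this part. If amp requires all of self.model to be on the GPU, I suggest that, once the code has finished encoding the embeddings for all sentences, you delete self.model.embeddings and only then pass the model to amp. It is best not to change embeddings_storage_mode to gpu: with that setting the precomputed embeddings of every sentence are stored in the data as GPU tensors, and GPU memory runs out easily.
However, deleting the embeddings may break validation and testing. You will probably have to work this out yourself, for example by backing up self.model.embeddings before deleting it.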
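The back-up-then-delete idea could be sketched as a small context manager. This is untested against the ACE trainer: the helper name without_attr is hypothetical, and the usage comment assumes that amp.initialize returns the same module object it was given and that the embeddings are not needed while they are detached.

```python
from contextlib import contextmanager


@contextmanager
def without_attr(obj, name):
    """Temporarily remove attribute `name` from `obj` and put it back on exit."""
    backup = getattr(obj, name)   # keep a reference so the object is not lost
    delattr(obj, name)            # the attribute is now invisible to callers
    try:
        yield backup
    finally:
        setattr(obj, name, backup)  # restore, even if the body raised


# Hypothetical usage in the trainer, after all sentence embeddings are encoded:
# with without_attr(self.model, "embeddings"):
#     self.model, optimizer = amp.initialize(
#         self.model, optimizer, opt_level=amp_opt_level, loss_scale=128.0
#     )
```

The restored embeddings are still on the CPU and were never seen by amp, so whether validation and testing then work with them under O2 has to be checked separately.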