Input type different & Memory usage during training
hutchinsonian opened this issue · 2 comments
Thanks for this work!
When I train the model with python [EXP_PATH] --amp_backend native -b 8 --gpus 8, where [EXP_PATH] is bevdepth/exps/nuscenes/mv/bev_depth_lss_r50_256x704_128x128_20e_cbgs_2key_da_ema.py, I get the following error:
Traceback (most recent call last):
  File "BEVDepth/bevdepth/exps/nuscenes/mv/bev_depth_lss_r50_256x704_128x128_20e_cbgs_2key_da_ema.py", line 29, in <module>
    run_cli(BEVDepthLightningModel,
  File "BEVDepth/bevdepth/exps/base_cli.py", line 78, in run_cli
    trainer.fit(model)
  File "BEVDepth/bevdepth/exps/nuscenes/base_exp.py", line 249, in training_step
    preds, depth_preds = self(sweep_imgs, mats)
  File "BEVDepth/bevdepth/exps/nuscenes/base_exp.py", line 239, in forward
    return self.model(sweep_imgs, mats)
  File "BEVDepth/bevdepth/models/base_bev_depth.py", line 56, in forward
    x, depth_pred = self.backbone(x,
  File "BEVDepth/bevdepth/layers/backbones/base_lss_fpn.py", line 593, in forward
    key_frame_res = self._forward_single_sweep(
  File "BEVDepth/bevdepth/layers/backbones/base_lss_fpn.py", line 533, in _forward_single_sweep
    img_feat_with_depth = self._forward_voxel_net(img_feat_with_depth)
  File "BEVDepth/bevdepth/layers/backbones/base_lss_fpn.py", line 402, in _forward_voxel_net
    self.depth_aggregation_net(img_feat_with_depth).view(
  File "BEVDepth/bevdepth/layers/backbones/base_lss_fpn.py", line 310, in forward
    x = self.reduce_conv(x)
RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same
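The failure mode can be reproduced outside BEVDepth. This is a minimal sketch (not BEVDepth code): a conv layer whose weights are the default float32 is fed a half-precision input, which raises the same RuntimeError. On CPU the message names torch.HalfTensor rather than torch.cuda.HalfTensor.

```python
import torch
from torch import nn

conv = nn.Conv2d(3, 3, kernel_size=1)  # weights default to float32
x = torch.randn(1, 3, 4, 4).half()     # input is float16

err = None
try:
    conv(x)  # dtype mismatch between input and weight
except RuntimeError as e:
    err = e
print(err)
```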
So I tried adding explicit casts around the aggregation convs:

@autocast(False)
def forward(self, x):
    x = x.to(torch.float32)  # Add
    x = self.reduce_conv(x)
    x = self.conv(x) + x
    x = self.out_conv(x)
    x = x.to(torch.float16)  # Add
    return x
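The cast round-trip above can be sketched as a self-contained module. This is an illustrative stand-in (the layer names mirror the snippet, not the actual BEVDepth definitions, and the @autocast(False) decorator is omitted so it runs without a GPU): the convs execute in float32 even when the caller passes half tensors, and the result is cast back to the incoming dtype. Note the float32 activations are exactly why memory grows under AMP.

```python
import torch
from torch import nn

class DepthAggregation(nn.Module):
    """Hypothetical aggregation block mirroring the cast pattern above."""

    def __init__(self, channels: int = 8):
        super().__init__()
        self.reduce_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.out_conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        in_dtype = x.dtype
        x = x.to(torch.float32)   # weights are fp32, so cast the input up
        x = self.reduce_conv(x)
        x = self.conv(x) + x      # residual connection, all in fp32
        x = self.out_conv(x)
        return x.to(in_dtype)     # hand the original dtype back to the caller
```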
The error disappeared, but memory usage is very high: even with batch_size=1, about 22 GB of GPU memory is used.
Is this expected?
I met a similar issue and worked around it by following https://discuss.pytorch.org/t/runtimeerror-input-type-torch-cuda-floattensor-and-weight-type-torch-halftensor-should-be-the-same/104312/5, i.e. wrapping the forward pass in autocast:

from torch.cuda.amp import autocast

with autocast():
    outputs = model.forward(tensor)
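For reference, here is a runnable sketch of that workaround: autocast casts inputs to the dtype each op expects, so the mismatch never occurs. It uses the CPU autocast context with bfloat16 so it runs without a GPU; on CUDA you would use torch.cuda.amp.autocast() as in the snippet above, and the model and tensor names here are placeholders.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)       # placeholder model with float32 weights
tensor = torch.randn(1, 4)    # placeholder input

# Inside autocast, eligible ops run in the low-precision dtype and
# inputs are cast automatically to match.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    outputs = model(tensor)

print(outputs.dtype)
```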
May I ask what your GPU memory usage is? Is it still 22 GB when setting batch_size=1?