MCG-NJU/SparseBEV

OverflowError: cannot convert float infinity to integer

zhaoyangwei123 opened this issue · 3 comments

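Hello, I ran the R101 model on 8× RTX 4090 and hit CUDA out of memory, so I changed the batch size to 4 and the learning rate to 1e-4. After that I got the error below. Does the batch size have to be 8?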
/home/wzy/anaconda3/envs/sparsebev/lib/python3.8/site-packages/mmdet/datasets/samplers/group_sampler.py:97: RuntimeWarning: divide by zero encountered in double_scalars
math.ceil(self.group_sizes[i] * 1.0 / self.samples_per_gpu /
Traceback (most recent call last):
File "train.py", line 180, in <module>
main()
File "train.py", line 99, in main
train_loader = build_dataloader(
File "/home/wzy/SparseBEV/loaders/builder.py", line 23, in build_dataloader
sampler = DistributedGroupSampler(
File "/home/wzy/anaconda3/envs/sparsebev/lib/python3.8/site-packages/mmdet/datasets/samplers/group_sampler.py", line 97, in __init__
math.ceil(self.group_sizes[i] * 1.0 / self.samples_per_gpu /
OverflowError: cannot convert float infinity to integer
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 510959) of binary: /home/wzy/anaconda3/envs/sparsebev/bin/python3.8
Traceback (most recent call last):
File "/home/wzy/anaconda3/envs/sparsebev/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/wzy/anaconda3/envs/sparsebev/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/wzy/anaconda3/envs/sparsebev/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/wzy/anaconda3/envs/sparsebev/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/wzy/anaconda3/envs/sparsebev/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/wzy/anaconda3/envs/sparsebev/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

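The minimum should be 8, since with 8 GPUs the per-GPU batch size cannot go below 1. In your situation, the only option is to adjust stop_prev_grad to get it to run.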
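To illustrate why a total batch size of 4 on 8 GPUs fails, here is a minimal sketch of the arithmetic. It assumes the training script derives the per-GPU batch size by integer division of the total batch size by the number of GPUs, and that the truncated expression on line 97 of `group_sampler.py` continues by dividing by the number of replicas and multiplying by `samples_per_gpu`. The function names `samples_per_gpu` and `padded_group_samples` are hypothetical helpers, not part of SparseBEV or mmdet.

```python
import math


def samples_per_gpu(total_batch_size: int, world_size: int) -> int:
    """Split the total batch size across ranks (assumed: integer division)."""
    per_gpu = total_batch_size // world_size
    if per_gpu < 1:
        # 4 // 8 == 0, which is the value that later triggers the division by zero.
        raise ValueError(
            f"total batch size {total_batch_size} gives {per_gpu} samples per GPU "
            f"on {world_size} GPUs; use at least {world_size}"
        )
    return per_gpu


def padded_group_samples(group_size: int, per_gpu: int, num_replicas: int) -> int:
    """Assumed form of the sampler expression shown in the traceback."""
    return int(math.ceil(group_size * 1.0 / per_gpu / num_replicas)) * per_gpu


print(samples_per_gpu(8, 8))            # 1
print(padded_group_samples(100, 1, 8))  # 13
try:
    samples_per_gpu(4, 8)
except ValueError as err:
    print(err)
```

In the failing run the group size is a NumPy scalar, so dividing by zero yields `inf` with a `RuntimeWarning` instead of raising `ZeroDivisionError`; `math.ceil(inf)` then raises the `OverflowError` shown in the traceback. Checking the per-GPU value up front, as in the sketch, would surface the problem with a clearer message.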

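Hello, I am using this command: python viz_bbox_predictions.py --config configs/r50_nuimg_704x256.py --weights checkpoints/r50_nuimg_704x256.pth
viz_bbox_predictions.py seems to draw only the predicted boxes. Is there an interface for drawing the ground-truth (GT) boxes at the same time?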

You can just modify the script yourself to do that.