VITA-Group/FasterSeg

CUDA out of memory

J-JunChen opened this issue · 2 comments

Hi, author.

I use one RTX 2080 Ti for searching, but when the code reaches the validation step in the training phase, the 11 GB of GPU memory is not enough. Could you tell me how to deal with this? Thanks.

When I executed the command CUDA_VISIBLE_DEVICES=1 python train_search.py, the following error occurred:

14:37:50 params = 251.803792MB, FLOPs = 71.419396GB
architect initialized!
using downsampling: 2
Found 1487 images
using downsampling: 2
Found 1488 images
using downsampling: 2
Found 500 images
  0%|                                                    | 0/20 [00:00<?, ?it/s]25 14:37:53 True
25 14:37:53 search-pretrain-256x512_F12.L16_batch3-20210425-143740
25 14:37:53 lr: 0.02
25 14:37:53 update arch: False
[Step 495/495]: [26:39<00:00, 3.23s/it]                  | 0/20 [00:00<?, ?it/s]
[Epoch 1/20][validation...]:   0%|                       | 0/20 [26:39<?, ?it/s]25 15:04:32 Thread 0 handle 167 data.
25 15:04:32 Thread 1 handle 167 data.
25 15:04:32 Thread 2 handle 166 data.

...

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/cjj/miniconda3/envs/real-time/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/cjj/miniconda3/envs/real-time/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/home/cjj/miniconda3/envs/real-time/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 109, in rebuild_cuda_tensor
    storage = storage_cls._new_shared_cuda(
RuntimeError: CUDA error: out of memory

Hi @Jun-jieChen ,

Thank you for your interest in our work!

If you consistently get OOM during the [validation...] step, you may consider passing the argument threds=2 or threds=1 where SegEvaluator is defined: https://github.com/VITA-Group/FasterSeg/blob/master/search/train_search.py#L115. Fewer parallel evaluation workers are active at once, so peak GPU memory during validation drops (at the cost of slower evaluation).
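
In case it helps, here is a hedged sketch of what that change might look like. The "..." stands for whatever arguments the call already passes in your copy of search/train_search.py (I am not reproducing the exact signature here); the only addition is the threds= keyword:

# search/train_search.py, around the SegEvaluator definition linked above.
# "..." is a placeholder for the existing dataset / model / config arguments
# already present in the call -- keep those unchanged.
evaluator = SegEvaluator(
    ...,        # existing arguments, left as they are
    threds=1,   # try threds=2 first; threds=1 uses the least GPU memory but is slowest
)

With threds=1 the validation set is processed by a single worker instead of the three "Thread 0/1/2 handle ... data" workers shown in your log, which trades evaluation speed for lower peak GPU memory.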