NVIDIA/waveglow

CUDA out of memory, even with batch_size set to 1

Opened this issue · 8 comments

RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 8.00 GiB total capacity; 5.69 GiB already allocated; 4.04 MiB free; 5.88 GiB reserved in total by PyTorch)

I have set batch_size to 1, but it still runs out of memory.

What sort of GPU do you have? The most likely explanation is that another process is using GPU memory. Setting "fp16_run": true in https://github.com/NVIDIA/waveglow/blob/master/config.json also reduces memory use.
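For reference, the half-precision switch lives in the training section of config.json. A sketch of the relevant fragment (the surrounding keys and values are illustrative, not copied from the repo):

```json
{
    "train_config": {
        "fp16_run": true,
        "batch_size": 1
    }
}
```

With fp16_run enabled, activations and gradients are stored in half precision, roughly halving activation memory.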

I also have this problem. My GPU is an NVIDIA P100, and I'm sure no other process is using the GPU. I think the model is simply too big, so I used apex.amp to train it and that works.

I can run with batch_size=1 on a 1050 Ti (4 GB); you should check your system for other running processes.

You can decrease the segment length if running out of memory. You should be able to fit batch size 1 on a 8GB GPU.
Closing due to inactivity.
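The segment length is set in the data section of config.json; halving it roughly halves the per-sample activation memory. A hedged sketch of what that fragment might look like (the key names follow the repo's config, but the exact values shown here are illustrative):

```json
{
    "data_config": {
        "segment_length": 8000,
        "sampling_rate": 22050
    }
}
```

Note the caveat later in this thread: very short segments add padding relative to the model's receptive field, which can hurt optimization.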


If I have plenty of VRAM left, should I increase the segment length?
How does segment length affect accuracy, inference speed, and training speed?

We haven't explored the effect of changing segment length on output quality.
Shorter segment lengths result in more padding relative to the model's large receptive field, which is not helpful for optimization.

Was this ever resolved? I'm getting this on an RTX 2070 Super (8 GB). I've tried a batch size of 1 and a segment length of 1,000 and still get the same error. I'm also seeing the following warning; could this be related? I found I needed to upgrade the NGC Docker version of PyTorch to be compatible with my CUDA version. Should we be using a more recent NGC image?

Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ImportError('/opt/conda/lib/python3.6/site-packages/amp_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceIdEEPKNS_6detail12TypeMetaDataEv',)

In Google Colab, with a batch size of 1, it gives an out-of-memory error for an audio clip 5 seconds long.

waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_waveglow', model_math='fp32')

waveglow = waveglow.remove_weightnorm(waveglow)
waveglow = waveglow.to('cuda')
waveglow.eval()

audio = waveglow.infer(mel.cuda())

Before the model => GPU Memory = 1100MiB / 15109MiB
After defining the model => GPU Memory = 15046MiB / 15109MiB
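The snippet above calls waveglow.infer with autograd enabled, so PyTorch records a graph and retains intermediate activations, which can account for much of that memory jump. The usual fix for inference-time OOM is to wrap the call in torch.no_grad(). A minimal CPU sketch of the effect, using a small nn.Linear as a hypothetical stand-in for the WaveGlow model:

```python
import torch

# Hypothetical stand-in for the model loaded from torch.hub above.
model = torch.nn.Linear(80, 256)
mel = torch.randn(1, 80)  # stand-in for a mel spectrogram

# With autograd on, the output is attached to a graph and
# intermediate activations are kept around for a backward pass.
out_tracked = model(mel)

# Inference-only: no graph is recorded, so activation memory is
# released as soon as it is no longer needed.
with torch.no_grad():
    out = model(mel)

assert out_tracked.requires_grad
assert not out.requires_grad
```

Applied to the code above, this would be: `with torch.no_grad(): audio = waveglow.infer(mel.cuda())`.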