iot-salzburg/gpu-jupyter

GPU Mem usage

choff5507 opened this issue · 2 comments

I'm curious if anyone has noticed this behavior.

For some reason, when I have a notebook open for some time, the memory usage on my GPU will max out. Stopping and restarting the Docker container clears it back to the normal level. It has previously maxed out to the point where no more memory could be allocated for training.

I am working on a very small dataset, so there's no way it's related to the work at hand, but it appears something is happening in the background.

Anyone have any suggestions or ideas?
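One thing that might help narrow this down (a sketch, assuming the notebook uses PyTorch): compare how much memory the notebook's tensors actually hold with what the caching allocator is keeping reserved. PyTorch holds on to freed blocks, so nvidia-smi can show the GPU as nearly full even when little is in use; `empty_cache()` hands those blocks back to the driver.

```python
# Sketch, assuming PyTorch: compare memory held by live tensors with
# memory reserved by PyTorch's caching allocator.
import torch

if torch.cuda.is_available():
    allocated = torch.cuda.memory_allocated() / 1024**2  # MiB held by live tensors
    reserved = torch.cuda.memory_reserved() / 1024**2    # MiB held by the caching allocator
    print(f"allocated: {allocated:.1f} MiB, reserved: {reserved:.1f} MiB")

    # Release cached-but-unused blocks back to the driver; nvidia-smi should
    # drop accordingly if the caching allocator was the culprit.
    torch.cuda.empty_cache()
```

If nvidia-smi still reports the GPU as full after `empty_cache()`, the memory is probably held by another kernel or process in the container rather than by the notebook's own tensors.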

Same problem; I have no solution, but some additional details. We are experiencing a quick GPU out-of-memory error when using a Hugging Face pipeline inside the gpu-jupyter Docker container. We tested it with different GPUs (8 GB and 24 GB): the memory fills up completely, followed by an out-of-memory error, while running the pipeline directly on Ubuntu works fine (only about 4 GB used).
On our side it does not depend on Jupyter and seems related to nvidia-docker: it also happens with other Docker images that do not include Jupyter (e.g. the "official" Hugging Face Docker images with GPU support).

Running the whole pipeline outside Docker (our former setup) works perfectly with about 4 GB of GPU memory. The whole pipeline also worked in an nvidia-docker2 installation 4 months ago.
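For reference, this is roughly how we reproduce and measure it from inside the container (a sketch; the task and model below are illustrative placeholders, not our actual pipeline): pin the pipeline to one GPU, then print what the process has allocated next to what the driver reports, to see whether the memory is taken by the framework or at the container/runtime level.

```python
# Illustrative sketch: the task and model are placeholders, not the real pipeline.
import subprocess
import torch
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0,  # pin explicitly to the first GPU
)
pipe("GPU memory check")

# What this process holds, according to PyTorch.
print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")

# What the driver reports from inside the container.
print(subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv"],
    capture_output=True, text=True,
).stdout)
```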

Sorry for the late response.
This could be an error that has been fixed in more recent CUDA images. Please reopen if it also occurs in images based on nvidia/cuda:11.6.2-cudnn8-runtime-ubuntu20.04 or later.
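To check which CUDA stack your container is actually running before reopening, something like the following works from a notebook (a sketch, assuming PyTorch is installed in the image):

```python
# Sketch, assuming PyTorch: report the CUDA/cuDNN versions inside the container.
import torch

print("CUDA runtime:", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("GPU visible:", torch.cuda.is_available())
```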