OCR-D/ocrd_all

cuda: limit gpu memory

jbarth-ubhd opened this issue · 6 comments

From tensorflow docs: »...to configure a virtual GPU device with tf.config.set_logical_device_configuration and set a hard limit on the total memory to allocate on the GPU.«
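For reference, here is a minimal sketch of the approach the TensorFlow docs describe, assuming TensorFlow 2.x. The function name and the idea of capping only the first GPU are illustrative assumptions. The call has to happen before TensorFlow initializes the GPU. Otherwise TensorFlow raises a RuntimeError.

```python
def limit_first_gpu(memory_limit_mb: int) -> bool:
    """Expose the first physical GPU as one logical GPU capped at memory_limit_mb.

    Hypothetical helper. Returns False when TensorFlow sees no GPU. Must be
    called before any TensorFlow op has initialized the GPU.
    """
    if memory_limit_mb <= 0:
        raise ValueError(f"memory_limit_mb must be positive, got {memory_limit_mb}")

    # Imported lazily so the validation above works without TensorFlow installed.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return False
    # memory_limit is given in MB; TensorFlow will not allocate beyond it on this device.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)],
    )
    return True
```

A processor would call something like limit_first_gpu(2048) once during setup, before loading its models.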

Found in eynollah.py: #gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=7.7, allow_growth=True)

Could the GPU be utilized better with a GPU memory limit?

[Figure: memory usage over time per OCR-D processor]

Y axis: memory usage in MB, logarithmic scale, minimum 10 MB. Sampled at 3 s intervals with nvidia-smi.

Red = sbb-binarize, blue = eynollah-segment, green = calamari-recognize (no GPU).
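The thread does not state the exact nvidia-smi command behind the plot. One possible way to record per-process curves like these is sketched below. It assumes the --query-compute-apps query of nvidia-smi, which reports used memory in MiB per compute process. The helper names are hypothetical.

```python
from __future__ import annotations

import subprocess
import time
from typing import Iterator

# One CSV line per compute process: "<pid>, <used MiB>"
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text: str) -> dict[int, int]:
    """Map PID -> used GPU memory in MiB, summed over GPUs; skips non-numeric entries."""
    usage: dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        pid, used = (field.strip() for field in line.split(","))
        if not used.isdigit():
            continue  # e.g. "[N/A]" when the driver cannot attribute memory
        usage[int(pid)] = usage.get(int(pid), 0) + int(used)
    return usage


def sample(interval_s: float = 3.0, count: int = 10) -> Iterator[tuple[float, dict[int, int]]]:
    """Yield (timestamp, usage) pairs, polling nvidia-smi every interval_s seconds."""
    for _ in range(count):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        yield time.time(), parse_compute_apps(out)
        time.sleep(interval_s)
```

Keeping the parser separate from the polling loop makes it possible to check the parsing without a GPU. The 3 s default matches the sampling interval stated for the plot.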

Note: per_process_gpu_memory_fraction is a floating-point ratio, so 7.7 means "oversubscribe the memory of a single GPU by using slow unified memory". Here's the full documentation:

Fraction of the available GPU memory to allocate for each process.
1 means to allocate all of the GPU memory, 0.5 means the process
allocates up to ~50% of the available GPU memory.
GPU memory is pre-allocated unless the allow_growth option is enabled.
If greater than 1.0, uses CUDA unified memory to potentially oversubscribe
the amount of memory available on the GPU device by using host memory as a
swap space. Accessing memory not available on the device will be
significantly slower as that would require memory transfer between the host
and the device. Options to reduce the memory requirement should be
considered before enabling this option as this may come with a negative
performance impact. Oversubscription using the unified memory requires
Pascal class or newer GPUs and it is currently only supported on the Linux
operating system. See
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements
for the detailed requirements.
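Given these semantics, a processor using the TF1-compatible API, as eynollah.py does, could reject oversubscribing values outright. This is a minimal sketch assuming TensorFlow 2.x with tf.compat.v1. The function name is hypothetical.

```python
def make_session_config(fraction: float, allow_growth: bool = True):
    """Build a TF1-style ConfigProto that caps this process at `fraction` of GPU memory."""
    # Values above 1.0 make TensorFlow oversubscribe via slow CUDA unified memory
    # (e.g. the 7.7 in eynollah.py); non-positive values do not express a sensible cap.
    # Reject both instead of passing them through.
    if not 0.0 < fraction <= 1.0:
        raise ValueError(
            f"per_process_gpu_memory_fraction must be in (0, 1], got {fraction}"
        )

    import tensorflow as tf  # lazy import: the validation works without TensorFlow

    gpu_options = tf.compat.v1.GPUOptions(
        per_process_gpu_memory_fraction=fraction,
        allow_growth=allow_growth,
    )
    return tf.compat.v1.ConfigProto(gpu_options=gpu_options)
```

The caller would pass the result to tf.compat.v1.Session(config=...) and, for Keras models, register that session with tf.compat.v1.keras.backend.set_session.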

If you want to share GPUs, set it between 0 and 1. (But of course, you don't know how much memory is available on the target, so fixed ratios are problematic. I recommend making this configurable at runtime, perhaps via environment variables.)
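A runtime-configurable fraction could be read and validated like this. This is a sketch only. The variable name OCRD_GPU_MEMORY_FRACTION and the default of 1.0 (no cap) are hypothetical, not an existing OCR-D setting.

```python
import os
from typing import Mapping, Optional


def gpu_memory_fraction_from_env(
    var: str = "OCRD_GPU_MEMORY_FRACTION",  # hypothetical variable name
    default: float = 1.0,                   # hypothetical default: whole GPU
    environ: Optional[Mapping[str, str]] = None,
) -> float:
    """Read a per-process GPU memory fraction in (0, 1] from the environment."""
    environ = os.environ if environ is None else environ
    raw = environ.get(var, "").strip()
    if not raw:
        return default  # unset or empty: keep the default
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{var}={raw!r} is not a number") from None
    if not 0.0 < value <= 1.0:
        # Values > 1 would silently enable unified-memory oversubscription.
        raise ValueError(f"{var} must be in (0, 1], got {value}")
    return value
```

Passing environ explicitly keeps the function testable. In a processor, the default os.environ would be used.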

Also, limiting CPU-side RSS would be useful. We could do this from the outside, on the docker run command line or in docker-compose.yml, e.g. --memory 2G or --ulimit rss=2000000:4000000 (or deploy.resources.limits.memory: 2G in Compose). Or via ulimit inside the image (profile.d etc.).
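Of the two docker run options, --memory is probably the more effective one. It is enforced through cgroups, whereas setrlimit(2) documents RLIMIT_RSS as having an effect only in Linux 2.4.x before 2.4.30. That means --ulimit rss=... is likely a no-op on current kernels. Below is a sketch of a wrapper that builds such a command line. The image tag and processor arguments are only examples.

```python
import shlex
from typing import List, Optional, Sequence


def docker_run_argv(
    image: str,
    command: Sequence[str],
    memory: str = "2g",
    memory_swap: Optional[str] = None,
) -> List[str]:
    """Build a docker run command line with a cgroup memory limit on the container."""
    argv = ["docker", "run", "--rm", "--memory", memory]
    if memory_swap is not None:
        # Limit for memory + swap combined; setting it equal to --memory disables swap.
        argv += ["--memory-swap", memory_swap]
    argv.append(image)
    argv.extend(command)
    return argv


if __name__ == "__main__":
    argv = docker_run_argv(
        "ocrd/all:maximum",  # example image tag
        ["ocrd-sbb-binarize", "-I", "OCR-D-IMG", "-O", "OCR-D-BIN"],  # example call
    )
    print(shlex.join(argv))
```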

(This came up in OCR-D/quiver-benchmarks#22.)

Unfortunately, there is no way to set the amount of shared GPU memory from the outside. So all our processors must be programmed in a cooperative way.
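On the processor side, cooperation could start with not reserving all GPU memory at startup, which is TensorFlow's default. Below is a minimal sketch assuming TensorFlow 2.x, with a hypothetical function name. According to the TensorFlow GPU guide, memory growth does not release memory once it has been allocated. So this lowers the initial footprint but is not a hard cap.

```python
import os


def enable_gpu_memory_growth() -> int:
    """Let TensorFlow allocate GPU memory on demand instead of reserving it all at startup.

    Returns the number of GPUs configured (0 if TensorFlow is not installed).
    Must run before TensorFlow initializes the GPU.
    """
    # Read by TensorFlow at GPU initialization; setdefault keeps an explicit user choice.
    os.environ.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
    try:
        import tensorflow as tf
    except ImportError:
        return 0
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Programmatic equivalent of the environment variable above.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```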

> Or via ulimit inside the image (profile.d etc.).

/etc/profile.d might not work as expected; I believe it's only sourced for login shells, so I'd suggest working with the Docker options if possible.

(I wouldn't be surprised if I'm wrong about /etc/profile.d, I just tend to test this stuff thoroughly because it's easy to mess up.)

> /etc/profile.d might not work as expected; I believe it's only sourced for login shells,

Indeed. For non-interactive bash, one can set BASH_ENV to a file to be sourced, but when bash is invoked as sh, even that won't happen. Besides, our Docker containers generally will not be run in a shell. But perhaps there's another way?
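The BASH_ENV behaviour can be checked with a small experiment. This is a sketch assuming bash is on the PATH. The helper name and the demo variable are hypothetical. In a real image, the startup file would contain ulimit lines, e.g. ulimit -S -v 4000000, instead of an export.

```python
import os
import subprocess
import tempfile


def run_with_bash_env(startup_script: str, command: str) -> str:
    """Run command via non-interactive bash with BASH_ENV pointing at startup_script."""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(startup_script)
        path = f.name
    try:
        # Non-interactive bash sources $BASH_ENV before running the command;
        # bash invoked as "sh" does not.
        env = {**os.environ, "BASH_ENV": path}
        result = subprocess.run(
            ["bash", "-c", command], env=env, capture_output=True, text=True, check=True
        )
        return result.stdout
    finally:
        os.unlink(path)
```

As noted above, this only helps when a bash process sits between Docker and the processor, which is usually not the case for an ENTRYPOINT or CMD in exec form.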

There may also be a middle ground in setting limits for all containers in the Docker daemon config.

> Besides, our Docker containers generally will not be run in a shell. But perhaps there's another way?
> There may also be a middle ground in setting limits for all containers in the Docker daemon config.

I'm not up-to-date: How are the containers run currently? Is it recommended that users do the docker run themselves or is there some kind of tooling we could change?

Judging from ocrd_all's Dockerfile, there's nothing there we could use.

https://docs.python.org/3.9/library/resource.html looks promising, maybe we could use it in core, but I'm not sure if I understood it properly.
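Here is a sketch of what core could do with the resource module, assuming Linux. The helper names are hypothetical. The sketch caps RLIMIT_AS (virtual address space) in a child process rather than RLIMIT_RSS, which current Linux kernels do not enforce. One caveat: CUDA and TensorFlow can reserve large ranges of virtual address space, so an RLIMIT_AS cap may make GPU processors fail at initialization. This approach probably suits CPU-only processors best.

```python
import resource
import subprocess
import sys
from typing import Callable, List


def address_space_limiter(max_bytes: int) -> Callable[[], None]:
    """Return a preexec_fn that lowers the child's soft RLIMIT_AS to max_bytes."""
    def _apply() -> None:
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # An unprivileged process may not raise its soft limit above the hard limit.
        limit = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
    return _apply


def run_limited(argv: List[str], max_bytes: int) -> subprocess.CompletedProcess:
    """Run argv with a virtual-memory cap applied in the child only."""
    return subprocess.run(
        argv,
        preexec_fn=address_space_limiter(max_bytes),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Demo: a child that tries to allocate 8 GiB under a 2 GiB cap.
    child = [sys.executable, "-c",
             "try:\n    bytearray(8 * 1024**3)\n    print('allocated')\n"
             "except MemoryError:\n    print('MemoryError')"]
    print(run_limited(child, 2 * 1024**3).stdout.strip())
```

Applying the limit in preexec_fn keeps the parent process (e.g. a workflow runner) unrestricted.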

> I'm not up-to-date: How are the containers run currently? Is it recommended that users do the docker run themselves or is there some kind of tooling we could change?

Yes, it's still basically whatever the user comes up with. (We wrap them in Docker Compose services in another Docker stage, but other users still run the CLIs, some of them after converting to Singularity.)

But that's rapidly changing with the service containers which @joschrew is implementing. There we could simply add specs to the (generated) Docker Compose file, maybe after reading some .env variables.
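Here is one way a generated Compose service could carry such a limit. The actual structure of the service-container generator isn't described here, so the function, the OCRD_MEMORY_LIMIT variable and the 2G default are hypothetical. deploy.resources.limits.memory is the Compose key mentioned in the thread.

```python
import os
from typing import Any, Dict, Mapping, Optional, Sequence


def compose_service(
    name: str,
    image: str,
    command: Sequence[str],
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Return one Docker Compose service entry with a memory limit taken from the environment."""
    environ = os.environ if environ is None else environ
    # Hypothetical variable, e.g. loaded from a .env file before generating the file.
    memory = environ.get("OCRD_MEMORY_LIMIT", "2G")
    return {
        name: {
            "image": image,
            "command": list(command),
            "deploy": {"resources": {"limits": {"memory": memory}}},
        }
    }
```

The resulting dicts would go under the top-level "services" key and could be serialized with a YAML library such as PyYAML (yaml.safe_dump). Using PyYAML here is an assumption about the generator.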

> https://docs.python.org/3.9/library/resource.html looks promising, maybe we could use it in core, but I'm not sure if I understood it properly.

Indeed, that's yet another route we could go!