qurator-spk/eynollah

What is the known working GPU config?

Closed this issue · 5 comments

I am using an Amazon-provided Ubuntu 16.04 Deep Learning AMI, which ships with CUDA 10, 10.1, 10.2, and 11.

I am using Mambaforge with Python 3.6 or 3.7

TensorFlow 2 is used by default. I plan to try TensorFlow 1.x next.

The process is loaded into GPU memory, but the GPU is never used.

Is there a known-working full-stack configuration for eynollah on the GPU (OS + version, CUDA + version, Python + version, TensorFlow + version, etc.) that you don't mind sharing?

Thanks,

cneud commented

Hi @mach881040, for me it works well with an NVIDIA 2070S GPU on Ubuntu 18.04, Python 3.7, TensorFlow 2.4.1, and CUDA 10.1. Note that there is also still a lot of room for improvement with respect to GPU utilization - we hope to optimize this, but for our use case the quality of results is much more important than throughput speed.

The process is loaded into GPU memory, but the GPU is never used.

I can confirm this with Ubuntu 22.04, Python 3.8, TF 2.10. It's not about low utilisation. The OP says no utilisation, and that's what I see, too. The memory consumption is only 107 MB (and not increasing), GPU util is never anything other than 0%.
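For diagnosing this kind of problem, a quick first check (independent of eynollah) is whether TensorFlow can see the GPU at all via `tf.config.list_physical_devices`. The helper below is a minimal sketch that degrades gracefully when TensorFlow is not installed; the function name is illustrative:

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or [] if TF is unavailable.

    If this prints an empty list even though nvidia-smi shows the card,
    the CUDA/cuDNN libraries that this TensorFlow build expects are
    likely missing or mismatched -- consistent with the symptom above,
    where the process appears in GPU memory but utilization stays at 0%.
    """
    try:
        import tensorflow as tf  # heavyweight import, done lazily
    except ImportError:
        return []
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
```

An empty list here means the fix is at the CUDA/cuDNN install level, not in eynollah itself.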

Sorry, error on my part. Cause was an insufficient CUDA/TF installation. I probably ran into #72 as well.

(I am on CUDA 11.7 though, and now it does work. So the note in the README might not be correct.)

BTW, is there a particular reason for keeping the TF1-style session management? I found that if I remove it completely (including the explicit GC calls), and avoid repeating load_model calls by storing the model refs in Eynollah's instance, it gets about 9% faster on average (while max RSS of course does increase from 4 GB to 7 GB).
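The model-caching idea described above can be sketched independently of TensorFlow. In this sketch, `loader` is a stand-in for `tf.keras.models.load_model`, and the class and attribute names are illustrative, not eynollah's actual ones:

```python
class ModelCache:
    """Load each model once and reuse it, trading RAM for speed.

    Calling the loader repeatedly re-reads and re-builds each model;
    keeping the loaded objects on the instance avoids that, at the
    cost of holding every model in memory at once (the ~4 GB -> ~7 GB
    max-RSS increase mentioned above).
    """

    def __init__(self, loader):
        self._loader = loader   # e.g. tf.keras.models.load_model
        self._models = {}       # path -> loaded model
        self.load_calls = 0     # for demonstration only

    def get(self, path):
        if path not in self._models:
            self.load_calls += 1
            self._models[path] = self._loader(path)
        return self._models[path]


# Usage with a dummy loader standing in for Keras:
cache = ModelCache(loader=lambda p: f"<model from {p}>")
for _ in range(3):
    model = cache.get("models/region_detection.h5")
print(cache.load_calls)  # the model was only loaded once
```

Dropping the TF1-style per-call session teardown then becomes possible because the cached models stay valid for the lifetime of the instance.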

cneud commented

BTW, is there a particular reason for keeping the TF1-style session management? I found that if I remove it completely (including the explicit GC calls), and avoid repeating load_model calls by storing the model refs in Eynollah's instance, it gets about 9% faster on average (while max RSS of course does increase from 4 GB to 7 GB).

This should already be fixed with 7345f6b (which has since been merged), right?

The working config for (limited) GPU use is now documented in the README.