qurator-spk/eynollah

What is the known working GPU config?

Closed this issue · 5 comments

I am using an Amazon-provided Ubuntu 16.04 Deep Learning AMI, which ships with CUDA 10, 10.1, 10.2, and 11.

I am using Mambaforge with Python 3.6 or 3.7

TensorFlow 2 is used by default. I plan to try TensorFlow 1.x next.

The process is loaded into GPU memory, but the GPU is never used.

Is there a known-working full-stack configuration for eynollah on the GPU (OS + version, CUDA + version, Python + version, TensorFlow + version, etc.) that you don't mind sharing?

Thanks,

cneud commented

Hi @mach881040, for me it works well with an NVIDIA 2070S GPU on Ubuntu 18.04, Python 3.7, TensorFlow 2.4.1, and CUDA 10.1. Note that there is also still a lot of room for improvement with respect to GPU utilization - we hope to optimize this, but for our use case the quality of results is much more important than throughput speed.

The process is loaded into GPU memory, but the GPU is never used.

I can confirm this with Ubuntu 22.04, Python 3.8, TF 2.10. It's not about low utilisation. The OP says no utilisation, and that's what I see, too. The memory consumption is only 107 MB (and not increasing), GPU util is never anything other than 0%.
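For diagnosing this kind of problem, a quick first check (independent of eynollah) is whether TensorFlow can see the GPU at all via `tf.config.list_physical_devices`. The helper below is a minimal sketch that degrades gracefully when TensorFlow is not installed; the function name is illustrative:

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or [] if TF is unavailable.

    If this prints an empty list even though nvidia-smi shows the card,
    the CUDA/cuDNN libraries that this TensorFlow build expects are
    likely missing or mismatched -- consistent with the symptom above,
    where the process appears in GPU memory but utilization stays at 0%.
    """
    try:
        import tensorflow as tf  # heavyweight import, done lazily
    except ImportError:
        return []
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
```

An empty list here means the fix is at the CUDA/cuDNN install level, not in eynollah itself.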

Sorry, error on my part. Cause was an insufficient CUDA/TF installation. I probably ran into #72 as well.

(I am on CUDA 11.7 though, and now it does work. So the note in the README might not be correct.)

BTW, is there a particular reason for keeping the TF1-style session management? I found that if I remove it completely (including the explicit GC calls), and avoid repeating load_model calls by storing the model refs in Eynollah's instance, it gets about 9% faster on average (while max RSS of course does increase from 4 GB to 7 GB).
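The model-caching idea described above can be sketched independently of TensorFlow. In this sketch, `loader` is a stand-in for `tf.keras.models.load_model`, and the class and attribute names are illustrative, not eynollah's actual ones:

```python
class ModelCache:
    """Load each model once and reuse it, trading RAM for speed.

    Calling the loader repeatedly re-reads and re-builds each model;
    keeping the loaded objects on the instance avoids that, at the
    cost of holding every model in memory at once (the ~4 GB -> ~7 GB
    max-RSS increase mentioned above).
    """

    def __init__(self, loader):
        self._loader = loader   # e.g. tf.keras.models.load_model
        self._models = {}       # path -> loaded model
        self.load_calls = 0     # for demonstration only

    def get(self, path):
        if path not in self._models:
            self.load_calls += 1
            self._models[path] = self._loader(path)
        return self._models[path]


# Usage with a dummy loader standing in for Keras:
cache = ModelCache(loader=lambda p: f"<model from {p}>")
for _ in range(3):
    model = cache.get("models/region_detection.h5")
print(cache.load_calls)  # the model was only loaded once
```

Dropping the TF1-style per-call session teardown then becomes possible because the cached models stay valid for the lifetime of the instance.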

cneud commented

BTW, is there a particular reason for keeping the TF1-style session management? I found that if I remove it completely (including the explicit GC calls), and avoid repeating load_model calls by storing the model refs in Eynollah's instance, it gets about 9% faster on average (while max RSS of course does increase from 4 GB to 7 GB).

This should already be fixed with 7345f6b (which has since been merged), right?

The working config for (limited) GPU use is now documented in the README.