rbbrdckybk/dream-factory

Issue running when I add multiple GPUs

SHM132 opened this issue · 7 comments

First off, thank you so much.

I am trying to set up my 6x 3080 rig. I got everything installed and tested it with just 1 GPU connected; DF ran fine.

Running Windows 10.

When I added the second GPU, I got this error when I tried to launch DF:

(base) C:\Users\ryan\dream-factory>python dream-factory.py
Traceback (most recent call last):
  File "C:\Users\ryan\dream-factory\dream-factory.py", line 30, in <module>
    from torch.cuda import get_device_name, device_count
ModuleNotFoundError: No module named 'torch'

Here is my config file info:

config.txt

It looks like PyTorch didn't get installed correctly. I'm assuming you did a standard install and ran setup.py; you can try running it again in verbose mode with:

python setup.py --verbose --force

to see if you can see any errors during the PyTorch install. Or, you can just try to install PyTorch manually.

Odd that it would work at all without PyTorch though, even with just 1 GPU - I'm pretty sure I use get_device_name (which requires PyTorch) to get the GPU info for each device regardless of how many GPUs you try to initialize.

Let me know if re-installing PyTorch works!
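A quick way to confirm whether PyTorch is importable, and which GPUs it can see, is a small check like the one below. This is only a sketch: `describe_gpus` is a hypothetical helper name, not part of Dream Factory; it relies on the same `get_device_name` and `device_count` functions that the failing import in the traceback refers to.

```python
import importlib.util


def describe_gpus():
    """Return the names of the CUDA devices PyTorch can see, or [] if none."""
    # find_spec returns None when the package cannot be imported
    # from the current Python environment.
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed in this Python environment.")
        return []

    # Same import that dream-factory.py performs in the traceback above.
    from torch.cuda import device_count, get_device_name

    names = [get_device_name(i) for i in range(device_count())]
    if not names:
        print("PyTorch is installed, but it reports no CUDA devices.")
    for index, name in enumerate(names):
        print(f"GPU {index}: {name}")
    return names


if __name__ == "__main__":
    describe_gpus()
```

Run it from the same environment used to launch DF (the `(base)` prompt in the report above); a different environment may have a different set of installed packages.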

At work so can't test.

I didn't explicitly install PyTorch, so unless it comes bundled with the Python install, you're probably right.

I first ran it with 1 GPU and it ran fine. I added new ckpts and embeddings, and it still ran fine. Then I connected my 2nd GPU and changed the configuration file to initialize it. I think it has to do with that change, because after it failed I disconnected the 2nd GPU and tried to run with the same 1st GPU, but got the same error.

So I may have the wrong syntax or something in the config.txt file.

To install PyTorch, is it just a pip install?

PyTorch was the issue. I also changed the configuration file back to the default, with Auto for the GPU setting.

Thanks for the help.

Did some testing, and the CPU became the bottleneck after 3 3080s. The 4th 3080 only added about 5% performance.

Still, I will be putting the other 3 in my main rig and letting them go to work.

I also noticed that, either because of Dream Factory or just 3080 vs 3060, my 3060 renders as fast as the 3080.

If you are running them with NVLink, the benefits of multiple GPUs drop off quickly. With Dream Factory, you would be better off running each card standalone: 6 instances of Stable Diffusion each running on 1 card would be faster than 2 instances running on 3 cards each. Another factor is the caching of the model in system RAM (if you have that enabled), which means having less than 64 GB of system RAM is not going to work well with your setup.

Just out of curiosity, what CPU are you running? I've run 3x 3070ti GPUs on an AMD Sempron CPU (low-end CPU from ~2009 that only has a single core) in DF without quite maxing the CPU out. Make sure you run some sort of resource monitor to confirm that it's actually the CPU.

If you have 16GB of RAM or less (likely if you're re-purposing a mining rig since most mining didn't require system RAM), then my money would be on a RAM bottleneck. SD wants to keep a copy of each model in system RAM, and with 6 instances that can add up pretty fast, especially if you're loading large models (pruning your models to 2GB can help here). Auto1111 has a low system RAM flag that you can add to your Auto1111 startup script that might help too (DF will automatically use any Auto1111 startup flags you set).
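To make the RAM arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes one full model copy per instance and ignores the OS and everything else running; the 4 GB "unpruned" size is a hypothetical figure for illustration, while the 2 GB pruned size and 16 GB of system RAM come from the discussion above. `estimated_model_ram_gb` is a made-up helper name.

```python
def estimated_model_ram_gb(instances, model_size_gb):
    """RAM needed if every instance keeps its own copy of the model."""
    return instances * model_size_gb


SYSTEM_RAM_GB = 16   # RAM size mentioned in the thread
INSTANCES = 6        # one instance per GPU

# 4.0 GB is an assumed unpruned model size; 2.0 GB is the pruned size.
for model_size_gb in (4.0, 2.0):
    needed = estimated_model_ram_gb(INSTANCES, model_size_gb)
    verdict = "fits in" if needed <= SYSTEM_RAM_GB else "exceeds"
    print(f"{INSTANCES} x {model_size_gb} GB = {needed} GB "
          f"({verdict} {SYSTEM_RAM_GB} GB before any other usage)")
```

Under these assumptions, six unpruned copies would need more than the installed RAM on their own, while six pruned copies would fit with little to spare.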

Final note - the 3080 should definitely be significantly faster than the 3060. I mainly render on a 3080ti and a 3060 in DF, and the 3080ti is more than twice as fast as the 3060. I've also tested on 3070ti and 3060ti, and those GPUs are both noticeably faster than the 3060 as well. If you're seeing similar performance on a 3080 vs a 3060 with the same settings then you've definitely got some sort of bottleneck outside of the GPU.

Good luck!

I was running with a hardware monitor open, and it was definitely spiking my CPU, which is an i3 7100. But it could also be RAM; I had to scavenge parts from my gaming rig to put in 16 GB. It's only DDR4-2133, so it's quite slow too. Like I mentioned, it's a converted mining rig, so the mobo also only has 2 DIMM slots. So maybe it's a RAM bottleneck like you said.

3 GPUs = 85% CPU load
4 GPUs = 100%+ CPU load

I will have to pull a 3080 and test it in my main rig to compare against my 3060. I expected it to be way faster, like in your example, so I wonder why it's getting such terrible performance.

Update on performance: I swapped out my 3060 12 GB for a 3080 12 GB in my main PC. This PC is running a Ryzen 7 5800X with 16 GB of DDR4-3200.

These are averages of 3 renders, not including the initial render after a settings change. The first render after new settings is about 50% slower; I assume it's loading VRAM or something.

3060 12 gb:
512 single = 3.15s
512 x8 batch = 19.13s
1024 single = 17.83s
1024 x batch = 35.46s

3080 12 gb:
512 single = 1.93s
512 x8 batch = 9.3s
1024 single = 7.2s
1024 x batch = 14.23s

3080 VS 3060 performance gain:

512 single = +63.2%
512 x8 batch = +105%
1024 single = +147%
1024 x batch = +149%
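For anyone who wants to reproduce these percentages, they appear to be computed as (3060 time / 3080 time - 1) x 100, i.e. how much more work the 3080 does per second. The sketch below recomputes them from the timings listed above; `speedup_percent` is a hypothetical helper name, and the labels are copied as posted.

```python
def speedup_percent(slow_seconds, fast_seconds):
    """Throughput gain of the faster card over the slower one, in percent."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


# (3060 seconds, 3080 seconds), taken from the results above
timings = {
    "512 single": (3.15, 1.93),
    "512 x8 batch": (19.13, 9.3),
    "1024 single": (17.83, 7.2),
    "1024 x batch": (35.46, 14.23),
}

for label, (t_3060, t_3080) in timings.items():
    print(f"{label} = +{speedup_percent(t_3060, t_3080):.1f}%")
```

The recomputed values agree with the posted ones to within rounding (the posted figures look truncated rather than rounded in a couple of cases).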