IrisRainbowNeko/DreamArtist-stable-diffusion

Can't train because of GPU memory issue


Hi.
I tried DreamArtist, but I get a GPU out-of-memory error.
The error log is below.
Can you please advise me?

Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]

Training at rate of 0.005 until step 3000
Preparing dataset...
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.99s/it]
0%| | 0/3000 [00:03<?, ?it/s]
Applying cross attention optimization (Doggettx).
Error completing request
Arguments: ('test1', '0.005', 1, 'D:\Program Files\stable-diffusion-webui-dream\extensions\DreamArtist\imgs\train', 'textual_inversion', 512, 512, 3000, 500, 500, 'D:\Program Files\stable-diffusion-webui-dream\textual_inversion_templates\style_filewords.txt', True, False, '', '', 20, 0, 7, -1.0, 512, 512, 5.0, '', True, False, 1, 1) {}
Traceback (most recent call last):
File "D:\Program Files\stable-diffusion-webui-dream\modules\ui.py", line 185, in f
res = list(func(*args, **kwargs))
File "D:\Program Files\stable-diffusion-webui-dream\webui.py", line 54, in f
res = func(*args, **kwargs)
File "D:\Program Files\stable-diffusion-webui-dream\extensions\DreamArtist\scripts\dream_artist\ui.py", line 30, in train_embedding
embedding, filename = dream_artist.cptuning.train_embedding(*args)
File "D:\Program Files\stable-diffusion-webui-dream\extensions\DreamArtist\scripts\dream_artist\cptuning.py", line 430, in train_embedding
loss.backward()
File "D:\Program Files\stable-diffusion-webui-dream\venv\lib\site-packages\torch_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "D:\Program Files\stable-diffusion-webui-dream\venv\lib\site-packages\torch\autograd_init_.py", line 173, in backward
Variable.execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "D:\Program Files\stable-diffusion-webui-dream\venv\lib\site-packages\torch\autograd\function.py", line 253, in apply
return user_fn(self, *args)
File "D:\Program Files\stable-diffusion-webui-dream\repositories\stable-diffusion\ldm\modules\diffusionmodules\util.py", line 139, in backward
input_grads = torch.autograd.grad(
File "D:\Program Files\stable-diffusion-webui-dream\venv\lib\site-packages\torch\autograd_init.py", line 276, in grad
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 8.00 GiB total capacity; 5.31 GiB already allocated; 0 bytes free; 6.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

As the log says, you need 1024 MB more of free VRAM to run it. If you can free that VRAM (if it's being used by other software), you can run it; otherwise you can't, and you'd need a card with 10 GB+ of VRAM, at least with those settings.
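
To see how much VRAM is actually free before training starts, a quick check like this works (a minimal sketch, assuming a CUDA build of PyTorch and GPU 0):

import torch

# mem_get_info reports (free, total) device memory in bytes as seen by the
# CUDA driver, so usage by other programs is included.
free, total = torch.cuda.mem_get_info(0)
print(f"free: {free / 2**20:.0f} MiB / total: {total / 2**20:.0f} MiB")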

It's possible to get it running on an 8GB video card by using a very small input image: under 300x300 pixels when not using reconstruction, and under 200x200 when that's on.

Unfortunately, after several tests, I haven't gotten any good results from doing that: even after 10+ hours of training, the embeddings seem to produce mostly just random shapes and textures. I'm not sure if that's an issue with the small resolution of the input images, the low VRAM, the content of the input images, or if I've just set something up incorrectly.

Thanks for the advice!
I have a GTX 1070 8GB, so it seems I was short on GPU memory.
I was able to start training at a resolution of 384x384.
I trained for about 3000 steps but did not get good results...

The original instructions were quite rough, so you may not have been using DreamArtist correctly. Try following the new instructions.

Got the same result running on an RTX 3060 Ti with xformers, following the new instructions.
Tried different models (with less and less VRAM usage, going from 6 GB to 4 GB to 2 GB models). The result is pretty much the same: DreamArtist fills the unused VRAM (up to 7.2-7.3 GB of total usage), then gives the error mentioned in the post. Also, the VRAM refuses to unload unless you close the webui.
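
On the VRAM refusing to unload: PyTorch's caching allocator keeps freed blocks reserved instead of returning them to the driver, which is why other tools still show them as used. A manual flush sometimes helps without closing the webui; a hedged sketch (it can only release memory that no live tensor still references, so a real leak won't be freed):

import gc
import torch

gc.collect()              # drop unreachable Python objects still holding tensors
torch.cuda.empty_cache()  # return cached, unused blocks to the CUDA driver
print(torch.cuda.memory_allocated(0), torch.cuda.memory_reserved(0))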

launch args:
set COMMANDLINE_ARGS=--opt-split-attention --xformers --autolaunch
set ACCELERATE=
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128
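
For reference, garbage_collection_threshold:0.6 asks the caching allocator to start reclaiming cached blocks once usage passes 60% of capacity, and max_split_size_mb:128 keeps it from splitting blocks larger than 128 MB, which is meant to reduce fragmentation. In a standalone PyTorch script the same setting must be in place before CUDA initializes; a minimal sketch (the values are just examples), plus a summary printout to check whether reserved memory far exceeds allocated memory:

import os

# Must be set before torch initializes CUDA; the webui equivalent is the
# `set PYTORCH_CUDA_ALLOC_CONF=...` line in webui-user.bat shown above.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "garbage_collection_threshold:0.6,max_split_size_mb:128"

import torch

x = torch.zeros(1024, 1024, device="cuda")  # force CUDA init under the new config
print(torch.cuda.memory_summary(device=0, abbreviated=True))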

the error itself:
20, 0, 7, -1.0, 512, 512, '5.0', '', True, False, 1, 1, 1.0, 25.0, 1.0, 25.0, 0.9, 0.999, False, 1, False, '0.000005') {}
Traceback (most recent call last):
File "D:\stable-diffusion-webui\modules\call_queue.py", line 45, in f
res = list(func(*args, **kwargs))
File "D:\stable-diffusion-webui\modules\call_queue.py", line 28, in f
res = func(*args, **kwargs)
File "D:\stable-diffusion-webui\extensions\DreamArtist-sd-webui-extension\scripts\dream_artist\ui.py", line 30, in train_embedding
embedding, filename = dream_artist.cptuning.train_embedding(*args)
File "D:\stable-diffusion-webui\extensions\DreamArtist-sd-webui-extension\scripts\dream_artist\cptuning.py", line 542, in train_embedding
loss.backward()
File "D:\stable-diffusion-webui\venv\lib\site-packages\torch_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\autograd_init_.py", line 173, in backward
Variable.execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 253, in apply
return user_fn(self, *args)
File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 146, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\autograd_init
.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 8.00 GiB total capacity; 6.32 GiB already allocated; 0 bytes free; 6.43 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF