Model downloads not persisting after crash
jasonmhead opened this issue · 3 comments
I have tried at least a couple of times to run an image generation, and each time it seems to start the large file downloads over again and then errors out with
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB (GPU 0; 23.93 GiB total capacity; 8.93 GiB already allocated; 14.04 GiB free; 9.23 GiB reserved in total by PyTorch)
I've run
set 'PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512'
set CUDA_VISIBLE_DEVICES=1
before launching streamlit.
Any suggestions?
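In case the cmd quoting is part of the problem (cmd keeps quotes as part of the value), an equivalent would be to set the same variables from Python at the very top of app.py, before torch is imported. Just a sketch:
import os

# Both settings have to be in place before torch initializes CUDA,
# so they go before the torch import.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch  # imported only after the environment is configured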
Are the model files automatically downloaded by some library behind the scenes?
I'm not seeing any download URLs in the code.
Full error traceback:
2023-01-03 01:16:50.104 Uncaught app exception
Traceback (most recent call last):
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\Jason\Documents\machine_learning\image_ML\stable-karlo\app.py", line 143, in <module>
    main()
  File "C:\Users\Jason\Documents\machine_learning\image_ML\stable-karlo\app.py", line 120, in main
    images_up = upscale(
  File "C:\Users\Jason\Documents\machine_learning\image_ML\stable-karlo\models\generate.py", line 107, in upscale
    images = pipe(
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion_upscale.py", line 499, in __call__
    image = self.decode_latents(latents.float())
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion_upscale.py", line 266, in decode_latents
    image = self.vae.decode(latents).sample
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\models\vae.py", line 605, in decode
    decoded = self._decode(z).sample
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\models\vae.py", line 577, in _decode
    dec = self.decoder(z)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\models\vae.py", line 213, in forward
    sample = self.mid_block(sample)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 393, in forward
    hidden_states = attn(hidden_states)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\models\attention.py", line 354, in forward
    torch.empty(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB (GPU 0; 23.93 GiB total capacity; 8.93 GiB already allocated; 14.04 GiB free; 9.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
You're most likely running out of memory during the upscaling. I just updated the app with new optimizations that lower the amount of memory used.
Try pulling the latest version and check out the updated Memory Requirements section for instructions on how to enable the optimizations.
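For context, the optimizations are essentially the standard diffusers memory-saving switches. The snippet below is only a sketch of the idea (the model id is illustrative and the exact methods depend on your diffusers version), not the app's actual code:
import torch
from diffusers import StableDiffusionUpscalePipeline

# Load the upscaler in half precision to roughly halve the weight memory.
pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",  # illustrative model id
    torch_dtype=torch.float16,
)

# Compute attention in slices instead of one huge buffer, which targets the
# kind of multi-GiB attention allocation the traceback above fails on.
pipe.enable_attention_slicing()

# Keep sub-models on the CPU and move only the active one onto the GPU
# (needs accelerate installed; not available on very old diffusers releases).
pipe.enable_sequential_cpu_offload()

# ...then call pipe(prompt=..., image=...) exactly as before.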
It seems the downloads are rolled back/deleted(?) if there is an error the first time things run.
Perhaps things could be adjusted so that the downloads stick around for future tries, even if the first generation errors out.
Are the downloaded files stored persistently locally, or downloaded again for each session?
That's odd; the downloads should persist even after a crash. They're handled automatically by diffusers and should be cached locally.
If you generate images without the upscaler, does it still re-download?
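In the meantime, a quick way to confirm the cache is actually being reused is to pass an explicit cache_dir and check that the folder is still populated after a crash. This is only a sketch; the model id and path are placeholders:
from diffusers import StableDiffusionUpscalePipeline

# By default huggingface_hub caches downloads under the Hugging Face cache
# (roughly C:\Users\<you>\.cache\huggingface, overridable via HF_HOME).
# An explicit cache_dir makes it easy to see whether the files survive.
pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",  # illustrative model id
    cache_dir="C:/hf-cache",                     # hypothetical local folder
)
# If the first run crashed mid-generation, the snapshot should still be in
# C:/hf-cache and the second run should skip the download.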