kpthedev/stable-karlo

Model downloads not persisting after crash

jasonmhead opened this issue · 3 comments

I have tried at least a couple of times to run an image generation, and each time it seems to start the large file downloads over again and then errors out with:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB (GPU 0; 23.93 GiB total capacity; 8.93 GiB already allocated; 14.04 GiB free; 9.23 GiB reserved in total by PyTorch)

I've run

set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
set CUDA_VISIBLE_DEVICES=1

before launching streamlit.
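
(For completeness, the same setting can also be applied from Python, as long as it happens before torch initializes CUDA; a minimal sketch of what I mean:)

import os

# PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator starts up,
# so it has to be in the environment before torch first touches the GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch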

Any suggestions?

Are the models automatically downloaded by some library behind the scenes?
I'm not seeing any download URLs in the code.

Full traceback:

2023-01-03 01:16:50.104 Uncaught app exception
Traceback (most recent call last):
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\Jason\Documents\machine_learning\image_ML\stable-karlo\app.py", line 143, in <module>
    main()
  File "C:\Users\Jason\Documents\machine_learning\image_ML\stable-karlo\app.py", line 120, in main
    images_up = upscale(
  File "C:\Users\Jason\Documents\machine_learning\image_ML\stable-karlo\models\generate.py", line 107, in upscale
    images = pipe(
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion_upscale.py", line 499, in __call__
    image = self.decode_latents(latents.float())
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion_upscale.py", line 266, in decode_latents
    image = self.vae.decode(latents).sample
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\models\vae.py", line 605, in decode
    decoded = self._decode(z).sample
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\models\vae.py", line 577, in _decode
    dec = self.decoder(z)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\models\vae.py", line 213, in forward
    sample = self.mid_block(sample)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 393, in forward
    hidden_states = attn(hidden_states)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Jason\.conda\envs\karlo\lib\site-packages\diffusers\models\attention.py", line 354, in forward
    torch.empty(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB (GPU 0; 23.93 GiB total capacity; 8.93 GiB already allocated; 14.04 GiB free; 9.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

You're most likely running out of memory during the upscaling. I just updated the app with new optimizations that lower the amount of memory used.

Pull the latest version and check out the updated Memory Requirements section for instructions on how to enable the optimizations.
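
For context, these are the kinds of opt-in memory savers diffusers provides; a rough sketch, assuming a recent diffusers release (the exact options the app toggles may differ):

import torch
from diffusers import StableDiffusionUpscalePipeline

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,  # fp16 weights roughly halve VRAM
)
pipe.enable_attention_slicing()       # compute attention in slices instead of one huge tensor
pipe.enable_sequential_cpu_offload()  # park idle submodules on the CPU (needs accelerate; slower, far less VRAM)

The 16.00 GiB allocation in your traceback comes from an attention block, whose memory grows roughly quadratically with resolution; that's exactly the kind of spike attention slicing is meant to flatten.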

It seems that the downloads are rolled back or deleted(?) if there is an error the first time running things.

Perhaps things could be adjusted so that the downloads stick around for future tries, even if the first generation errors out.

Are the downloaded files stored persistently on disk, or downloaded fresh for each session?

That's odd; the downloads should persist even after a crash. They're handled automatically by diffusers and should be cached locally.
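
For reference, the weights are fetched through huggingface_hub and cached on disk (by default under ~/.cache/huggingface, i.e. C:\Users\<name>\.cache\huggingface on Windows), so a completed download should be reused on the next run. A minimal sketch of pinning the cache location, where the cache_dir value is just a hypothetical example:

from diffusers import StableDiffusionUpscalePipeline

# Downloads land in the Hugging Face cache by default; cache_dir overrides
# the location if the default drive is a problem.
pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    cache_dir="D:/hf-cache",  # hypothetical override
)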

If you generate images without the upscaler, does it still re-download?