nixified-ai/flake

Can't Unload Models (max_loaded_models option isn't working)

Fieth-Ceboq opened this issue · 2 comments

Issue

When switching models, the "VRAM in use" value keeps increasing until the application crashes with the message "CUDA out of memory".

Potential Solution

The InvokeAI application seems to have a "--max_loaded_models 1" option, but it isn't recognized when launched through nixified-ai:

nix run github:nixified-ai/flake#invokeai-nvidia -- --free_gpu_mem 1 --max_loaded_models 1
Unknown args: ['--max_loaded_models', '1']

Steps to Reproduce

  1. Generate an image, and note the "VRAM in use: x" value
  2. Generate another image using the same model; "VRAM in use" is still x
  3. Switch to another model; "VRAM in use" increases to y
  4. Generate another image using the same model; "VRAM in use" is still y
  5. Switch to another model; "VRAM in use" increases to z
  6. Switch to another model; the application crashes with "CUDA out of memory"

Clarification/Request

How can I ensure VRAM usage doesn't keep increasing when switching models?

The --max_loaded_models option was deprecated and removed by InvokeAI upstream in version 3. You can use the --ram and --vram flags instead to control how much RAM and VRAM to allocate to the model cache.
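
For example, assuming the flags are passed through to InvokeAI the same way as in the command above, and assuming the values are interpreted as gigabytes (as in the upstream InvokeAI 3 documentation), an invocation along these lines should cap the model cache (the numbers are only illustrative; pick values that fit your hardware):

nix run github:nixified-ai/flake#invokeai-nvidia -- --ram 8 --vram 0.5

A small --vram value should keep most of the cache in system RAM and move only the active model to the GPU, trading some model-switching speed for lower VRAM pressure.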

Thanks for the clarification.

I had allotted ~90% of RAM and VRAM at install time, assuming those settings were for generation purposes.