nixified-ai/flake

nixosModule: models.yaml not found exception

MayNiklas opened this issue · 17 comments

I've tried to install invokeai-nvidia on my server (I'm using flakes):

{ nixified, ... }: {

  imports = [ nixified.nixosModules.invokeai-nvidia ];

  services.invokeai = {
    enable = true;
  };

}

The service always fails on startup:

Nov 27 22:44:05 rtx3060 systemd[1]: Started invokeai.service.
Nov 27 22:44:08 rtx3060 invokeai-web[596092]: The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Nov 27 22:44:08 rtx3060 invokeai-web[596092]: [38B blob data]
Nov 27 22:44:09 rtx3060 invokeai-web[596092]: 2023-11-27 22:44:09.131483361 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1827 CreateInferencePybindStateModule] Init provider bridge failed.
Nov 27 22:44:11 rtx3060 invokeai-web[596092]: [2023-11-27 22:44:11,708]::[InvokeAI]::INFO --> Patchmatch initialized
Nov 27 22:44:11 rtx3060 invokeai-web[596092]: /nix/store/i50149q86mr8adaxplbss5gxj5z3nmkv-python3.11-torchvision-0.15.2/lib/python3.11/site-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
Nov 27 22:44:11 rtx3060 invokeai-web[596092]:   warnings.warn(
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: An exception has occurred: /var/lib/invokeai/configs/models.yaml not found
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: == STARTUP ABORTED ==
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: ** One or more necessary files is missing from your InvokeAI root directory **
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: ** Please rerun the configuration script to fix this problem. **
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: ** From the launcher, selection option [7]. **
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: ** From the command line, activate the virtual environment and run "invokeai-configure --yes --skip-sd-weights" **
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: ** (To skip this check completely, add "--ignore_missing_core_models" to your CLI args. Not installing these core models will prevent the loading of some or all .safetensors and .ckpt files. However, you can always come back and install these core models in the future.)
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: Press any key to continue...Traceback (most recent call last):
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:   File "/nix/store/84hrzxqn8b6pijncjsvpjv1ydrngjqsf-python3.11-InvokeAI-3.3.0post3/lib/python3.11/site-packages/invokeai/backend/install/check_root.py", line 11, in check_invokeai_root
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:     assert config.model_conf_path.exists(), f"{config.model_conf_path} not found"
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: AssertionError: /var/lib/invokeai/configs/models.yaml not found
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: During handling of the above exception, another exception occurred:
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: Traceback (most recent call last):
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:   File "/nix/store/84hrzxqn8b6pijncjsvpjv1ydrngjqsf-python3.11-InvokeAI-3.3.0post3/bin/.invokeai-web-wrapped", line 9, in <module>
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:     sys.exit(invoke_api())
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:              ^^^^^^^^^^^^
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:   File "/nix/store/84hrzxqn8b6pijncjsvpjv1ydrngjqsf-python3.11-InvokeAI-3.3.0post3/lib/python3.11/site-packages/invokeai/app/api_app.py", line 216, in invoke_api
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:     check_invokeai_root(app_config)  # note, may exit with an exception if root not set up
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:   File "/nix/store/84hrzxqn8b6pijncjsvpjv1ydrngjqsf-python3.11-InvokeAI-3.3.0post3/lib/python3.11/site-packages/invokeai/backend/install/check_root.py", line 40, in check_invokeai_root
Nov 27 22:44:12 rtx3060 invokeai-web[596092]:     input("Press any key to continue...")
Nov 27 22:44:12 rtx3060 invokeai-web[596092]: EOFError: EOF when reading a line
Nov 27 22:44:13 rtx3060 systemd[1]: invokeai.service: Main process exited, code=exited, status=1/FAILURE
Nov 27 22:44:13 rtx3060 systemd[1]: invokeai.service: Failed with result 'exit-code'.
Nov 27 22:44:13 rtx3060 systemd[1]: invokeai.service: Consumed 7.511s CPU time, received 9.1K IP traffic, sent 1.2K IP traffic.

I've tried deleting /var/lib/invokeai/; it changed nothing.

Is the NixOS module currently meant to be working?
Are there any known workarounds for the issue I'm facing?

We need VM tests, but I don't currently have time to work on that. Try the release of nixified-ai instead of the latest master.

Since the release doesn't currently build, this is not a valid option for me:

error: builder for '/nix/store/5bl9dphjmbb8y8l7mbgv5xqk64ab07l7-python3.11-mediapipe-0.10.7.drv' failed with exit code 1;
       last 25 log lines:
       > auto-patchelf: 0 dependencies could not be satisfied
       > running install tests
       > no Makefile or custom installCheckPhase, doing nothing
       > pythonCatchConflictsPhase
       > pythonRemoveBinBytecodePhase
       > pythonImportsCheckPhase
       > Executing pythonImportsCheckPhase
       > Check whether the following modules can be imported: mediapipe
       > Traceback (most recent call last):
       >   File "<string>", line 1, in <module>
       >   File "<string>", line 1, in <lambda>
       >   File "/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/importlib/__init__.py", line 126, in import_module
       >     return _bootstrap._gcd_import(name[level:], package, level)
       >            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       >   File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
       >   File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
       >   File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
       >   File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
       >   File "<frozen importlib._bootstrap_external>", line 940, in exec_module
       >   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
       >   File "/nix/store/sl3b4arhx54vdihk6kfimfshyrp6lr6h-python3.11-mediapipe-0.10.7/lib/python3.11/site-packages/mediapipe/__init__.py", line 15, in <module>
       >     from mediapipe.python import *
       >   File "/nix/store/sl3b4arhx54vdihk6kfimfshyrp6lr6h-python3.11-mediapipe-0.10.7/lib/python3.11/site-packages/mediapipe/python/__init__.py", line 17, in <module>
       >     from mediapipe.python._framework_bindings import resource_util
       > ModuleNotFoundError: No module named 'mediapipe.python._framework_bindings'
       For full logs, run 'nix log /nix/store/5bl9dphjmbb8y8l7mbgv5xqk64ab07l7-python3.11-mediapipe-0.10.7.drv'.
error: 1 dependencies of derivation '/nix/store/45gch36i9xv1x96d823pdfn3z9cwi2fb-python3.11-InvokeAI-3.3.0post3.drv' failed to build
error: 1 dependencies of derivation '/nix/store/47x96la67i7xb9050zmjvgcc5kycc9iz-home-manager-path.drv' failed to build
error: 1 dependencies of derivation '/nix/store/if5ws72x2666l95adz5kl7i50ca7a203-unit-invokeai.service.drv' failed to build

The original issue I'm facing:
InvokeAI now has a configuration script that gets called when config files are missing. This can't work under systemd, since it requires user interaction: the "Press any key to continue..." prompt fails with EOFError because the service has no interactive stdin. The package itself works as intended!

@MayNiklas As noted in the release notes:

Please beware that you may need to nuke your state directory due to the unstable and fragile on disk format of the upstream projects.

This is because the upstream project does not have an automated or stable way of moving from one version to another; this happens to people using InvokeAI normally too. So just remove your state in /var/lib/invokeai.

I did this. This system never had InvokeAI installed before.

@MayNiklas

Since release doesn't currently build

What do you mean the release doesn't build?

Just confirmed, the release does build. Use github:nixified-ai/flake/2 instead of the latest master and let me know what happens.
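For a system flake, pinning to that release instead of master could look roughly like the sketch below. It assumes the input is named nixified (matching the module argument in the first snippet) and is passed to modules via specialArgs; the host name rtx3060, the ./configuration.nix path and the nixpkgs branch are illustrative assumptions.

```nix
{
  inputs = {
    # Example nixpkgs branch; use whatever your system already tracks.
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.11";
    # Pin nixified-ai to the "2" release ref instead of tracking master.
    nixified.url = "github:nixified-ai/flake/2";
  };

  outputs = { nixpkgs, nixified, ... }: {
    # Hypothetical host name and module path, for illustration only.
    nixosConfigurations.rtx3060 = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      # Makes `nixified` available as a module argument, as in { nixified, ... }: above.
      specialArgs = { inherit nixified; };
      modules = [ ./configuration.nix ];
    };
  };
}
```

Without a follows override, the pinned flake keeps its own locked nixpkgs, which is the package set the release was built against.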

Phew, I can confirm: nix build 'github:nixified-ai/flake/2#packages.x86_64-linux.invokeai-nvidia' builds.
But installing nixified.packages.x86_64-linux.invokeai-nvidia through my system configuration fails: it seems to build with Python 3.11 instead of 3.10, or something like that. I will investigate once I find some time.

For some reason, it builds a different package when being installed through my flake repo.
The difference is Python 3.10 vs. 3.11.
I'm not overriding any flake inputs.

I'm very confused and will investigate it once I find some time.

When trying to use github:nixified-ai/flake/2, I still get an error:

❯ nix run github:nixified-ai/flake/2#invokeai-nvidia -- --web
Unknown args: ['--web']
Unknown args: ['--web']
2023-11-30 17:24:01.347998031 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1827 CreateInferencePybindStateModule] Init provider bridge failed.
/nix/store/ds0qkkilzh7mqawssx7z8dmpgk34v7wm-python3.10-torchvision-0.15.2/lib/python3.10/site-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
[2023-11-30 17:24:03,418]::[InvokeAI]::INFO --> Patchmatch initialized
Unknown args: ['--web']
Unknown args: ['--web']

An exception has occurred: /home/my-user/invokeai/models/core/convert/CLIP-ViT-bigG-14-laion2B-39B-b160k is missing
== STARTUP ABORTED ==
** One or more necessary files is missing from your InvokeAI root directory **
** Please rerun the configuration script to fix this problem. **
** From the launcher, selection option [7]. **
** From the command line, activate the virtual environment and run "invokeai-configure --yes --skip-sd-weights" **
** (To skip this check completely, add "--ignore_missing_core_models" to your CLI args. Not installing these core models will prevent the loading of some or all .safetensors and .ckpt files. However, you can always come back and install these core models in the future.)
Press any key to continue...

Try using

invokeai-model-install --yes --default-only --config_file <invoke-folder>/config_custom.yaml

before starting the app itself.

A few things to note:

  1. You specified --web:
Unknown args: ['--web']

Doing this causes the upstream program (InvokeAI) to break in mysterious ways due to its bad argument-handling logic. Stop passing it.

  2. The error you're getting only really happens if invokeai-configure fails, which it does because the upstream program (InvokeAI) is very buggy. So rm -rf ~/invokeai and try again on version 2. Every single time you change the version of the program, no matter how large or small the change, you must rm -rf ~/invokeai.

Oh okay! Thanks. I just watched your video where you passed in that config, so I thought I had to do that too.

@eikooc Yeah, unfortunately InvokeAI is buggy, backwards-incompatible, complex and unstable. Just get rid of the state when you can and you'll be alright. If we fetched the models with Nix, we could obsolete 99.9% of what invokeai-configure does, but at that point we should just use the diffusers library directly, maybe build a better program, and keep the good part of InvokeAI, which is the frontend.

I got the v2 branch to build!
Calling nix flake lock --update-input invokeai fixed it.
Before rebuilding, I executed sudo rm -rf /var/lib/invokeai.
Then I got:

Dez 05 14:49:09 rtx3060 systemd[1]: Started invokeai.service.
Dez 05 14:49:12 rtx3060 invokeai-web[134765]: The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Dez 05 14:49:12 rtx3060 invokeai-web[134765]: [38B blob data]
Dez 05 14:49:12 rtx3060 invokeai-web[134765]: 2023-12-05 14:49:12.987447504 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:1827 CreateInferencePybindStateModule] Init provider bridge failed.
Dez 05 14:49:15 rtx3060 invokeai-web[134765]: [2023-12-05 14:49:15,288]::[InvokeAI]::INFO --> Patchmatch initialized
Dez 05 14:49:15 rtx3060 invokeai-web[134765]: /nix/store/ds0qkkilzh7mqawssx7z8dmpgk34v7wm-python3.10-torchvision-0.15.2/lib/python3.10/site-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
Dez 05 14:49:15 rtx3060 invokeai-web[134765]:   warnings.warn(
Dez 05 14:49:16 rtx3060 invokeai-web[134765]: An exception has occurred: /var/lib/invokeai/configs/models.yaml not found
Dez 05 14:49:16 rtx3060 invokeai-web[134765]: == STARTUP ABORTED ==
Dez 05 14:49:16 rtx3060 invokeai-web[134765]: ** One or more necessary files is missing from your InvokeAI root directory **
Dez 05 14:49:16 rtx3060 invokeai-web[134765]: ** Please rerun the configuration script to fix this problem. **
Dez 05 14:49:16 rtx3060 invokeai-web[134765]: ** From the launcher, selection option [7]. **
Dez 05 14:49:16 rtx3060 invokeai-web[134765]: ** From the command line, activate the virtual environment and run "invokeai-configure --yes --skip-sd-weights" **
Dez 05 14:49:16 rtx3060 invokeai-web[134765]: ** (To skip this check completely, add "--ignore_missing_core_models" to your CLI args. Not installing these core models will prevent the loading of some or all .safetensors and .ckpt files. However, you can always come back and install these core models in the future.)
Dez 05 14:49:16 rtx3060 invokeai-web[134765]: Press any key to continue...Traceback (most recent call last):
Dez 05 14:49:16 rtx3060 invokeai-web[134765]:   File "/nix/store/bkgmn0gqi97h94fqs588dvxw3l9yk5gs-python3.10-InvokeAI-3.3.0post3/lib/python3.10/site-packages/invokeai/backend/install/check_root.py", line 11, in check_invokeai_root
Dez 05 14:49:16 rtx3060 invokeai-web[134765]:     assert config.model_conf_path.exists(), f"{config.model_conf_path} not found"
Dez 05 14:49:16 rtx3060 invokeai-web[134765]: AssertionError: /var/lib/invokeai/configs/models.yaml not found
Dez 05 14:49:16 rtx3060 invokeai-web[134765]: During handling of the above exception, another exception occurred:
Dez 05 14:49:16 rtx3060 invokeai-web[134765]: Traceback (most recent call last):
Dez 05 14:49:16 rtx3060 invokeai-web[134765]:   File "/nix/store/bkgmn0gqi97h94fqs588dvxw3l9yk5gs-python3.10-InvokeAI-3.3.0post3/bin/.invokeai-web-wrapped", line 9, in <module>
Dez 05 14:49:16 rtx3060 invokeai-web[134765]:     sys.exit(invoke_api())
Dez 05 14:49:16 rtx3060 invokeai-web[134765]:   File "/nix/store/bkgmn0gqi97h94fqs588dvxw3l9yk5gs-python3.10-InvokeAI-3.3.0post3/lib/python3.10/site-packages/invokeai/app/api_app.py", line 216, in invoke_api
Dez 05 14:49:16 rtx3060 invokeai-web[134765]:     check_invokeai_root(app_config)  # note, may exit with an exception if root not set up
Dez 05 14:49:16 rtx3060 invokeai-web[134765]:   File "/nix/store/bkgmn0gqi97h94fqs588dvxw3l9yk5gs-python3.10-InvokeAI-3.3.0post3/lib/python3.10/site-packages/invokeai/backend/install/check_root.py", line 40, in check_invokeai_root
Dez 05 14:49:16 rtx3060 invokeai-web[134765]:     input("Press any key to continue...")
Dez 05 14:49:16 rtx3060 invokeai-web[134765]: EOFError: EOF when reading a line
Dez 05 14:49:17 rtx3060 systemd[1]: invokeai.service: Main process exited, code=exited, status=1/FAILURE
Dez 05 14:49:17 rtx3060 systemd[1]: invokeai.service: Failed with result 'exit-code'.
Dez 05 14:49:17 rtx3060 systemd[1]: invokeai.service: Consumed 7.398s CPU time, received 9.1K IP traffic, sent 1.2K IP traffic.

-> The initial configuration step of the NixOS module does not seem to work: even with an empty /var/lib/invokeai, models.yaml is never created.

If wanted, I can try to fix the issue by adding an initial startup script to the module.
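A minimal sketch of such a startup step, assuming the service is named invokeai (as in the logs), its state lives in /var/lib/invokeai, and the package is exposed as config.services.invokeai.package (that option name is an assumption). It runs the non-interactive command suggested in the error message only when models.yaml is missing, and it assumes invokeai-configure picks up its root directory from the INVOKEAI_ROOT environment variable.

```nix
{ config, ... }:
let
  # Taken from the log output above.
  root = "/var/lib/invokeai";
  # Assumption: the module exposes the InvokeAI package under this option.
  pkg = config.services.invokeai.package;
in
{
  # preStart has type "lines", so this is appended to any preStart the module already defines.
  systemd.services.invokeai.preStart = ''
    # Run the otherwise interactive setup once, only if the root is not initialized yet.
    if [ ! -e ${root}/configs/models.yaml ]; then
      # Assumption: invokeai-configure honours INVOKEAI_ROOT for its root directory.
      INVOKEAI_ROOT=${root} ${pkg}/bin/invokeai-configure --yes --skip-sd-weights
    fi
  '';
}
```

invokeai-configure downloads the core support models, so the first start needs network access and can take a while; the unit's start timeout (serviceConfig.TimeoutStartSec) may need raising.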

@MayNiklas If you can PR a fix, go ahead; otherwise use github:nixified-ai/invokeai/1 to go back to version 1. It's hard to maintain this buggy program, and we need VM tests, but I have not had much time to work on nixified-ai lately.

In theory, even a test that only checks whether the service starts successfully would be very helpful.
However, accessing CUDA within a test might be problematic; I might play around with it a bit.

CUDA is a separate problem; we can define a VM test that checks the program works on the CPU. I also have an old machine with an NVIDIA GPU that I can set up in CI, and passing the GPU through in CI will not be a problem.
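A first CPU-only VM test could be as small as the sketch below: boot a VM with the module enabled and wait for the unit to become active. wait_for_unit raises as soon as the unit enters the failed state, so an abort like the models.yaml one above would fail the test. Using the invokeai-amd module as a GPU-less stand-in, the VM sizes, and port 9090 (InvokeAI's usual default) are all assumptions.

```nix
# Hypothetical file, e.g. tests/invokeai.nix, called with nixpkgs and this flake.
{ pkgs, nixified }:
pkgs.nixosTest {
  name = "invokeai-starts";

  nodes.machine = { ... }: {
    # Assumption: this module variant falls back to the CPU when no GPU is present.
    imports = [ nixified.nixosModules.invokeai-amd ];
    services.invokeai.enable = true;

    # Sizes in MiB; rough guesses for running InvokeAI on the CPU.
    virtualisation.memorySize = 8192;
    virtualisation.diskSize = 20480;
  };

  testScript = ''
    start_all()
    # Fails immediately if invokeai.service ends up in the "failed" state.
    machine.wait_for_unit("invokeai.service")
    # Assumption: the web UI listens on InvokeAI's default port 9090.
    machine.wait_for_open_port(9090)
  '';
}
```

The Nix build sandbox has no network, so anything invokeai-configure would download has to be available offline for such a test to pass, which is another argument for fetching the models with Nix.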