LarryJane491/Lora-Training-in-Comfy

Don't know what is happening

xplpex opened this issue · 19 comments

C:\Users\User\AppData\Local\Programs\Python\Python310\python.exe: can't open file 'C:\ia\ComfyUI_windows_portable\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py': [Errno 2] No such file or directory
Traceback (most recent call last):
File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
main()
File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 992, in main
launch_command(args)
File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
simple_launcher(args)
File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\User\AppData\Local\Programs\Python\Python310\python.exe', 'custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=C:\ia\ComfyUI_windows_portable\ComfyUI\models\checkpoints\epicphotogasm_zUniversal.safetensors', '--train_data_dir=C:/ia/photyo/train', '--output_dir=models/loras', '--logging_dir=./logs', '--log_prefix=LumaLora', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=50', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=LumaLora', '--train_batch_size=1', '--save_every_n_epochs=50', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=16', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 2.
Train finished
Prompt executed in 2.72 seconds

Something with Python, but I don't have any idea.

From the second line: "No such file or directory". So it couldn't find the program.

Your folders should look like this:
custom_nodes/Lora-Training-in-Comfy/[lots of files and folders]

Can you confirm that's how it looks?

image_2024-01-16_153502138
I searched for the file and it was there, same name and everything.

image_2024-01-16_153710577

Ah, I see the problem.
When you download from github, it creates a folder named Lora-Training-In-Comfy-main.

But the folder must be named Lora-Training-in-Comfy. Remove the -main and it will work ^^.

That's my bad. I need to find a way to have a less strict requirement on the folder name. For now though, the custom node must be named Lora-Training-in-Comfy!

From what I saw, this is my folder path: C:\ia\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts

And this is the desired one: C:\ia\ComfyUI_windows_portable\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py

The difference seems to be that mine has a /ComfyUI/ folder inside ComfyUI_windows_portable, while the program looks for a path that doesn't. Any way to fix that without fucking up my ComfyUI?

Wait, I'm confused now. These screenshots you posted, that's your setup, right? If that's so the custom node is clearly named Lora-Training-In-Comfy**-main**.

Or did you remove -main and still have this issue?

Don't worry about the ComfyUI folder, what matters is the path from the launcher to the custom node ^^.
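The failing command in the first traceback passes the script as the relative path custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py, so Python resolves it against the directory the launcher was started from. Here is a minimal sketch for checking what that resolves to on your machine. The helper name `resolve_script` is made up for illustration and is not part of the custom node:

```python
from pathlib import Path

# Relative path copied from the failing command in the first traceback.
SCRIPT = "custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py"

def resolve_script(start_dir):
    """Return (full path, exists?) for SCRIPT resolved against start_dir."""
    full = Path(start_dir) / SCRIPT
    return full, full.is_file()

if __name__ == "__main__":
    # Path.cwd() is the directory this process was launched from.
    path, found = resolve_script(Path.cwd())
    print(path, "->", "found" if found else "missing")
```

Run it from the same folder you start ComfyUI from. If it prints "missing", either the folder name or the starting directory does not match what the node expects.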

I already removed "-main", but now I'm getting a new kind of error after a series of warnings: [Bug]: ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU

A CUDA error means the torch module isn't using your GPU. You have to reinstall that module. To do that, follow the PyTorch site instructions mentioned in the Troubleshoot section of the main page. It's easy: you go to the website, fill in the options for your situation, and copy-paste the command it gives you into a command prompt (but you have to do it in the right Python environment).
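To confirm whether the installed torch can see the GPU before and after reinstalling, something like the following can be run with the same Python that launches the training. This is only a sketch; `cuda_status` is a hypothetical helper name, and it assumes nothing beyond the standard torch calls shown:

```python
def cuda_status():
    """Report whether the installed torch build can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this Python environment"
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return f"torch {torch.__version__}: CUDA available ({name})"
    # torch.version.cuda is None on CPU-only builds of torch.
    return f"torch {torch.__version__}: CUDA not available (built for CUDA {torch.version.cuda})"

if __name__ == "__main__":
    print(cuda_status())
```

If it reports "built for CUDA None", the installed torch is likely a CPU-only build, which would match the ValueError above.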

If it doesn't work, that's likely because you have used the one-click install of Comfy. Very useful, but makes Python dependency installation a bit more complicated. I recommend installing the base version of ComfyUI instead. Follow my guide here:

https://www.reddit.com/r/comfyui/comments/1995whb/guide_learn_to_deal_with_python_programs/

You don't have to delete your current ComfyUI folder, just create a new one following this guide.

I have the same issue with the LoRA trainer looking for different installation paths and I effectively had to copy it 3 times.

E:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py

errored out so I had to add the git install described above:

E:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy-main/sd-scripts/train_network.py

This then gave another error, looking for the install minus the 'ComfyUI' folder, so I copied the whole thing once more:

E:\ComfyUI_windows_portable\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py

I have this problem, I don't really know what to do, any suggestions?

C:\ComfyUI_windows_portable_nvidia_cu121_or_cpu\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy/sd-scripts/train_network.py

C:\Users\canas\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\python.exe: Error while finding module specification for 'accelerate.commands.launch' (ModuleNotFoundError: No module named 'accelerate')

Train finished

I have the same problem: accelerate is reported missing, but it's installed in the venv.
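One possible cause, judging from the python.exe path in the error above, is that the interpreter running the training command is not the one accelerate was installed into. A small sketch to check which interpreter is running and whether it can import the module (`module_report` is an illustrative name, not something from the node):

```python
import importlib.util
import sys

def module_report(name):
    """Describe whether `name` is importable by the interpreter running this code."""
    spec = importlib.util.find_spec(name)
    where = spec.origin if spec is not None else "not found"
    return f"{sys.executable}: {name} -> {where}"

if __name__ == "__main__":
    print(module_report("accelerate"))
```

Run it once with the venv's Python and once with the python.exe named in the error message. If only the venv finds accelerate, the training is being launched with the other interpreter.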

I am having the same problem, though I noticed that both in mine and in the first error posted here it is looking for the file in ComfyUI_windows_portable\custom_nodes, whereas on their computer it is at ComfyUI_windows_portable\ComfyUI\custom_nodes.

I noticed the exact same thing on mine so that could be some sort of problem, but idk shit, here is my error:
C:\Users\jacks\AppData\Local\Programs\Python\Python312\python.exe: can't open file 'C:\Users\jacks\Documents\Stable Diffusion\ComfyUI_windows_portable\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py': [Errno 2] No such file or directory

So it's looking for the custom_nodes folder in ComfyUI_windows_portable when in reality it is in ComfyUI_windows_portable\ComfyUI.

OK, edit: it seems LarryJane answered this up above, saying it only matters in relation to the launcher, but it still seems weird to me, idk.

Same problem. All I can find is an accelerate.YAML, but it keeps telling me it cannot find it. The directory is named as Larry says; it was already named correctly (without the -main), yet it still says:
AppData\Local\Programs\Python\Python310\python.exe: Error while finding module specification for 'accelerate.commands.launch' (ModuleNotFoundError: No module named 'accelerate')
Train finished
Prompt executed in 1.10 seconds

It does say it finished the training, yet I cannot find my LoRA in the models folder.

The YAML says this when opened in an editor:
command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: 'NO'
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: all
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false

For me, the problem was that the program was looking for the custom_nodes folder in the wrong location.
Same as what @xplpex pointed out, and thanks for helping me figure it out, by the way!

It only happened to me when running the "Lora Training in Comfy (Advanced)" node in ComfyUI. The non-advanced version was working fine.
Since the only problem was the location of the custom_nodes folder, and I did not want to copy the entire directory and manage it in two different places, I just made a symbolic link and it solved my problem!

I use Windows 11 so here are the steps (should work on Windows 10 as well):

  • Open command prompt (CMD) as administrator
  • Go to your ComfyUI_windows_portable directory
  • Run "mklink /d custom_nodes ComfyUI\custom_nodes"

This should create a symbolic-link directory named custom_nodes inside ComfyUI_windows_portable leading to the one inside ComfyUI. This shouldn't interfere with anything, hope other people find this information helpful.
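The same link can be sketched in Python for anyone who prefers a script over mklink. This is an assumption-laden equivalent, not what the node does: `link_custom_nodes` is a made-up helper, and it uses an absolute target, whereas the mklink command above uses a relative one.

```python
import os

def link_custom_nodes(portable_root):
    """Create <portable_root>/custom_nodes as a symlink to ComfyUI/custom_nodes."""
    target = os.path.join(portable_root, "ComfyUI", "custom_nodes")
    link = os.path.join(portable_root, "custom_nodes")
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    if os.path.lexists(link):
        # Something already exists at that path; leave it alone.
        return link
    # On Windows this normally needs an elevated prompt or Developer Mode.
    os.symlink(target, link, target_is_directory=True)
    return link
```

For the setup in the first post, the call would be `link_custom_nodes(r"C:\ia\ComfyUI_windows_portable")`. The function does nothing if a custom_nodes entry already exists there, so it should be safe to run twice.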

Cheers!

loading model for process 0/1
load StableDiffusion checkpoint: E:\Ai\ComfyUI_windows_portable\ComfyUI\models\checkpoints\DreamShaper_8_pruned.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net:
loading vae:
loading text encoder:
Enable xformers for U-Net
Traceback (most recent call last):
File "E:\Ai\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 1012, in <module>
trainer.train(args)
File "E:\Ai\ComfyUI_windows_portable\ComfyUI\custom_nodes\Lora-Training-in-Comfy\sd-scripts\train_network.py", line 236, in train
vae.set_use_memory_efficient_attention_xformers(args.xformers)
File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 259, in set_use_memory_efficient_attention_xformers
fn_recursive_set_mem_eff(module)
File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\modeling_utils.py", line 252, in fn_recursive_set_mem_eff
module.set_use_memory_efficient_attention_xformers(valid, attention_op)
File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\diffusers\models\attention_processor.py", line 261, in set_use_memory_efficient_attention_xformers
raise ValueError(
ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
Traceback (most recent call last):
File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 996, in <module>
main()
File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 992, in main
launch_command(args)
File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
simple_launcher(args)
File "C:\Users\NK\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\NK\AppData\Local\Programs\Python\Python310\python.exe', 'E:/Ai/ComfyUI_windows_portable/ComfyUI/custom_nodes/Lora-Training-in-Comfy/sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=E:\Ai\ComfyUI_windows_portable\ComfyUI\models\checkpoints\DreamShaper_8_pruned.safetensors', '--train_data_dir=E:/database', '--output_dir=E:\files', '--logging_dir=./logs', '--log_prefix=lindalaura', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=50', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=lindalaura', '--train_batch_size=1', '--save_every_n_epochs=10', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=22', '--cache_latents', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1584', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard', '--clip_skip=2', '--optimizer_type=AdamW8bit', '--persistent_data_loader_workers', '--log_with=tensorboard']' returned non-zero exit status 1.
Train finished
Prompt executed in 14.07 seconds

Okay, so I guess it's a memory restriction, or am I completely wrong? I have a 3070 and 16 GB of RAM; not sure whether it would work with 16 GB of RAM.

Any help is appreciated, thanks.

Yup, I have a similar issue. I tried to launch the LoRA training and got this:
image_2024-07-21_230720305

I installed the accelerate module through the command line, but then it started giving me this error:
image_2024-07-21_231406052

I fixed most of these errors with the "pip install" command, like "pip install accelerate", etc., just adding the name of the module the error mentions; in this case it was the "toml" module.

Finally it got through to the image size loading step, but then I started getting this:
image_2024-07-21_234026661
image_2024-07-21_234052293