aigc-apps/sd-webui-EasyPhoto

[Bug]: Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported.

longglecc opened this issue · 5 comments

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Is EasyPhoto the latest version?

  • I have updated EasyPhoto to the latest version and the bug still exists.

What happened?

2024-03-15 08:14:15.067244000 [E:onnxruntime:Default, provider_bridge_ort.cc:1534 TryGetProviderInfo_TensorRT] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1209 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_tensorrt.so with error: libnvinfer.so.8: cannot open shared object file: No such file or directory
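The root cause of the error above is that the dynamic loader cannot resolve TensorRT's `libnvinfer.so.8`, so `libonnxruntime_providers_tensorrt.so` fails to load. A minimal check sketch (assumes a Linux host; the library name is taken from the log, and `has_libnvinfer` is a hypothetical helper, not part of onnxruntime):

```python
import ctypes

def has_libnvinfer() -> bool:
    """Return True if the dynamic loader can find TensorRT's libnvinfer.so.8,
    which is exactly what onnxruntime failed to dlopen in the log above."""
    try:
        ctypes.CDLL("libnvinfer.so.8")
        return True
    except OSError:
        return False
```

If this returns False, installing the TensorRT runtime libraries or adding their directory to `LD_LIBRARY_PATH` should make the TensorRT execution provider loadable.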

*************** EP Error ***************
EP Error /onnxruntime_src/onnxruntime/python/onnxruntime_pybind_state.cc:456 void onnxruntime::python::RegisterTensorRTPluginsAsCustomOps(onnxruntime::python::PySessionOptions&, const ProviderOptions&) Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported.
when using ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
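As the log shows, ONNX Runtime drops the unavailable TensorRT provider and retries with CUDA/CPU, so this error is non-fatal. A hedged sketch of the same priority-ordered fallback, useful for requesting providers defensively (`pick_providers` is a hypothetical helper, not an onnxruntime API):

```python
def pick_providers(requested, available):
    """Keep only execution providers that are actually available, preserving
    the requested priority order; fall back to CPU if none match."""
    chosen = [p for p in requested if p in available]
    return chosen or ["CPUExecutionProvider"]

# Mirrors the fallback in the log: TensorRT is requested but unavailable.
providers = pick_providers(
    ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
    {"CUDAExecutionProvider", "CPUExecutionProvider"},
)
```

In a real session this list would be passed as the `providers` argument to `onnxruntime.InferenceSession`.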


2024-03-15 08:14:15.180632383 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer 'Sub__1664:0'. It is not used by any node and should be removed from the model.
2024-03-15 08:14:15.180648438 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer 'Shape__1662:0'. It is not used by any node and should be removed from the model.
2024-03-15 08:14:15.180657188 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer 'const_4__1660'. It is not used by any node and should be removed from the model.
2024-03-15 08:14:15.241504061 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: cond_1/coarse/Conv2d_transpose/BiasAdd to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.241513052 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: cond_1/coarse/Conv2d_transpose/BiasAdd
2024-03-15 08:14:15.241747033 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: cond_1/coarse/Conv2d_transpose_1/BiasAdd to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.241750607 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: cond_1/coarse/Conv2d_transpose_1/BiasAdd
2024-03-15 08:14:15.241983280 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: cond_1/coarse/Conv2d_transpose_2/BiasAdd to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.241986457 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: cond_1/coarse/Conv2d_transpose_2/BiasAdd
2024-03-15 08:14:15.242220288 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: cond_1/coarse/Conv2d_transpose_3/BiasAdd to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.242223572 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: cond_1/coarse/Conv2d_transpose_3/BiasAdd
2024-03-15 08:14:15.242406062 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: cond_1/coarse/Conv2d_transpose_4/BiasAdd to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.242409303 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: cond_1/coarse/Conv2d_transpose_4/BiasAdd
2024-03-15 08:14:15.242623624 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: cond_1/refine/up_conv1/conv2d_transpose to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.242627156 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: cond_1/refine/up_conv1/conv2d_transpose
2024-03-15 08:14:15.242634661 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: cond_1/refine/up_conv2/conv2d_transpose to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.242637447 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: cond_1/refine/up_conv2/conv2d_transpose
2024-03-15 08:14:15.242644868 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: cond_1/refine/up_conv3/conv2d_transpose to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.242647795 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: cond_1/refine/up_conv3/conv2d_transpose
2024-03-15 08:14:15.242655069 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: cond_1/refine/up_conv4/conv2d_transpose to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.242657894 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: cond_1/refine/up_conv4/conv2d_transpose
2024-03-15 08:14:15.242666008 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: cond_1/refine/up_conv5/conv2d_transpose to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.242669226 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: cond_1/refine/up_conv5/conv2d_transpose
2024-03-15 08:14:15.245905512 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: detect/Conv2d_transpose/BiasAdd to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.245909776 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: detect/Conv2d_transpose/BiasAdd
2024-03-15 08:14:15.246154186 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: detect/Conv2d_transpose_1/BiasAdd to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.246157716 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: detect/Conv2d_transpose_1/BiasAdd
2024-03-15 08:14:15.246398343 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: detect/Conv2d_transpose_2/BiasAdd to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.246401448 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: detect/Conv2d_transpose_2/BiasAdd
2024-03-15 08:14:15.246635291 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: detect/Conv2d_transpose_3/BiasAdd to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.246638579 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: detect/Conv2d_transpose_3/BiasAdd
2024-03-15 08:14:15.246817512 [W:onnxruntime:, cuda_execution_provider.cc:2319 ConvTransposeNeedFallbackToCPU] Dropping the ConvTranspose node: detect/Conv2d_transpose_4/BiasAdd to CPU because it requires asymmetric padding which the CUDA EP currently does not support
2024-03-15 08:14:15.246820719 [W:onnxruntime:, cuda_execution_provider.cc:2426 GetCapability] CUDA kernel not supported. Fallback to CPU execution provider for Op type: ConvTranspose node name: detect/Conv2d_transpose_4/BiasAdd
2024-03-15 08:14:15.282360569 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 21 Memcpy nodes are added to the graph tf2onnx for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-03-15 08:14:15.288723331 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 29 Memcpy nodes are added to the graph tf2onnx__223 for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-03-15 08:14:15.289002931 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 5 Memcpy nodes are added to the graph tf2onnx__547 for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-03-15 08:14:15.289262372 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 5 Memcpy nodes are added to the graph tf2onnx__485 for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
2024-03-15 08:14:16,137 - modelscope - INFO - Use user-specified model revision: v1.0.0
2024-03-15 08:14:16,380 - modelscope - INFO - initiate model from /home/gaol/.cache/modelscope/hub/damo/cv_gpen_image-portrait-enhancement
2024-03-15 08:14:16,380 - modelscope - INFO - initiate model from location /home/gaol/.cache/modelscope/hub/damo/cv_gpen_image-portrait-enhancement.
2024-03-15 08:14:16,381 - modelscope - INFO - initialize model from /home/gaol/.cache/modelscope/hub/damo/cv_gpen_image-portrait-enhancement
Loading ResNet ArcFace
2024-03-15 08:14:18,036 - modelscope - INFO - load face enhancer model done
2024-03-15 08:14:18,283 - modelscope - INFO - load face detector model done
2024-03-15 08:14:18,526 - modelscope - INFO - load sr model done
2024-03-15 08:14:19,245 - modelscope - INFO - load fqa model done
0%| | 0/15 [00:00<?, ?it/s]2024-03-15 08:14:19,686 - modelscope - WARNING - task skin-retouching-torch input definition is missing
2024-03-15 08:14:23,763 - modelscope - WARNING - task skin-retouching-torch output keys are missing
2024-03-15 08:14:23,767 - modelscope - WARNING - task face_recognition input definition is missing
2024-03-15 08:14:23,911 - modelscope - INFO - model inference done
2024-03-15 08:14:23,911 - modelscope - WARNING - task face_recognition output keys are missing
7%|██████████▎ | 1/15 [00:04<01:04, 4.64s/it]2024-03-15 08:14:24,554 - modelscope - INFO - model inference done
13%|████████████████████▋ | 2/15 [00:05<00:29, 2.29s/it]2024-03-15 08:14:25,148 - modelscope - INFO - model inference done
20%|███████████████████████████████ | 3/15 [00:05<00:18, 1.51s/it]2024-03-15 08:14:25,792 - modelscope - INFO - model inference done
27%|█████████████████████████████████████████▎ | 4/15 [00:06<00:12, 1.17s/it]2024-03-15 08:14:26,642 - modelscope - INFO - model inference done
33%|███████████████████████████████████████████████████▋ | 5/15 [00:07<00:10, 1.06s/it]2024-03-15 08:14:27,205 - modelscope - INFO - model inference done
40%|██████████████████████████████████████████████████████████████ | 6/15 [00:07<00:07, 1.13it/s]2024-03-15 08:14:27,831 - modelscope - INFO - model inference done
47%|████████████████████████████████████████████████████████████████████████▎ | 7/15 [00:08<00:06, 1.25it/s]2024-03-15 08:14:28,921 - modelscope - INFO - model inference done
53%|██████████████████████████████████████████████████████████████████████████████████▋ | 8/15 [00:09<00:06, 1.12it/s]2024-03-15 08:14:29,875 - modelscope - INFO - model inference done
60%|█████████████████████████████████████████████████████████████████████████████████████████████ | 9/15 [00:10<00:05, 1.10it/s]2024-03-15 08:14:30,772 - modelscope - INFO - model inference done
67%|██████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 10/15 [00:11<00:04, 1.10it/s]2024-03-15 08:14:31,635 - modelscope - INFO - model inference done
73%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 11/15 [00:12<00:03, 1.12it/s]2024-03-15 08:14:32,752 - modelscope - INFO - model inference done
80%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 12/15 [00:13<00:02, 1.04it/s]2024-03-15 08:14:33,692 - modelscope - INFO - model inference done
87%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 13/15 [00:14<00:01, 1.05it/s]2024-03-15 08:14:34,291 - modelscope - INFO - model inference done
93%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 14/15 [00:15<00:00, 1.18it/s]2024-03-15 08:14:34,920 - modelscope - INFO - model inference done
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:15<00:00, 1.04s/it]
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/0.jpg total scores: 0.5048029519210812 face angles 0.9206686066500163
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/12.jpg total scores: 0.49235572689416074 face angles 0.9208677161732743
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/6.jpg total scores: 0.4757991017879283 face angles 0.9655199974287817
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/2.jpg total scores: 0.47179465965092976 face angles 0.9158875840641704
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/13.jpg total scores: 0.46459175877098685 face angles 0.9542150597199701
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/4.jpg total scores: 0.4373414103114538 face angles 0.8650355070487793
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/7.jpg total scores: 0.4354206394466526 face angles 0.9829593007625231
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/11.jpg total scores: 0.4197791398965481 face angles 0.7875377991892176
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/8.jpg total scores: 0.41548011881379737 face angles 0.9469831125519275
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/5.jpg total scores: 0.39574758521373066 face angles 0.9770553817546423
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/14.jpg total scores: 0.38421667143713195 face angles 0.9993536825291083
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/10.jpg total scores: 0.3491402003211737 face angles 0.739845540721069
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/9.jpg total scores: 0.3327516258991929 face angles 0.8863734503059482
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/1.jpg total scores: 0.31481701863111694 face angles 0.6169483135101612
selected paths: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/original_backup/3.jpg total scores: 0.3134098934345057 face angles 0.8504926798977891
jpg: 0.jpg face_id_scores 0.5048029519210812
jpg: 12.jpg face_id_scores 0.49235572689416074
jpg: 11.jpg face_id_scores 0.4197791398965481
jpg: 2.jpg face_id_scores 0.47179465965092976
jpg: 1.jpg face_id_scores 0.31481701863111694
jpg: 4.jpg face_id_scores 0.4373414103114538
jpg: 6.jpg face_id_scores 0.4757991017879283
jpg: 13.jpg face_id_scores 0.46459175877098685
jpg: 10.jpg face_id_scores 0.3491402003211737
jpg: 7.jpg face_id_scores 0.4354206394466526
jpg: 8.jpg face_id_scores 0.41548011881379737
jpg: 5.jpg face_id_scores 0.39574758521373066
jpg: 14.jpg face_id_scores 0.38421667143713195
jpg: 9.jpg face_id_scores 0.3327516258991929
jpg: 3.jpg face_id_scores 0.3134098934345057
15it [00:12, 1.16it/s]
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/0.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/1.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/2.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/3.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/4.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/5.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/6.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/7.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/8.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/9.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/10.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/11.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/12.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/13.jpg
save processed image to /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images/train/14.jpg
2024-03-15 08:14:48,471 - EasyPhoto - train_file_path : /home/gaol/codes/temp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py
2024-03-15 08:14:48,472 - EasyPhoto - cache_log_file_path: /home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-tmp/train_kohya_log.txt
2024-03-15 08:14:53,567 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2024-03-15 08:14:53,569 - modelscope - INFO - TensorFlow version 2.16.1 Found.
2024-03-15 08:14:53,569 - modelscope - INFO - Loading ast index from /home/gaol/.cache/modelscope/ast_indexer
2024-03-15 08:14:53,585 - modelscope - INFO - Loading done! Current index file version is 1.9.3, with md5 985d60ab3829178ada728d5649a2ffda and a total number of 943 components indexed
03/15/2024 08:14:54 - INFO - __main__ - Distributed environment: MULTI_GPU Backend: nccl
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: fp16

{'variance_type', 'prediction_type', 'thresholding', 'clip_sample_range', 'timestep_spacing', 'sample_max_value', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
UNet2DConditionModel: 64, 8, 768, False, False
loading u-net:
loading vae:
loading text encoder:
create LoRA network. base dim (rank): 128, alpha: 64
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
/home/gaol/codes/temp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py:792: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn("xformers is not available. Make sure it is installed correctly")
03/15/2024 08:15:00 - WARNING - __main__ - xformers is not available. Make sure it is installed correctly
Resolving data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 449908.04it/s]
Downloading and preparing dataset imagefolder/default to /home/gaol/.cache/huggingface/datasets/imagefolder/default-f3c5867687810c1c/0.0.0/37fbb85cc714a338bea574ac6c7d0b5be5aff46c1862c1989b20e0771199e93f...
Downloading data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 101372.91it/s]
Downloading data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 124337.08it/s]
Extracting data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 6935.79it/s]
Dataset imagefolder downloaded and prepared to /home/gaol/.cache/huggingface/datasets/imagefolder/default-f3c5867687810c1c/0.0.0/37fbb85cc714a338bea574ac6c7d0b5be5aff46c1862c1989b20e0771199e93f. Subsequent calls will reuse this data.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1409.85it/s]
/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py:560: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 12, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
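The DataLoader warning above fires because the training script requests 16 worker processes on a machine that reports only 12 CPUs. A small sketch of capping the worker count (`capped_workers` is a hypothetical helper, not part of the extension):

```python
import os

def capped_workers(requested: int) -> int:
    """Cap DataLoader workers at the machine's CPU count to avoid the
    'excessive worker creation' warning seen in the log above."""
    cpus = os.cpu_count() or 1
    return min(requested, cpus)
```

The capped value would then be passed as `num_workers` when constructing the `torch.utils.data.DataLoader`.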
03/15/2024 08:15:02 - INFO - __main__ - ***** Running training *****
03/15/2024 08:15:02 - INFO - __main__ - Num examples = 15
03/15/2024 08:15:02 - INFO - __main__ - Num Epochs = 3000
03/15/2024 08:15:02 - INFO - __main__ - Instantaneous batch size per device = 8
03/15/2024 08:15:02 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 32
03/15/2024 08:15:02 - INFO - __main__ - Gradient Accumulation steps = 4
03/15/2024 08:15:02 - INFO - __main__ - Total optimization steps = 3000
Steps: 0%| | 0/3000 [00:00<?, ?it/s]2024-03-15 08:15:03,352 - modelscope - INFO - Model revision not specified, use revision: v2.0.2
2024-03-15 08:15:05,546 - modelscope - INFO - initiate model from /home/gaol/.cache/modelscope/hub/damo/cv_resnet50_face-detection_retinaface
2024-03-15 08:15:05,546 - modelscope - INFO - initiate model from location /home/gaol/.cache/modelscope/hub/damo/cv_resnet50_face-detection_retinaface.
2024-03-15 08:15:05,547 - modelscope - WARNING - No preprocessor field found in cfg.
2024-03-15 08:15:05,547 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2024-03-15 08:15:05,547 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': '/home/gaol/.cache/modelscope/hub/damo/cv_resnet50_face-detection_retinaface'}. trying to build by task and model information.
2024-03-15 08:15:05,547 - modelscope - WARNING - Find task: face-detection, model type: None. Insufficient information to build preprocessor, skip building preprocessor
2024-03-15 08:15:05,548 - modelscope - INFO - loading model from /home/gaol/.cache/modelscope/hub/damo/cv_resnet50_face-detection_retinaface/pytorch_model.pt
/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=None.
warnings.warn(msg)
2024-03-15 08:15:05,796 - modelscope - INFO - load model done
/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py:459: UserWarning: Applied workaround for CuDNN issue, install nvrtc.so (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:80.)
return F.conv2d(input, weight, bias, self.stride,
/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py:560: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 12, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
[2024-03-15 08:15:10,905] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing forward
[2024-03-15 08:15:11,105] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing call
[2024-03-15 08:15:11,107] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing forward
[2024-03-15 08:15:12,439] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo done tracing forward (RETURN_VALUE)
[2024-03-15 08:15:12,456] torch._dynamo.output_graph: [INFO] Step 2: calling compiler function debug_wrapper
Traceback (most recent call last):
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 670, in call_user_compiler
compiled_fn = compiler_fn(gm, self.fake_example_inputs())
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/debug_utils.py", line 1055, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/__init__.py", line 1390, in __call__
return compile_fx(model_, inputs_, config_patches=self.config)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 455, in compile_fx
return aot_autograd(
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 48, in compiler_fn
cg = aot_module_simplified(gm, example_inputs, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2822, in aot_module_simplified
compiled_fn = create_aot_dispatcher_function(
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2515, in create_aot_dispatcher_function
compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1676, in aot_wrapper_dedupe
fw_metadata, _out = run_functionalized_fw_and_collect_metadata(
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 607, in inner
flat_f_outs = f(*flat_f_args)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2793, in functional_call
out = Interpreter(mod).run(*args[params_len:], **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/fx/interpreter.py", line 136, in run
self.env[node] = self.run_node(node)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/fx/interpreter.py", line 177, in run_node
return getattr(self, n.op)(n.target, args, kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/fx/interpreter.py", line 294, in call_module
return submod(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/utils/lora_utils.py", line 140, in forward
lx = self.lora_down(x)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_inductor/overrides.py", line 38, in __torch_function__
return func(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_stats.py", line 20, in wrapper
return fn(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 987, in __torch_dispatch__
return self.dispatch(func, types, args, kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1066, in dispatch
args, kwargs = self.validate_and_convert_non_fake_tensors(
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1220, in validate_and_convert_non_fake_tensors
return tree_map_only(
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_pytree.py", line 266, in tree_map_only
return tree_map(map_only(ty)(fn), pytree)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_pytree.py", line 196, in tree_map
return tree_unflatten([fn(i) for i in flat_args], spec)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_pytree.py", line 196, in
return tree_unflatten([fn(i) for i in flat_args], spec)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_pytree.py", line 247, in inner
return f(x)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1212, in validate
raise Exception(
Exception: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten._to_copy.default(
(Parameter containing:
tensor([[-0.0138, -0.0359, -0.0290, ..., 0.0133, 0.0205, -0.0051],
[-0.0148, -0.0244, 0.0315, ..., 0.0310, -0.0247, 0.0210],
[-0.0090, -0.0062, -0.0061, ..., 0.0337, 0.0276, 0.0345],
...,
[-0.0216, -0.0222, 0.0058, ..., 0.0171, 0.0139, 0.0286],
[-0.0131, -0.0117, 0.0049, ..., 0.0252, 0.0084, -0.0211],
[-0.0176, -0.0148, 0.0318, ..., 0.0353, -0.0111, -0.0319]],
device='cuda:0', requires_grad=True),), **{'dtype': torch.float16})

While executing %self_text_model_encoder_layers_0_self_attn_q_proj : [#users=1] = call_module[target=self_text_model_encoder_layers_0_self_attn_q_proj](args = (%self_text_model_encoder_layers_0_layer_norm1,), kwargs = {})
Original traceback:
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 272, in forward
query_states = self.q_proj(hidden_states) * self.scale
| File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 383, in forward
hidden_states, attn_weights = self.self_attn(
| File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 654, in forward
layer_outputs = encoder_layer(
| File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 740, in forward
encoder_outputs = self.encoder(
| File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 822, in forward
return self.text_model(

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/gaol/codes/temp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py", line 1397, in
main()
File "/home/gaol/codes/temp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/utils/gpu_info.py", line 190, in wrapper
result = func(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py", line 1132, in main
encoder_hidden_states = text_encoder(batch["input_ids"])[0]
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 82, in forward
return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 209, in _fn
return fn(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward
return model_forward(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 569, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 337, in catch_errors
return callback(frame, cache_size, hooks)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 404, in _convert_frame
result = inner_convert(frame, cache_size, hooks)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 104, in _fn
return fn(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 262, in _convert_frame_assert
return _compile(
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 324, in _compile
out_code = transform_code_object(code, transform)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 445, in transform_code_object
transformations(instructions, code_options)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 311, in transform
tracer.run()
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1726, in run
super().run()
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
and self.step()
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
getattr(self, inst.opname)(inst)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1792, in RETURN_VALUE
self.output.compile_subgraph(
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 541, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 588, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
r = func(*args, **kwargs)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 675, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: debug_wrapper raised Exception: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten._to_copy.default(
(Parameter containing:
tensor([[-0.0138, -0.0359, -0.0290, ..., 0.0133, 0.0205, -0.0051],
[-0.0148, -0.0244, 0.0315, ..., 0.0310, -0.0247, 0.0210],
[-0.0090, -0.0062, -0.0061, ..., 0.0337, 0.0276, 0.0345],
...,
[-0.0216, -0.0222, 0.0058, ..., 0.0171, 0.0139, 0.0286],
[-0.0131, -0.0117, 0.0049, ..., 0.0252, 0.0084, -0.0211],
[-0.0176, -0.0148, 0.0318, ..., 0.0353, -0.0111, -0.0319]],
device='cuda:0', requires_grad=True),), **{'dtype': torch.float16})

While executing %self_text_model_encoder_layers_0_self_attn_q_proj : [#users=1] = call_module[target=self_text_model_encoder_layers_0_self_attn_q_proj](args = (%self_text_model_encoder_layers_0_layer_norm1,), kwargs = {})
Original traceback:
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 272, in forward
query_states = self.q_proj(hidden_states) * self.scale
| File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 383, in forward
hidden_states, attn_weights = self.self_attn(
| File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 654, in forward
layer_outputs = encoder_layer(
| File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 740, in forward
encoder_outputs = self.encoder(
| File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 822, in forward
return self.text_model(

Set torch._dynamo.config.verbose=True for more information

You can suppress this exception and fall back to eager by setting:
torch._dynamo.config.suppress_errors = True

Steps: 0%| | 0/3000 [00:11<?, ?it/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 5161) of binary: /home/gaol/codes/temp/stable-diffusion-webui/venv/bin/python3
Traceback (most recent call last):
File "/home/gaol/miniforge3/envs/stable/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/gaol/miniforge3/envs/stable/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 989, in
main()
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 985, in main
launch_command(args)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 970, in launch_command
multi_gpu_launcher(args)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 646, in multi_gpu_launcher
distrib_run.run(args)
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/gaol/codes/temp/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/home/gaol/codes/temp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2024-03-15_08:15:16
host : rtx3060
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 5161)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

2024-03-15 08:15:16,778 - EasyPhoto - Error executing the command: Command '['/home/gaol/codes/temp/stable-diffusion-webui/venv/bin/python3', '-m', 'accelerate.commands.launch', '--mixed_precision=fp16', '--main_process_port=3456', '/home/gaol/codes/temp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py', '--pretrained_model_name_or_path=/home/gaol/codes/temp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/models/stable-diffusion-v1-5', '--pretrained_model_ckpt=/home/gaol/codes/temp/stable-diffusion-webui/models/Stable-diffusion/Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=/home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=8', '--gradient_accumulation_steps=4', '--dataloader_num_workers=16', '--max_train_steps=3000', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=483273', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=/home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/user_weights', '--logging_dir=/home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=/home/gaol/codes/temp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/models/training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=lifan', '--cache_log_file=/home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-tmp/train_kohya_log.txt', '--validation']' returned non-zero exit status 1.
Applying attention optimization: Doggettx... done.
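
The end of the traceback above suggests a temporary mitigation: suppressing Dynamo compile errors so the text encoder falls back to eager execution. A minimal sketch (assuming torch 2.0.x as reported in this issue; note this masks the FakeTensor error rather than fixing it):

```python
import torch._dynamo

# Fall back to eager execution when the compile backend fails,
# as the traceback itself suggests. This hides the compile error;
# it does not fix the underlying FakeTensor conversion problem.
torch._dynamo.config.suppress_errors = True
print(torch._dynamo.config.suppress_errors)
```

This toggle would need to run before the `torch.compile`/Dynamo entry point in `train_lora.py` is reached.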

Steps to reproduce the problem

  1. Go to ....
  2. Press ....
  3. ...

What should have happened?

rtx3060

Commit where the problem happens

WebUI version: v1.0.0-pre-591-g22bcc7be •
Python version: 3.10.6  •  torch: 2.0.1+cu118  •  xformers: N/A  •  gradio: 3.41.2  •
EasyPhoto: latest

What browsers do you use to access the UI ?

No response

Command Line Arguments

sh webui.sh

List of enabled extensions

image

Console logs

Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-15_08:15:16
  host      : rtx3060
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 5161)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
2024-03-15 08:15:16,778 - EasyPhoto - Error executing the command: Command '['/home/gaol/codes/temp/stable-diffusion-webui/venv/bin/python3', '-m', 'accelerate.commands.launch', '--mixed_precision=fp16', '--main_process_port=3456', '/home/gaol/codes/temp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/scripts/train_kohya/train_lora.py', '--pretrained_model_name_or_path=/home/gaol/codes/temp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/models/stable-diffusion-v1-5', '--pretrained_model_ckpt=/home/gaol/codes/temp/stable-diffusion-webui/models/Stable-diffusion/Chilloutmix-Ni-pruned-fp16-fix.safetensors', '--train_data_dir=/home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/processed_images', '--caption_column=text', '--resolution=512', '--random_flip', '--train_batch_size=8', '--gradient_accumulation_steps=4', '--dataloader_num_workers=16', '--max_train_steps=3000', '--checkpointing_steps=100', '--learning_rate=0.0001', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--train_text_encoder', '--seed=483273', '--rank=128', '--network_alpha=64', '--validation_prompt=easyphoto_face, easyphoto, 1person', '--validation_steps=100', '--output_dir=/home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/user_weights', '--logging_dir=/home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-user-id-infos/lifan/user_weights', '--enable_xformers_memory_efficient_attention', '--mixed_precision=fp16', '--template_dir=/home/gaol/codes/temp/stable-diffusion-webui/extensions/sd-webui-EasyPhoto/models/training_templates', '--template_mask', '--merge_best_lora_based_face_id', '--merge_best_lora_name=lifan', '--cache_log_file=/home/gaol/codes/temp/stable-diffusion-webui/outputs/easyphoto-tmp/train_kohya_log.txt', '--validation']' returned non-zero exit status 1.
Applying attention optimization: Doggettx... done.

Additional information

No response

I suspected the cause was that TensorRT was not installed, but after installing TensorRT manually I still get the same problem.
image
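
Since the original EP error complains that `libnvinfer.so.8` cannot be opened, it may help to check whether the manual TensorRT install is actually visible to the dynamic linker. A sketch for a glibc Linux system (`check_lib` is a hypothetical helper name, not part of any tool here):

```shell
# check_lib NAME: report whether the dynamic linker can resolve NAME.
check_lib() {
  if ldconfig -p 2>/dev/null | grep -q "$1"; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

check_lib libnvinfer.so.8   # the library the EP error fails to load
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

If it reports `missing`, the directory containing `libnvinfer.so.8` needs to be added to `LD_LIBRARY_PATH` (or registered via `ldconfig`) before launching the webui.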

torch and TensorFlow versions:
image
image

But if this were a TensorRT problem, execution would not have continued. I can see the command line keeps producing output afterwards, and I cannot tell where the failure actually occurs; it just fails at the end.

I also installed xformers, but the error reported is still the same.

Found the cause:
image