Error output when WD14 Captioning
ThanapatSornsrivichai opened this issue · 14 comments
Got this error when trying to caption with WD14. Image sizes are >1000x1000.
GPU: RTX 3090.
I tried accelerate config and updating again; it didn't work.
Captioning files in D:/Kohya/dataset/yorra ench...
accelerate launch "./finetune/tag_images_by_wd14_tagger.py" --batch_size="1" --thresh="0.35" --caption_extension=".txt" "D:/Kohya/dataset/yorra ench"
2023-02-18 01:25:45.516228: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2023-02-18 01:25:45.516363: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
using existing wd14 tagger model
found 19 images.
loading model and labels
2023-02-18 01:25:50.592676: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "D:\Kohya\kohya_ss\finetune\tag_images_by_wd14_tagger.py", line 200, in <module>
main(args)
File "D:\Kohya\kohya_ss\finetune\tag_images_by_wd14_tagger.py", line 96, in main
model = load_model(args.model_dir)
File "D:\Kohya\kohya_ss\venv\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "D:\Kohya\kohya_ss\venv\lib\site-packages\tensorflow\python\eager\context.py", line 622, in ensure_initialized
context_handle = pywrap_tfe.TFE_NewContext(opts)
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: cudaGetErrorString symbol not found.
Traceback (most recent call last):
File "C:\Users\thana\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\thana\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\Kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\Kohya\kohya_ss\venv\Scripts\python.exe', './finetune/tag_images_by_wd14_tagger.py', '--batch_size=1', '--thresh=0.35', '--caption_extension=.txt', 'D:/Kohya/dataset/yorra ench']' returned non-zero exit status 1.
...captioning done
This is really strange. I have never seen this error and have no idea what it might be... it appears to be related to Status: cudaGetErrorString symbol not found
when running the model... I will keep an eye on it.
I'm having exactly the same issue, with an RTX 3060.
The optional cuDNN 8.6 has been installed; maybe I shouldn't have? I don't have the slightest idea how to rebuild TensorFlow, let alone "with the appropriate compiler flags"; this is way beyond my skills.
Edit: the trick for now is to use the WD14 Tagger extension for WebUI, there is a batch option :)
https://github.com/toriato/stable-diffusion-webui-wd14-tagger.git
same problem here
OK, after more than an hour of testing different methods, I resolved the issue.
First, ensure you have the Microsoft Visual C++ Redistributable for Visual Studio 2015-2022 (I already had this, but a missing redistributable may be one of the reasons it fails).
Edit: Please note I am unsure whether the following is a good solution, because TF versions > 2.10 cannot use the GPU on native Windows.
In my case, installing the latest version of TF resolved the error:
Run in powershell / cmd at project root:
.\venv\Scripts\activate
pip install tf-nightly
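After reinstalling, a quick way to confirm TensorFlow can actually see the GPU is a small stdlib-safe check (gpu_report is a hypothetical helper name, not part of the project):

```python
def gpu_report():
    """Return (tf_version, visible_gpus); (None, []) if TF won't import."""
    try:
        import tensorflow as tf
    except ImportError:
        return None, []
    # On a broken CUDA setup this call may raise the same InternalError
    # seen in this issue; on a working one it lists the physical GPUs.
    return tf.__version__, tf.config.list_physical_devices("GPU")

version, gpus = gpu_report()
print("TensorFlow:", version)
print("Visible GPUs:", gpus)
```

If the GPU list comes back empty, the tagger will fall back to CPU (or fail the same way as before).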
So this means it's a problem with the TensorFlow version?
Hopefully this helps identify the problem @bmaltais
OK, here's a different solution. I installed CUDA v11.2 (only this exact version works with the TensorFlow 2.10 required by this project) and cuDNN (I used v8.1.1; 8.5+ should probably be compatible too, but I haven't tested it). This got rid of the errors and let the script continue as intended.
An alternative for the time being: https://github.com/toriato/stable-diffusion-webui-wd14-tagger
I'm using Windows with a 3090.
I had the same error with the WD tagger, but it worked out with @williamkmlau's fix:
.\venv\Scripts\activate
pip install tf-nightly
treksis commented Mar 5, 2023
I have a dependency issue. I set up cleanly and installed tf-nightly.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
xformers 0.0.14.dev0 requires pyre-extensions==0.0.23, which is not installed.
tensorflow 2.10.1 requires protobuf<3.20,>=3.9.2, but you have protobuf 4.22.0 which is incompatible.
tensorboard 2.10.1 requires protobuf<3.20,>=3.9.2, but you have protobuf 4.22.0 which is incompatible.
tensorboard 2.10.1 requires tensorboard-data-server<0.7.0,>=0.6.0, but you have tensorboard-data-server 0.7.0 which is incompatible.
I think using the wd14-tagger from the public automatic1111 extension is a good alternative for now.
Don't know if it helps anyone, but I had the same error when installing:
2023-02-18 01:25:45.516228: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2023-02-18 01:25:45.516363: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
The issue was that somehow a newer version of torch/torchvision was installed, which does not seem to include cudart64_110.dll.
Running pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
again fixed the error for me.
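A quick way to check whether the installed torch wheel actually bundles the CUDA runtime DLL (bundled_cudart is a hypothetical helper; the path shown is the usual Windows venv layout and is an assumption):

```python
import glob
import os


def bundled_cudart(lib_dir):
    """Return the cudart*.dll files shipped in a torch install's lib folder."""
    return sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(lib_dir, "cudart*.dll")))


# Typical location inside the venv on Windows (assumption):
#   venv\Lib\site-packages\torch\lib
# An empty list means the wheel does not ship cudart64_110.dll.
print(bundled_cudart(r"D:\Kohya\kohya_ss\venv\Lib\site-packages\torch\lib"))
```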
This is a monkey patch. Please consider a permanent solution.
\venv\Scripts\activate.bat
rem set PATH=%VIRTUAL_ENV%\Scripts;%PATH%
set PATH=%VIRTUAL_ENV%\Scripts;%VIRTUAL_ENV%\Lib\site-packages\torch\lib;%PATH%
set VIRTUAL_ENV_PROMPT=(venv)
In short, the problem is that the PATH set by venv does not include the directory containing the cudart64_110.dll installed in site-packages. I tried to solve this with os.add_dll_directory(), but I couldn't add the path within the venv environment.
I think those who don't encounter this problem have the cudart64_110.dll located in a PATH that is already set.
I got some hints from this issue:
tensorflow/tensorflow#43193
If you want to debug this problem, it's a good idea to check the location of cudart64_110.dll and run print(os.environ['PATH']).
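The debugging step above can be sketched as a small stdlib script (find_on_path is a hypothetical name):

```python
import os


def find_on_path(filename, path=None):
    """Return every PATH directory that contains the given file."""
    entries = path if path is not None else os.environ.get("PATH", "")
    return [d for d in entries.split(os.pathsep)
            if d and os.path.isfile(os.path.join(d, filename))]


if __name__ == "__main__":
    print("PATH:", os.environ.get("PATH", ""))
    hits = find_on_path("cudart64_110.dll")
    print("cudart64_110.dll found in:", hits if hits else "no PATH entry")
```

If this prints no hits but the DLL exists under site-packages\torch\lib, that directory is missing from PATH, which matches the monkey patch above.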
This idea is better because it does not require modifying venv.
gui.bat
@echo off
:: Activate the virtual environment
call .\venv\Scripts\activate.bat
set PATH=%PATH%;%~dp0venv\Lib\site-packages\torch\lib
:: Validate the requirements and store the exit code
python.exe .\tools\validate_requirements.py
:: If the exit code is 0, run the kohya_gui.py script with the command-line arguments
if %errorlevel% equ 0 (
python.exe kohya_gui.py %*
)
Thanks, I will add it to the bat file and also add the equivalent to the ps1 file.
cudart64_110.dll not found, 21.5.5
I'm new to all of this, so I don't have a solid understanding of how to get it in there, but I have tried:
.\venv\Scripts\activate
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
python.exe -m pip install --upgrade pip