LaurentMazare/diffusers-rs

CUDNN_STATUS_NOT_INITIALIZED

sak96 opened this issue · 6 comments

sak96 commented
Building the autoencoder.
Building the unet.
Timestep 0/30
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Torch("cuDNN error: CUDNN_STATUS_NOT_INITIALIZED\nException raised from createCuDNNHandle at /build/python-pytorch/src/pytorch-1.13.0-cuda/aten/src/ATen/cudnn/Handle.cpp:9 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x92 (0x7fdb34705bf2 in /usr/lib/libc10.so)\nframe #1: <unknown function> + 0xd89413 (0x7fdaeab89413 in /usr/lib/libtorch_cuda.so)\nframe #2: at::native::getCudnnHandle() + 0x7b8 (0x7fdaeaeded18 in /usr/lib/libtorch_cuda.so)\nframe #3: <unknown function> + 0x1055f89 (0x7fdaeae55f89 in /usr/lib/libtorch_cuda.so)\nframe #4: <unknown function> + 0x10505f4 (0x7fdaeae505f4 in /usr/lib/libtorch_cuda.so)\nframe #5: at::native::cudnn_convolution(at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, bool) + 0xad (0x7fdaeae50a2d in /usr/lib/libtorch_cuda.so)\nframe #6: <unknown function> + 0x3882bf4 (0x7fdaed682bf4 in /usr/lib/libtorch_cuda.so)\nframe #7: <unknown function> + 0x3882cad (0x7fdaed682cad in /usr/lib/libtorch_cuda.so)\nframe #8: at::_ops::cudnn_convolution::call(at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, bool) + 0x226 (0x7fdae0416e46 in /usr/lib/libtorch_cpu.so)\nframe #9: at::native::_convolution(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long, bool, bool, bool, bool) + 0x1097 (0x7fdadf821ff7 in /usr/lib/libtorch_cpu.so)\nframe #10: <unknown function> + 0x258518e (0x7fdae078518e in /usr/lib/libtorch_cpu.so)\nframe #11: at::_ops::_convolution::call(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, 
long, bool, bool, bool, bool) + 0x299 (0x7fdadff96759 in /usr/lib/libtorch_cpu.so)\nframe #12: at::native::convolution(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long) + 0x111 (0x7fdadf814bd1 in /usr/lib/libtorch_cpu.so)\nframe #13: <unknown function> + 0x2584c3e (0x7fdae0784c3e in /usr/lib/libtorch_cpu.so)\nframe #14: at::_ops::convolution::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long) + 0x15d (0x7fdadff43b3d in /usr/lib/libtorch_cpu.so)\nframe #15: <unknown function> + 0x43b0226 (0x7fdae25b0226 in /usr/lib/libtorch_cpu.so)\nframe #16: <unknown function> + 0x43b1060 (0x7fdae25b1060 in /usr/lib/libtorch_cpu.so)\nframe #17: at::_ops::convolution::call(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long) + 0x247 (0x7fdadff95a27 in /usr/lib/libtorch_cpu.so)\nframe #18: at::native::conv2d(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long) + 0x20d (0x7fdadf81908d in /usr/lib/libtorch_cpu.so)\nframe #19: <unknown function> + 0x273d3f6 (0x7fdae093d3f6 in /usr/lib/libtorch_cpu.so)\nframe #20: at::_ops::conv2d::call(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long) + 0x202 (0x7fdae053fad2 in /usr/lib/libtorch_cpu.so)\nframe #21: <unknown function> + 0x2f359e (0x55ad31d2559e in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #22: <unknown function> + 0x2fe5da (0x55ad31d305da in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #23: <unknown function> + 0x2b79bd (0x55ad31ce99bd in 
$HOME.cargo-target/debug/examples/stable-diffusion)\nframe #24: <unknown function> + 0x2bd561 (0x55ad31cef561 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #25: <unknown function> + 0x2e1ad0 (0x55ad31d13ad0 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #26: <unknown function> + 0x2c05f1 (0x55ad31cf25f1 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #27: <unknown function> + 0xd90f5 (0x55ad31b0b0f5 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #28: <unknown function> + 0x96438 (0x55ad31ac8438 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #29: <unknown function> + 0x975a1 (0x55ad31ac95a1 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #30: <unknown function> + 0xb496b (0x55ad31ae696b in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #31: <unknown function> + 0xa10ae (0x55ad31ad30ae in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #32: <unknown function> + 0xacbf1 (0x55ad31adebf1 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #33: <unknown function> + 0x621aee (0x55ad32053aee in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #34: <unknown function> + 0xacbc0 (0x55ad31adebc0 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #35: <unknown function> + 0x9b83c (0x55ad31acd83c in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #36: <unknown function> + 0x23290 (0x7fdade03c290 in /usr/lib/libc.so.6)\nframe #37: __libc_start_main + 0x8a (0x7fdade03c34a in /usr/lib/libc.so.6)\nframe #38: <unknown function> + 0x91905 (0x55ad31ac3905 in $HOME.cargo-target/debug/examples/stable-diffusion)\n")', $HOME.cargo/registry/src/github.com-1ecc6299db9ec823/tch-0.9.0/src/wrappers/tensor_generated.rs:6457:72
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14
   2: core::result::unwrap_failed
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1785:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1078:23
   4: tch::wrappers::tensor_generated::<impl tch::wrappers::tensor::Tensor>::conv2d
             at $HOME.cargo/registry/src/github.com-1ecc6299db9ec823/tch-0.9.0/src/wrappers/tensor_generated.rs:6457:9
   5: <tch::nn::conv::Conv<[i64; 2]> as tch::nn::module::Module>::forward
             at $HOME.cargo/registry/src/github.com-1ecc6299db9ec823/tch-0.9.0/src/nn/conv.rs:216:9
   6: tch::nn::module::<impl tch::wrappers::tensor::Tensor>::apply
             at $HOME.cargo/registry/src/github.com-1ecc6299db9ec823/tch-0.9.0/src/nn/module.rs:47:9
   7: diffusers::models::unet_2d::UNet2DConditionModel::forward
             at ./src/models/unet_2d.rs:237:18
   8: stable_diffusion::run
             at ./examples/stable-diffusion/main.rs:167:30
   9: stable_diffusion::main
             at ./examples/stable-diffusion/main.rs:200:9
  10: core::ops::function::FnOnce::call_once
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Not sure whether this is an issue with the library or something else on my side.

Dumb question: do you have a CUDA-enabled GPU on your system?

sak96 commented

Oh yeah, I forgot to give details about the machine.

% lspci | grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation GA106M [GeForce RTX 3060 Mobile / Max-Q] (rev a1)
05:00.0 VGA compatible controller: Advanced Micro Devices ....

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   32C    P3    N/A /  N/A |      5MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       666      G   /usr/lib/Xorg                       4MiB |
+-----------------------------------------------------------------------------+

It is a 6 GB RTX 3060.

% nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

I am building with export TORCH_CUDA_VERSION=cu118.
Would any other details be required?

I am using the fp16 model:

% sha256sum unet16.bin
5019a4fbb455dd9b75192afc3ecf8a8ec875e83812fd51029d2e19277edddebc  unet16.bin
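Since CUDNN_STATUS_NOT_INITIALIZED frequently means the cuDNN shared libraries are missing or not visible to the dynamic linker, one quick check worth running (a sketch, assuming a standard Linux setup where cuDNN would be registered in the linker cache):

```shell
# List the cuDNN shared libraries the dynamic linker knows about.
# An empty result would suggest libtorch cannot load cuDNN at runtime.
ldconfig -p | grep -i cudnn
```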

You could try something like:

println!("Cuda available: {}", tch::Cuda::is_available());
println!("Cudnn available: {}", tch::Cuda::cudnn_is_available());

to check whether the tch library can see the GPU.

sak96 commented
Cuda available: true
Cudnn available: true
Cuda available: true
Cudnn available: true

I am not sure why it printed everything twice, though.

--- a/examples/stable-diffusion/main.rs
+++ b/examples/stable-diffusion/main.rs
@@ -196,6 +196,8 @@ fn run(args: Args) -> anyhow::Result<()> {

 fn main() -> anyhow::Result<()> {
     let args = Args::parse();
+    println!("Cuda available: {}", tch::Cuda::is_available());
+    println!("Cudnn available: {}", tch::Cuda::cudnn_is_available());
     if !args.autocast {
         run(args)
     } else {

EDIT:
I found that the CUDA code path is indeed compiled into the binary.

sak96 commented

One solution I found was pytorch/pytorch#16831 (comment):

tch::Cuda::cudnn_set_benchmark(false);

This did not help.
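To narrow things down, a minimal standalone reproduction can help: force cuDNN initialization with a tiny convolution before any model code runs. A sketch, assuming the tch 0.9 API (it needs a CUDA-capable build and GPU to exercise the cuDNN path):

```rust
use tch::{Device, Kind, Tensor};

fn main() {
    let device = Device::cuda_if_available();
    println!("device: {:?}", device);
    // A 1x1 convolution is enough to route through cuDNN on a CUDA device;
    // if cuDNN cannot initialize, this panics with the same error as above,
    // independently of the diffusers-rs model code.
    let input = Tensor::zeros(&[1, 1, 8, 8], (Kind::Float, device));
    let weight = Tensor::zeros(&[1, 1, 1, 1], (Kind::Float, device));
    let out = input.conv2d(&weight, None::<Tensor>, &[1, 1], &[0, 0], &[1, 1], 1);
    println!("conv2d ok, output shape: {:?}", out.size());
}
```

If this small program fails with the same error, the problem lies in the libtorch/cuDNN installation rather than in diffusers-rs.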

There is another issue with the same symptoms: tensorflow/tensorflow#6698 (comment) and https://stackoverflow.com/a/52634209,
but I don't see any API in tch-rs that could be used to do the same. If you have any idea, please let me know.

Do you have libtorch installed? I had this issue, then fixed it, then forgot exactly what fixed it 😑
The things I tried were: installing the CUDA version of PyTorch, installing CUDA 11.8, installing cuDNN 8.9.1, and installing libtorch. After libtorch, it just worked magically.
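For reference, this kind of fix is usually wired up by pointing tch-rs's build script at a manually downloaded CUDA build of libtorch via the LIBTORCH environment variable. A sketch (the /opt/libtorch path is an example; use wherever the archive was unpacked):

```shell
# Environment setup for a manually installed CUDA libtorch.
export LIBTORCH=/opt/libtorch
export LD_LIBRARY_PATH="$LIBTORCH/lib:$LD_LIBRARY_PATH"
export TORCH_CUDA_VERSION=cu118
# Rebuild from scratch so the build script picks up the new libtorch.
cargo clean
cargo run --example stable-diffusion
```

Without LD_LIBRARY_PATH pointing at the libtorch lib directory, the binary can link but then fail to initialize cuDNN at runtime, which matches the symptom above.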