CUDNN_STATUS_NOT_INITIALIZED
sak96 opened this issue · 6 comments
Building the autoencoder.
Building the unet.
Timestep 0/30
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Torch("cuDNN error: CUDNN_STATUS_NOT_INITIALIZED\nException raised from createCuDNNHandle at /build/python-pytorch/src/pytorch-1.13.0-cuda/aten/src/ATen/cudnn/Handle.cpp:9 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x92 (0x7fdb34705bf2 in /usr/lib/libc10.so)\nframe #1: <unknown function> + 0xd89413 (0x7fdaeab89413 in /usr/lib/libtorch_cuda.so)\nframe #2: at::native::getCudnnHandle() + 0x7b8 (0x7fdaeaeded18 in /usr/lib/libtorch_cuda.so)\nframe #3: <unknown function> + 0x1055f89 (0x7fdaeae55f89 in /usr/lib/libtorch_cuda.so)\nframe #4: <unknown function> + 0x10505f4 (0x7fdaeae505f4 in /usr/lib/libtorch_cuda.so)\nframe #5: at::native::cudnn_convolution(at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, bool) + 0xad (0x7fdaeae50a2d in /usr/lib/libtorch_cuda.so)\nframe #6: <unknown function> + 0x3882bf4 (0x7fdaed682bf4 in /usr/lib/libtorch_cuda.so)\nframe #7: <unknown function> + 0x3882cad (0x7fdaed682cad in /usr/lib/libtorch_cuda.so)\nframe #8: at::_ops::cudnn_convolution::call(at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, bool) + 0x226 (0x7fdae0416e46 in /usr/lib/libtorch_cpu.so)\nframe #9: at::native::_convolution(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long, bool, bool, bool, bool) + 0x1097 (0x7fdadf821ff7 in /usr/lib/libtorch_cpu.so)\nframe #10: <unknown function> + 0x258518e (0x7fdae078518e in /usr/lib/libtorch_cpu.so)\nframe #11: at::_ops::_convolution::call(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long, bool, bool, bool, bool) + 0x299 (0x7fdadff96759 in /usr/lib/libtorch_cpu.so)\nframe #12: at::native::convolution(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long) + 0x111 (0x7fdadf814bd1 in /usr/lib/libtorch_cpu.so)\nframe #13: <unknown function> + 0x2584c3e (0x7fdae0784c3e in /usr/lib/libtorch_cpu.so)\nframe #14: at::_ops::convolution::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long) + 0x15d (0x7fdadff43b3d in /usr/lib/libtorch_cpu.so)\nframe #15: <unknown function> + 0x43b0226 (0x7fdae25b0226 in /usr/lib/libtorch_cpu.so)\nframe #16: <unknown function> + 0x43b1060 (0x7fdae25b1060 in /usr/lib/libtorch_cpu.so)\nframe #17: at::_ops::convolution::call(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, bool, c10::ArrayRef<long>, long) + 0x247 (0x7fdadff95a27 in /usr/lib/libtorch_cpu.so)\nframe #18: at::native::conv2d(at::Tensor const&, at::Tensor const&, c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long) + 0x20d (0x7fdadf81908d in /usr/lib/libtorch_cpu.so)\nframe #19: <unknown function> + 0x273d3f6 (0x7fdae093d3f6 in /usr/lib/libtorch_cpu.so)\nframe #20: at::_ops::conv2d::call(at::Tensor const&, at::Tensor const&, 
c10::optional<at::Tensor> const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long) + 0x202 (0x7fdae053fad2 in /usr/lib/libtorch_cpu.so)\nframe #21: <unknown function> + 0x2f359e (0x55ad31d2559e in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #22: <unknown function> + 0x2fe5da (0x55ad31d305da in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #23: <unknown function> + 0x2b79bd (0x55ad31ce99bd in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #24: <unknown function> + 0x2bd561 (0x55ad31cef561 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #25: <unknown function> + 0x2e1ad0 (0x55ad31d13ad0 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #26: <unknown function> + 0x2c05f1 (0x55ad31cf25f1 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #27: <unknown function> + 0xd90f5 (0x55ad31b0b0f5 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #28: <unknown function> + 0x96438 (0x55ad31ac8438 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #29: <unknown function> + 0x975a1 (0x55ad31ac95a1 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #30: <unknown function> + 0xb496b (0x55ad31ae696b in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #31: <unknown function> + 0xa10ae (0x55ad31ad30ae in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #32: <unknown function> + 0xacbf1 (0x55ad31adebf1 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #33: <unknown function> + 0x621aee (0x55ad32053aee in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #34: <unknown function> + 0xacbc0 (0x55ad31adebc0 in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #35: <unknown function> + 0x9b83c (0x55ad31acd83c in $HOME.cargo-target/debug/examples/stable-diffusion)\nframe #36: <unknown function> + 0x23290 (0x7fdade03c290 in /usr/lib/libc.so.6)\nframe #37: __libc_start_main + 0x8a (0x7fdade03c34a in /usr/lib/libc.so.6)\nframe #38: <unknown function> + 0x91905 (0x55ad31ac3905 in $HOME.cargo-target/debug/examples/stable-diffusion)\n")', $HOME.cargo/registry/src/github.com-1ecc6299db9ec823/tch-0.9.0/src/wrappers/tensor_generated.rs:6457:72
stack backtrace:
0: rust_begin_unwind
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14
2: core::result::unwrap_failed
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1785:5
3: core::result::Result<T,E>::unwrap
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/result.rs:1078:23
4: tch::wrappers::tensor_generated::<impl tch::wrappers::tensor::Tensor>::conv2d
at $HOME.cargo/registry/src/github.com-1ecc6299db9ec823/tch-0.9.0/src/wrappers/tensor_generated.rs:6457:9
5: <tch::nn::conv::Conv<[i64; 2]> as tch::nn::module::Module>::forward
at $HOME.cargo/registry/src/github.com-1ecc6299db9ec823/tch-0.9.0/src/nn/conv.rs:216:9
6: tch::nn::module::<impl tch::wrappers::tensor::Tensor>::apply
at $HOME.cargo/registry/src/github.com-1ecc6299db9ec823/tch-0.9.0/src/nn/module.rs:47:9
7: diffusers::models::unet_2d::UNet2DConditionModel::forward
at ./src/models/unet_2d.rs:237:18
8: stable_diffusion::run
at ./examples/stable-diffusion/main.rs:167:30
9: stable_diffusion::main
at ./examples/stable-diffusion/main.rs:200:9
10: core::ops::function::FnOnce::call_once
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Not sure whether this is an issue with the library or with something else on my system.
Dumb question: do you have a CUDA-enabled GPU on your system?
Oh yeah, I forgot to give details about the machine.
% lspci | grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation GA106M [GeForce RTX 3060 Mobile / Max-Q] (rev a1)
05:00.0 VGA compatible controller: Advanced Micro Devices ....
% nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11 Driver Version: 525.60.11 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| N/A 32C P3 N/A / N/A | 5MiB / 6144MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 666 G /usr/lib/Xorg 4MiB |
+-----------------------------------------------------------------------------+
It is a 6 GB RTX 3060.
% nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
I am building with export TORCH_CUDA_VERSION=cu118.
Would any other details be required?
I am using the fp16 model:
% sha256sum unet16.bin
5019a4fbb455dd9b75192afc3ecf8a8ec875e83812fd51029d2e19277edddebc unet16.bin
You could try something like:
println!("Cuda available: {}", tch::Cuda::is_available());
println!("Cudnn available: {}", tch::Cuda::cudnn_is_available());
That will show whether the tch library can see the GPU.
Cuda available: true
Cudnn available: true
Cuda available: true
Cudnn available: true
I am not sure why it printed everything twice, though.
--- a/examples/stable-diffusion/main.rs
+++ b/examples/stable-diffusion/main.rs
@@ -196,6 +196,8 @@ fn run(args: Args) -> anyhow::Result<()> {
fn main() -> anyhow::Result<()> {
let args = Args::parse();
+ println!("Cuda available: {}", tch::Cuda::is_available());
+ println!("Cudnn available: {}", tch::Cuda::cudnn_is_available());
if !args.autocast {
run(args)
} else {
EDIT:
I found that the CUDA code path is indeed being used by the example.
One solution I found was pytorch/pytorch#16831 (comment):
tch::Cuda::cudnn_set_benchmark(false);
This did not help.
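A related diagnostic (a sketch, not something tried in this thread; it assumes this tch version also exposes set_user_enabled_cudnn, which wraps libtorch's global setUserEnabledCuDNN flag) would be to put both toggles at the top of main, before the autoencoder and unet are built, and to disable cuDNN entirely so conv2d falls back to the plain CUDA kernels:
// diagnostic only, before any model is built
tch::Cuda::cudnn_set_benchmark(false);    // don't autotune cuDNN algorithms
tch::Cuda::set_user_enabled_cudnn(false); // skip cuDNN, use plain CUDA convolutions
If the example then gets past the first conv2d in the unet, the failure is specific to cuDNN handle creation rather than CUDA in general.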
There is another issue with the same error: tensorflow/tensorflow#6698 (comment), and https://stackoverflow.com/a/52634209 (both suggest limiting how much GPU memory the framework grabs up front),
but I don't see any API in tch-rs that can do the same. If you have any ideas, please let me know.
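Another way to narrow it down (again a sketch, not from this thread) is a tiny standalone program that runs a single conv2d on the GPU, so cuDNN handle creation is exercised without the 6 GB card first being filled by the stable-diffusion weights:

use tch::{Device, Kind, Tensor};

fn main() {
    let device = Device::cuda_if_available();
    println!("device: {:?}", device);
    // A single convolution forces cuDNN handle creation, which is exactly
    // where the CUDNN_STATUS_NOT_INITIALIZED error above is raised.
    let input = Tensor::randn(&[1, 3, 64, 64], (Kind::Float, device));
    let weight = Tensor::randn(&[8, 3, 3, 3], (Kind::Float, device));
    let out = input.conv2d(&weight, None::<Tensor>, &[1, 1], &[1, 1], &[1, 1], 1);
    println!("conv2d ok, output shape: {:?}", out.size());
}

If this small program fails with the same error, the problem is in the libtorch/cuDNN installation rather than anything specific to diffusers-rs; if it succeeds, running out of memory on the 6 GB card becomes the more likely culprit.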
Do you have libtorch installed? I had this issue, then fixed it, then forgot exactly what fixed it 😑
The things I tried were: installing the CUDA version of PyTorch, installing CUDA 11.8, installing cuDNN 8.9.1, and installing libtorch. After libtorch, it just worked magically.
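For anyone hitting this later: torch-sys either downloads its own libtorch at build time (with TORCH_CUDA_VERSION selecting the CUDA flavour, as used above) or links against a local copy pointed to by the LIBTORCH environment variable, and that copy's lib directory then has to be on LD_LIBRARY_PATH at runtime. The paths in the backtrace (/usr/lib/libtorch_cuda.so, built from python-pytorch 1.13.0) suggest the system PyTorch libraries were being loaded here, so installing a CUDA-enabled libtorch whose CUDA/cuDNN versions match the driver is plausibly what made it work.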