[Issue]: Tesla M40 GPU reports CUBLAS_STATUS_NOT_SUPPORTED

Question

edward-kirk opened this issue 23 days ago · 3 comments

Issue Description

I'm getting the following error on a Tesla M40 24 GB:
Python: version=3.10.15 platform=Linux
bin="/home/kirk/sdNext/venv/bin/python3"
venv="/home/kirk/sdNext/venv"
12:42:45-813967 INFO Version: app=sd.next updated=2024-11-02 hash=65ddc611
branch=master
url=https://github.com/vladmandic/automatic/tree/master ui=main
12:42:46-146439 INFO Platform: arch=x86_64 cpu=x86_64 system=Linux
release=6.8.0-48-generic python=3.10.15
12:42:46-147900 INFO Args: []
12:42:46-156748 INFO CUDA: nVidia toolkit detected
12:42:46-157737 INFO Install: package="onnxruntime-gpu" mode=pip
12:43:04-069380 INFO Install: package="torch==2.5.1+cu124 torchvision==0.20.1+cu124
--index-url https://download.pytorch.org/whl/cu124" mode=pip
12:44:36-079777 INFO Install: package="onnx" mode=pip
12:44:39-368315 INFO Install: package="onnxruntime" mode=pip
12:54:52-085814 INFO Base: class=StableDiffusionPipeline
12:54:52-500710 ERROR Prompt parser encode: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when
calling cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
12:54:52-525193 ERROR Processing: step=base args={'prompt': ['test'], 'negative_prompt':
[''], 'guidance_scale': 6, 'generator': [<torch._C.Generator
object at 0x7e3650e3c5f0>], 'callback_on_step_end': <function
diffusers_callback at 0x7e3697f4cf70>,
'callback_on_step_end_tensor_inputs': ['latents', 'prompt_embeds',
'negative_prompt_embeds'], 'num_inference_steps': 20, 'eta': 1.0,
'guidance_rescale': 0.7, 'output_type': 'latent', 'width': 1024,
'height': 1024} CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when
calling cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
12:54:52-529346 ERROR Processing: RuntimeError

Version Platform Description

No response

Relevant log output

No response

Backend

Diffusers

UI

Standard

Branch

Master

Model

StableDiffusion 1.5

Acknowledgements

I have read the above and searched for existing issues
I confirm that this is classified correctly and it's not an extension issue

Answer 1 · 2024-11-02T18:05:11.000Z

Try forcing the dtype in settings to fp16 instead of auto, since this is a really old GPU architecture.
If that doesn't work, you may need to search for a version of torch that works with the M40; there's not much I can do about that.
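
For context, the failing call in the log passes CUDA_R_16BF, so the pipeline was running in bf16. Native bf16 support arrived with Ampere (compute capability 8.0), while the M40 is a Maxwell card (compute capability 5.2). Below is a minimal sketch of checking what the card reports and what dtype would be a cautious choice. The helper names `pick_dtype_name` and `report` are hypothetical and not part of SD.Next, and the thresholds are a conservative assumption rather than anything the app itself applies.

```python
def pick_dtype_name(capability):
    """Return a conservative dtype name for a CUDA compute capability.

    `capability` is a (major, minor) tuple, as returned by
    torch.cuda.get_device_capability(). The cut-offs are assumptions:
    bf16 only on Ampere or newer, fp16 only on Volta or newer,
    fp32 for everything older (such as the Maxwell-based M40).
    """
    major, _minor = capability
    if major >= 8:
        return "bfloat16"
    if major >= 7:
        return "float16"
    return "float32"


def report():
    """Describe the first CUDA device, if torch and a GPU are available."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    cap = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    return (
        f"{name}: compute capability {cap[0]}.{cap[1]}, "
        f"suggested dtype {pick_dtype_name(cap)}"
    )


if __name__ == "__main__":
    print(report())
```

Run inside the same venv as the app. If the sketch suggests float32 and fp16 still fails, forcing fp32 in settings may be worth a try, at the cost of higher memory use.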

Answer 2 · 2024-11-11T21:10:11.000Z

any updates?

Answer 3 · 2024-11-12T02:43:46.000Z

Forcing fp16 did not help. I had trouble getting a PyTorch version to work. I'm currently attempting to compile PyTorch from source with a newer Python version.
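
When testing different PyTorch builds, it can help to first confirm that the build ships kernels for the card at all. The sketch below compares the device's compute capability with the architectures reported by `torch.cuda.get_arch_list()`. The helper name `build_covers_device` is hypothetical. The matching rule assumes the usual CUDA compatibility behaviour: an `sm_XY` binary runs on a device with the same major version and an equal or higher minor version, and `compute_XY` PTX can be JIT-compiled for any equal or newer device. Note that this only checks kernel coverage; it does not by itself show that cuBLAS supports bf16 on the card.

```python
def build_covers_device(capability, arch_list):
    """Return True if any entry in arch_list can run on the given device.

    capability: (major, minor) tuple, e.g. (5, 2) for a Tesla M40.
    arch_list:  strings such as "sm_50" or "compute_90", in the format
                returned by torch.cuda.get_arch_list().
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if not digits.isdigit():
            # Skip entries this sketch does not understand (e.g. "sm_90a").
            continue
        a_major, a_minor = divmod(int(digits), 10)
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True  # binary-compatible cubin
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True  # PTX that can be JIT-compiled for this device
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print(f"device {cap}, build archs {archs}")
            print("covered:", build_covers_device(cap, archs))
        else:
            print("no CUDA device visible")
```

For a source build, the `TORCH_CUDA_ARCH_LIST` environment variable controls which architectures are compiled; for the M40 that would presumably be `5.2`, provided the installed CUDA toolkit still targets Maxwell.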