TACJu/TransFG

How to fix the RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx`

xuritian317 opened this issue · 3 comments

Thanks for your work and sharing your codes!

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port 89898 train.py --dataset CUB_200_2011 --split overlap --num_steps 10000 --fp16 --name sample_run

When I train on two GPUs (1080 Ti ×2), this error occurs.
The configuration is CUDA 11.1, PyTorch 1.8.1, torchvision 0.9.1, Python 3.8.3.

Warning:  multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback.  Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
Training (X / X Steps) (loss=X.X):   0%|| 0/749 [00:00<?, ?it/s]Warning:  apex was installed without --cpp_ext.  Falling back to Python flatten and unflatten.
Training (X / X Steps) (loss=X.X):   0%|| 0/749 [00:42<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 400, in <module>
    main()
  File "train.py", line 397, in main
    train(args, model)
  File "train.py", line 226, in train
    loss, logits = model(x, y)
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/apex-0.1-py3.8.egg/apex/parallel/distributed.py", line 560, in forward
    result = self.module(*inputs, **kwargs)
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/apex-0.1-py3.8.egg/apex/amp/_initialize.py", line 196, in new_fwd
    output = old_fwd(*applier(args, input_caster),
  File "/home/lirunze/xh/project/git/trans-fg_-i2-t/models/modeling.py", line 305, in forward
    part_logits = self.part_head(part_tokens[:, 0])
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`

Could you analyze the problem about this? Thank you!
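
For reference, the failing operation can be exercised outside the model with a few lines. This is only a sketch (it assumes a single CUDA-capable GPU and uses arbitrary tensor shapes), mirroring the half-precision F.linear call from the traceback, which is dispatched to cublasGemmEx:

import torch
import torch.nn.functional as F

# Reproduce the call pattern from the traceback: F.linear on fp16 CUDA tensors.
x = torch.randn(8, 768, device="cuda", dtype=torch.half)   # shapes are arbitrary
w = torch.randn(200, 768, device="cuda", dtype=torch.half)
b = torch.zeros(200, device="cuda", dtype=torch.half)
out = F.linear(x, w, b)
torch.cuda.synchronize()  # force any asynchronous CUDA error to surface here
print(out.shape, out.dtype)

If this standalone script also raises CUBLAS_STATUS_EXECUTION_FAILED, the problem is in the environment (PyTorch/CUDA build) rather than in the TransFG code.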

How did you solve this problem?

This is caused by the PyTorch version being too new; please use PyTorch 1.7.1 or 1.5.1 as given by the author.
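
As an example of the downgrade (a sketch only, not the author's exact instructions): with a CUDA 11.x driver, `pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html` installs the CUDA 11.0 build of PyTorch 1.7.1; check the official previous-versions page for the wheel matching your driver. Afterwards, the environment can be checked before re-running training:

import torch

# Confirm the downgraded build and that the GPU is visible.
print(torch.__version__)              # expect 1.7.1+cu110 (or 1.5.1)
print(torch.version.cuda)             # CUDA toolkit the wheel was built against
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))  # e.g. a GTX 1080 Ti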