facebookresearch/CodeGen

TransCoder fails with error: CUBLAS_STATUS_NOT_INITIALIZED

raffian opened this issue · 2 comments

After updating the NVIDIA driver to 510.06, CUDA is finally recognized.

(codeGen_env) usr1@mak:~/projects/CodeGen$ python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.7.0
Is debug build: True
CUDA used to build PyTorch: 11.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.6 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080
Nvidia driver version: 510.06
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.7.0
[pip3] torchaudio==0.7.0a0+ac17b64
[pip3] torchvision==0.8.1
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.0.3               h15472ef_9    conda-forge
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] libblas                   3.9.0            12_linux64_mkl    conda-forge
[conda] libcblas                  3.9.0            12_linux64_mkl    conda-forge
[conda] liblapack                 3.9.0            12_linux64_mkl    conda-forge
[conda] mkl                       2021.4.0           h06a4308_640
[conda] numpy                     1.19.5           py36hfc0c790_2    conda-forge
[conda] pytorch                   1.7.0           py3.6_cuda11.0.221_cudnn8.0.3_0    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchaudio                0.7.0                      py36    pytorch
[conda] torchvision               0.8.1                py36_cu110    pytorch
(codeGen_env) usr1@mak:~/projects/CodeGen$

I successfully compiled the CUDA extensions, but had to comment out the version check below in apex to suppress a runtime error, as the error output itself suggests. This may (or may not) be the cause of CUBLAS_STATUS_NOT_INITIALIZED (see below).

apex - commented out

    if (bare_metal_major != torch_binary_major) or (bare_metal_minor != torch_binary_minor):
        raise RuntimeError("Cuda extensions are being compiled with a version of Cuda that does " +
        ...

Cuda Ext Compilation

(codeGen_env) usr1@mak:~/projects/CodeGen/apex$ pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
/home/usr1/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/pip/_internal/commands/install.py:245: UserWarning: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option.
  cmdoptions.check_install_build_global(options)
Using pip 21.3.1 from /home/usr1/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/pip (python 3.6)
Processing /home/usr1/projects/CodeGen/apex
  Running command python setup.py egg_info

  torch.__version__  = 1.7.0

  running egg_info
  creating /tmp/pip-pip-egg-info-_podlbbz/apex.egg-info
  writing /tmp/pip-pip-egg-info-_podlbbz/apex.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-pip-egg-info-_podlbbz/apex.egg-info/dependency_links.txt

  writing top-level names to /tmp/pip-pip-egg-info-_podlbbz/apex.egg-info/top_level.txt
  writing manifest file '/tmp/pip-pip-egg-info-_podlbbz/apex.egg-info/SOURCES.txt'
  reading manifest file '/tmp/pip-pip-egg-info-_podlbbz/apex.egg-info/SOURCES.txt'
  adding license file 'LICENSE'
  writing manifest file '/tmp/pip-pip-egg-info-_podlbbz/apex.egg-info/SOURCES.txt'
  /home/usr1/projects/CodeGen/apex/setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
    warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
  Preparing metadata (setup.py) ... done
Skipping wheel build for apex, due to binaries being disabled for it.
Installing collected packages: apex
    Running command /home/usr1/anaconda3/envs/codeGen_env/bin/python3.6 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/usr1/projects/CodeGen/apex/setup.py'"'"'; __file__='"'"'/home/usr1/projects/CodeGen/apex/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-tezgj0d4/install-record.txt --single-version-externally-managed --compile --install-headers /home/usr1/anaconda3/envs/codeGen_env/include/python3.6m/apex

    torch.__version__  = 1.7.0

    /home/usr1/projects/CodeGen/apex/setup.py:67: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
      warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")

    Compiling cuda extensions with
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2019 NVIDIA Corporation
    Built on Sun_Jul_28_19:07:16_PDT_2019
    Cuda compilation tools, release 10.1, V10.1.243
    from /usr/bin

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/usr1/projects/CodeGen/apex/setup.py", line 159, in <module>
        check_cuda_torch_binary_vs_bare_metal(CUDA_HOME)
      File "/home/usr1/projects/CodeGen/apex/setup.py", line 103, in check_cuda_torch_binary_vs_bare_metal
        "https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  "
    RuntimeError: Cuda extensions are being compiled with a version of Cuda that does not match the version used to compile Pytorch binaries.  Pytorch binaries were compiled with Cuda 11.0.
    In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  You can try commenting out this check (at your own risk).
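
For reference, the mismatch apex complains about can be reproduced directly. This is a minimal sketch (not apex code; it just mirrors the logic of apex's check_cuda_torch_binary_vs_bare_metal) comparing the CUDA version PyTorch was built with against the bare-metal nvcc version; on the setup above it should report 11.0 vs 10.1:

# Minimal sketch (assumes nvcc is on PATH); mirrors apex's version check.
import re
import subprocess

import torch

torch_cuda = torch.version.cuda  # CUDA version PyTorch was built with, e.g. "11.0"
nvcc_out = subprocess.check_output(["nvcc", "--version"]).decode()
bare_metal = ".".join(re.search(r"release (\d+)\.(\d+)", nvcc_out).groups())  # e.g. "10.1"

print(f"PyTorch built with CUDA {torch_cuda}, bare-metal nvcc is CUDA {bare_metal}")
if torch_cuda.split(".")[:2] != bare_metal.split("."):
    print("Mismatch: this is the condition apex's check raises on.")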

Translation
cpp --> Java
The source file is actually C (not C++), about 600 lines of code like this (nothing fancy):

int DBG_change(int level_offset)
{
	int retval = 0;

	if ((debug_level += level_offset) < MSG_DEBUG_LEVEL_NONE)
	{
		debug_level = MSG_DEBUG_LEVEL_NONE;
		debug_status = 0;
	}

	if (! debug_file_ptr && debug_level >= MSG_DEBUG_LEVEL_NONE)
	{
		retval = DBG_setup((char *) NULL, (char *) NULL);
	}
	else if (debug_file_ptr && debug_level <= MSG_DEBUG_LEVEL_NONE)
		DBG_close();

	return (retval);
}

static void DBG_set_level(char *app_name)
{
	char *envptr;
	char debug_env[MAX_ENVVARLEN + 1];

	if ((app_name != NULL) && *app_name)
	{
		strncpy(debug_env, app_name, MAX_ENVVARLEN);
		strncat(debug_env, "_DEBUG_LEVEL", MAX_ENVVARLEN - strlen(debug_env));
	}
	else
	{
		strcpy(debug_env, "DEBUG_LEVEL");
	}

	envptr = (char *) getenv(debug_env);

	if (envptr == NULL)
		debug_level = MSG_DEBUG_LEVEL_NONE;
	else if (! strcmp(envptr, "TRUE")  || ! strcmp(envptr, "true"))
		debug_level = MSG_DEBUG_LEVEL_ON_MIN;
	else if (! strcmp(envptr, "MIN")  || ! strcmp(envptr, "min"))
		debug_level = MSG_DEBUG_LEVEL_ON_MIN;
	else if (! strcmp(envptr, "NORM")  || ! strcmp(envptr, "norm"))
		debug_level = MSG_DEBUG_LEVEL_ON_NORM;
	else if (! strcmp(envptr, "MAX")   || ! strcmp(envptr, "max"))
		debug_level = MSG_DEBUG_LEVEL_ON_MAX;
	else if (! strcmp(envptr, "FALSE") || ! strcmp(envptr, "false"))
		debug_level = MSG_DEBUG_LEVEL_NONE;
	else if (isdigit(envptr[0]))
	{
		if ((debug_level = (int) atoi(envptr)) <= MSG_DEBUG_LEVEL_NONE)
			debug_level = MSG_DEBUG_LEVEL_NONE;
	}
	else
	{
		debug_level = MSG_DEBUG_LEVEL_NONE;
	}
}

python -m codegen_sources.model.translate
Error: CUBLAS_STATUS_NOT_INITIALIZED

(codeGen_env) usr1@mak:~/projects/CodeGen$ python -m codegen_sources.model.translate --src_lang cpp --tgt_lang java --model_path TransCoder_model_1.pth --beam_size 10 < csrc.c
adding to path /home/usr1/projects/CodeGen
INFO - 12/13/21 16:50:08 - 0:00:05 - ============ Model Reloading
INFO - 12/13/21 16:50:08 - 0:00:05 - Reloading encoder from TransCoder_model_1.pth ...
WARNING - 12/13/21 16:50:13 - 0:00:09 - Lang cpp_sa matched to pretrained cpp_sa lang embedding.
WARNING - 12/13/21 16:50:13 - 0:00:09 - Lang java_sa matched to pretrained java_sa lang embedding.
WARNING - 12/13/21 16:50:13 - 0:00:09 - Lang python_sa matched to pretrained python_sa lang embedding.
WARNING - 12/13/21 16:50:13 - 0:00:09 - The size of position embeddings in current model is 2048, the size of reloaded is 1024. need to repeat last positions 1024 times.
INFO - 12/13/21 16:50:13 - 0:00:10 - Reloading decoders from TransCoder_model_1.pth ...
WARNING - 12/13/21 16:50:14 - 0:00:11 - Lang cpp_sa matched to pretrained cpp_sa lang embedding.
WARNING - 12/13/21 16:50:14 - 0:00:11 - Lang java_sa matched to pretrained java_sa lang embedding.
WARNING - 12/13/21 16:50:14 - 0:00:11 - Lang python_sa matched to pretrained python_sa lang embedding.
WARNING - 12/13/21 16:50:14 - 0:00:11 - The size of position embeddings in current model is 2048, the size of reloaded is 1024. need to repeat last positions 1024 times.
INFO - 12/13/21 16:50:14 - 0:00:11 - Number of parameters (encoder): 143239641
INFO - 12/13/21 16:50:14 - 0:00:11 - Number of parameters (decoders): 168442329
INFO - 12/13/21 16:50:14 - 0:00:11 - Number of decoders: 1
...
/opt/conda/conda-bld/pytorch_1603729128610/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [158,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1603729128610/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [158,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1603729128610/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [158,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1603729128610/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [158,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/home/usr1/anaconda3/envs/codeGen_env/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/usr1/anaconda3/envs/codeGen_env/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/usr1/projects/CodeGen/codegen_sources/model/translate.py", line 276, in <module>
    beam_size=params.beam_size,
  File "/home/usr1/projects/CodeGen/codegen_sources/model/translate.py", line 192, in translate
    enc1 = self.encoder("fwd", x=x1, lengths=len1, langs=langs1, causal=False)
  File "/home/usr1/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/usr1/projects/CodeGen/codegen_sources/model/src/model/transformer.py", line 433, in forward
    return self.fwd(**kwargs)
  File "/home/usr1/projects/CodeGen/codegen_sources/model/src/model/transformer.py", line 526, in fwd
    attn = self.attentions[i](tensor, attn_mask, use_cache=use_cache)
  File "/home/usr1/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/usr1/projects/CodeGen/codegen_sources/model/src/model/transformer.py", line 243, in forward
    q = shape(self.q_lin(input))  # (bs, n_heads, qlen, dim_per_head)
  File "/home/usr1/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/usr1/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/usr1/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

It's worth noting that I haven't trained any models; I'm just using TransCoder_model_1.pth as-is and doing a simple cpp -> Java translation as a test. What is the significance of this error?

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
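
For context on what this error usually means: CUDA errors are reported asynchronously, so the failure in cublasCreate is often not the root cause. The indexSelectLargeIndex assertions above (srcIndex < srcSelectDimSize) point at an out-of-range index in an embedding lookup (consistent with malformed input tokens), and the later cuBLAS call just inherits the already-poisoned CUDA context. A minimal sketch of how to surface the real failure point (assumes any CUDA-capable machine; the out-of-range index stands in for a bad token id):

# CUDA errors surface asynchronously; forcing synchronous launches makes
# the traceback stop at the kernel that actually failed.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

import torch

emb = torch.nn.Embedding(10, 4).cuda()  # vocabulary of 10 entries
idx = torch.tensor([3, 12]).cuda()      # 12 is out of range
emb(idx)  # device-side assert fires here, not at a later cublasCreate

Re-running the translate command with CUDA_LAUNCH_BLOCKING=1 in the environment should make the traceback stop at the failing lookup instead of at F.linear.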

I managed to get beyond this issue. It turned out my input file had Windows line endings (can't believe I'm even admitting to that, but whatever, it happens). Also, my source files are C, not C++; I'm not sure whether that was contributing to CUBLAS_STATUS_NOT_INITIALIZED, but I intend to create a new processor for C.
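
For anyone hitting the same thing, a quick way to check for and strip Windows line endings before feeding a file to translate (dos2unix csrc.c does the same job; csrc.c is the input file from the command above):

# Minimal sketch: normalize CRLF -> LF in place before translation.
path = "csrc.c"  # the input file passed to translate above

with open(path, "rb") as f:
    data = f.read()

if b"\r\n" in data:
    print(f"{path} has Windows (CRLF) line endings; rewriting with LF only")
    with open(path, "wb") as f:
        f.write(data.replace(b"\r\n", b"\n"))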

closed