remicres/sr4rs

train.py CUDA_ERROR_NO_BINARY_FOR_GPU

Opened this issue · 13 comments

Hi @remicres,

So, when running the training with the docker/otbtf/gpu:2.4 image, after the TensorFlow libraries are successfully opened, I receive this error:

2021-05-04 14:53:04.337410: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
Traceback (most recent call last):
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/client/session.py", line 1359, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/client/session.py", line 1451, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
         [[{{node Abs_2}}]]
         [[Mean_24/_343]]
  (1) Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
         [[{{node Abs_2}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "sr4rs/code/train.py", line 307, in <module>
    tf.compat.v1.app.run(main)
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/otbtf/lib/python3/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/opt/otbtf/lib/python3/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "sr4rs/code/train.py", line 286, in main
    _do(train_op, merged_losses_summaries, "training")
  File "sr4rs/code/train.py", line 271, in _do
    _, _summary = sess.run([_train_op, _summary_op])
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/client/session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/client/session.py", line 1190, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/client/session.py", line 1368, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/client/session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
         [[node Abs_2 (defined at sr4rs/code/train.py:135) ]]
         [[Mean_24/_343]]
  (1) Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
         [[node Abs_2 (defined at sr4rs/code/train.py:135) ]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'Abs_2':
  File "sr4rs/code/train.py", line 307, in <module>
    tf.compat.v1.app.run(main)
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/otbtf/lib/python3/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/opt/otbtf/lib/python3/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "sr4rs/code/train.py", line 135, in main
    gen_loss_l1 = tf.add_n([tf.reduce_mean(tf.abs(hr_images_fake[factor] -
  File "sr4rs/code/train.py", line 135, in <listcomp>
    gen_loss_l1 = tf.add_n([tf.reduce_mean(tf.abs(hr_images_fake[factor] -
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/ops/math_ops.py", line 401, in abs
    return gen_math_ops._abs(x, name=name)
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/ops/gen_math_ops.py", line 55, in _abs
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/framework/op_def_library.py", line 748, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/framework/ops.py", line 3528, in _create_op_internal
    ret = Operation(
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/framework/ops.py", line 1990, in __init__
    self._traceback = tf_stack.extract_stack()

2021-05-04 14:53:04.661530: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
         [[{{node PyFunc}}]]

This is while running on an RTX 3070 with CUDA 11.3.

LE: I believe this is due to different CUDA versions between the host and the Docker image, i.e. 11.3 on the host not being compatible with 11.0 in the image?

Hi @quizz0n,
Can you tell me your OS and Docker version? I know that enabling the GPU with Docker works differently depending on the version.
How did you start the docker image?
I haven't seen this problem before; it looks CUDA/Docker related...

Yes, that's probably right.
I'm using Ubuntu (WSL 2) on Windows 10 OS Build 21370. Docker version 20.10.2, build 20.10.2-0ubuntu1~20.04.2.

LE: This is how I started the docker image:
docker run -ti -u root --entrypoint=/bin/bash --gpus all --env NVIDIA_DISABLE_REQUIRE=1 registry.gitlab.com/latelescop/docker/otbtf/gpu:2.4
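
For what it's worth, the GPU can be checked from inside the container without involving TensorFlow, for example by overriding the entrypoint with nvidia-smi (same image and flags as above):

docker run --rm --gpus all --env NVIDIA_DISABLE_REQUIRE=1 --entrypoint nvidia-smi registry.gitlab.com/latelescop/docker/otbtf/gpu:2.4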

This is probably related to WSL2+GPU+CUDA.

I am currently trying to write bullet-proof guidelines to set up OTBTF on Windows with GPU, but I am not very familiar with Windows.

What you could try is to rebuild the docker image on your computer.

I've just tried to run this on a clean Ubuntu 20.04 install (real OS, not WSL2), but the error is the same. I'm not very familiar with rebuilding a docker image, but I will look into it. Basically, the idea is to create a new docker image based on this one but with a different CUDA version?

LE: The error message on the Ubuntu 20.04 install:

2021-05-07 01:46:31.577211: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-07 01:46:32.524205: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-07 01:46:32.527454: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-05-07 01:46:32.527633: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at conv_ops.cc:1106 : Not found: No algorithm worked!

After applying the fix from https://stackoverflow.com/questions/38303974/tensorflow-running-error-with-cublas, I get:

2021-05-07 01:51:09.087035: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-07 01:51:10.332892: W tensorflow/stream_executor/gpu/asm_compiler.cc:235] Your CUDA software stack is old. We fallback to the NVIDIA driver for some compilation. Update your CUDA version to get the best performance. The ptxas error was: ptxas fatal   : Value 'sm_86' is not defined for option 'gpu-name'

2021-05-07 01:51:10.332988: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Unimplemented: /usr/local/cuda-11.0/bin/ptxas ptxas too old. Falling back to the driver to compile.
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
2021-05-07 01:51:10.397086: W tensorflow/stream_executor/gpu/asm_compiler.cc:235] Your CUDA software stack is old. We fallback to the NVIDIA driver for some compilation. Update your CUDA version to get the best performance. The ptxas error was: ptxas fatal   : Value 'sm_86' is not defined for option 'gpu-name'
2021-05-07 01:51:13.225935: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-07 01:51:13.298065: W tensorflow/stream_executor/gpu/asm_compiler.cc:235] Your CUDA software stack is old. We fallback to the NVIDIA driver for some compilation. Update your CUDA version to get the best performance. The ptxas error was: ptxas fatal   : Value 'sm_86' is not defined for option 'gpu-name'
Traceback (most recent call last):
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/client/session.py", line 1359, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/client/session.py", line 1451, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
	 [[{{node Abs_2}}]]
	 [[Mean_24/_343]]
  (1) Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
	 [[{{node Abs_2}}]]
0 successful operations.
0 derived errors ignored.

I tried to replace ptxas as described in tensorflow/tensorflow#45590, and I get:

2021-05-07 02:10:25.756964: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-07 02:10:27.049464: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-07 02:10:27.439194: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-07 02:10:27.515972: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:88 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
2021-05-07 02:10:28.161538: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:88 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
2021-05-07 02:10:28.919060: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cwise_op_gpu_base.cc:88 : Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
2021-05-07 02:10:29.299647: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 891.18M (934473728 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-05-07 02:10:29.300056: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 802.06M (841026304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-05-07 02:10:29.300491: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 721.86M (756923648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-05-07 02:10:29.300853: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 649.67M (681231360 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-05-07 02:10:29.301297: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 584.71M (613108224 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-05-07 02:10:29.301713: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 526.23M (551797504 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-05-07 02:10:29.302171: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 473.61M (496617728 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
Traceback (most recent call last):
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/client/session.py", line 1359, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/opt/otbtf/lib/python3/site-packages/tensorflow/python/client/session.py", line 1451, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
	 [[{{node Abs_2}}]]
	 [[Mean_24/_343]]
  (1) Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device
	 [[{{node Abs_2}}]]
0 successful operations.
0 derived errors ignored.
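
Looking at the ptxas fatal : Value 'sm_86' is not defined for option 'gpu-name' messages above, I suspect the CUDA 11.0 toolkit shipped in the image is simply too old for the RTX 3070 (Ampere, compute capability 8.6, which needs CUDA 11.1 or newer). A rough way to confirm this from inside the container (assuming nvcc is present and /usr/local/cuda points to the bundled toolkit):

nvcc --version                         # toolkit version shipped in the image (11.0 here)
/usr/local/cuda/bin/ptxas --version    # sm_86 support requires a CUDA >= 11.1 ptxas
nvidia-smi                             # driver-side CUDA version exposed by the host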

You should be able to build the docker image with a single command (see this). Maybe you will have to try different build options.
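
Something along these lines should do it (the repository URL and the BASE_IMG build argument are from memory, so double-check against the otbtf documentation):

git clone https://github.com/remicres/otbtf.git
cd otbtf
# Pick a CUDA 11.1+ base image so the bundled ptxas knows about sm_86 (the tag below is only an example):
docker build . -t otbtf:gpu --build-arg BASE_IMG=nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04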

I managed to build a new docker image and successfully trained the network. However, when running sr.py I get the following error:

2021-05-11 13:41:27.083035: I tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 1886711 microseconds.
2021-05-11 13:41:27 (INFO) TensorflowModelServe: Source info :
2021-05-11 13:41:27 (INFO) TensorflowModelServe: Receptive field  : [160, 160]
2021-05-11 13:41:27 (INFO) TensorflowModelServe: Placeholder name : lr_input
2021-05-11 13:41:27 (INFO) TensorflowModelServe: Output spacing ratio: 0.25
2021-05-11 13:41:27 (INFO) TensorflowModelServe: The TensorFlow model is used in fully convolutional mode
2021-05-11 13:41:27 (INFO) TensorflowModelServe: Output field of expression: [512, 512]
2021-05-11 13:41:27 (INFO) TensorflowModelServe: Tiling disabled
2021-05-11 13:41:27 (WARNING): Streaming configuration through extended filename is used. Any previous streaming configuration (ram value, streaming mode ...) will be ignored.
2021-05-11 13:41:27 (INFO): File Sentinel-2_B4328_0.5m.tif will be written in 110 blocks of 512x512 pixels
Writing Sentinel-2_B4328_0.5m.tif?&gdal:co:COMPRESS=DEFLATE&streaming:type=tiled&streaming:sizemode=height&streaming:sizevalue=512...: 0% [                                                  ]2021-05-11 13:41:27.770868: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-11 13:41:28.572215: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-11 13:41:28.573738: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2021-05-11 13:41:28.573882: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at conv_ops.cc:1106 : Not found: No algorithm worked!
Traceback (most recent call last):
  File "sr4rs/code/sr.py", line 76, in <module>
    infer.ExecuteAndWriteOutput()
  File "/opt/otbtf/lib/otb/python/otbApplication.py", line 2321, in ExecuteAndWriteOutput
    return _otbApplication.Application_ExecuteAndWriteOutput(self)
RuntimeError: Exception thrown in otbApplication Application_ExecuteAndWriteOutput: /src/otb/otb/Modules/Remote/otbtf/include/otbTensorflowMultisourceModelBase.hxx:96:
itk::ERROR: TensorflowMultisourceModelFilter(0x27eb450): Can't run the tensorflow session !
Tensorflow error message:
Not found: 2 root error(s) found.
  (0) Not found: No algorithm worked!
	 [[{{node gen/encoder/conv1_9x9/Conv2D}}]]
	 [[output_64/_1075]]
  (1) Not found: No algorithm worked!
	 [[{{node gen/encoder/conv1_9x9/Conv2D}}]]
0 successful operations.
0 derived errors ignored.
OTB Filter debug message:
Output image buffered region: ImageRegion (0x7ffcdd412fc0)
  Dimension: 2
  Index: [0, 0]
  Size: [512, 512]

Input #0:
Requested region: ImageRegion (0x7ffcdd412ff0)
  Dimension: 2
  Index: [0, 0]
  Size: [160, 160]

Tensor shape ("lr_input": {1, 160, 160, 3}
User placeholders:

It looks like the error comes from:

2021-05-11 13:41:28.573738: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED

Strange that you can train the network but not use it at inference time.
Did you try with a SavedModel you created, or with the pre-trained one?

I tried with a SavedModel I created.
I saw the failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED error in the log and tried to fix it with:

import tensorflow.compat.v1 as tf

# Let TensorFlow grow GPU memory on demand instead of pre-allocating it all
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

but that generates another error, which is why I wasn't sure this was the issue:

[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/descriptor_database.cc:118] File already exists in database: tensorflow/core/profiler/profiler_service_monitor_result.proto
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/descriptor.cc:1379] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size): 
Traceback (most recent call last):
  File "sr4rs/code/sr.py", line 63, in <module>
    infer = otbApplication.Registry.CreateApplication("TensorflowModelServe")
  File "/opt/otbtf/lib/otb/python/otbApplication.py", line 3544, in CreateApplication
    application = _otbApplication.Registry_CreateApplicationWithoutLogger(name)
RuntimeError: CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):

The last error reminds me of this issue in OTBTF.
It happens when you try to import both otbApplication and tensorflow in the same Python code. It is a currently known limitation of OTBTF.

However, the failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED error really looks CUDA-related.

Indeed, that looks like the issue, as I'm importing tensorflow precisely to fix failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED.
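
A workaround I could try, assuming the TF_FORCE_GPU_ALLOW_GROWTH environment variable is honored by the TensorFlow runtime embedded in OTBTF, is to request on-demand GPU memory from the shell instead of importing tensorflow in sr.py:

export TF_FORCE_GPU_ALLOW_GROWTH=true   # read by the TensorFlow GPU allocator at startup
python3 sr4rs/code/sr.py ...            # same arguments as before, no tensorflow import needed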

I think we can close this issue. The initial error, Internal: Failed to load in-memory CUBIN: CUDA_ERROR_NO_BINARY_FOR_GPU: no kernel image is available for execution on the device, occurs when TF is not built for the specific GPU. Rebuilding TF / a new docker image solves the problem.

Thanks. Do you know which parameter(s) you changed?

For the docker build, change the CUDA version via the BASE_IMG arg; for the TF build environment variables in build-env-tf.sh, add/change TF_CUDA_COMPUTE_CAPABILITIES for your specific GPU.
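
Roughly, for reference (the exact file layout and base image tag depend on the otbtf version, so treat this as a sketch):

# In build-env-tf.sh, compile TensorFlow kernels for the RTX 3070 (Ampere, compute capability 8.6):
# export TF_CUDA_COMPUTE_CAPABILITIES="8.6"       # or append 8.6 to the existing list

# Then rebuild the image on top of a CUDA 11.1+ base image so ptxas knows sm_86:
docker build . -t otbtf:gpu-sm86 --build-arg BASE_IMG=nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04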