Compilation of custom operations failing on TF 2.15/CUDA 12
Icemole opened this issue · 5 comments
Hi, the compilation of NativeLstm2.cc is failing with TF 2.15/CUDA 12, whereas it compiled fine with TF 2.13/CUDA 11. A colleague of mine is having similar issues when compiling GetCtcFsaFastBwOp.cc.
The compiler throws many errors, but most of them are rather "silly", like:
error: expected a ";"
error: function "Ndarray_get_n_total_elements" has already been defined
error: name followed by "::" must be a class or namespace name
This leads me to think that the nvcc compiler might be doing weird stuff here, and as a consequence that the operations don't work with CUDA 12 as they are. I was also told that TF might play a role here, so I also posted the TF versions. Could there be a redundant file? Maybe incompatible CUDA versions?
nvcc version where the compilation works:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
nvcc version where the compilation doesn't work:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
Let me know if I can provide any further details. Thanks in advance!
Can you try to run on CPU only (export DISABLE_CUDA=1)?
Can you try to run test_TFNativeOp.py?
Please find here the full output of the compilation on CUDA.
Answering your questions:
- A non-CUDA environment (CPU only) works!
- python3 -m pytest test_TFNativeOp.py also works, but I'm not sure whether I'm running the test with CUDA enabled (if that makes any difference for the test). Besides, there are some skipped tests as well as some warnings. I'm running these on a machine that has GPUs available. Please see the results below.
test_TFNativeOp.py ......................................................sssssss [100%]
============================================================================ warnings summary =============================================================================
../../../../../../../../../../../.venvs/singularity/returnn_test_native_op/lib/python3.10/site-packages/nose/plugins/manager.py:418
/home/nbeneitez/.venvs/singularity/returnn_test_native_op/lib/python3.10/site-packages/nose/plugins/manager.py:418: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import pkg_resources
../../../../../../../../../../../.venvs/singularity/returnn_test_native_op/lib/python3.10/site-packages/nose/importer.py:12
/home/nbeneitez/.venvs/singularity/returnn_test_native_op/lib/python3.10/site-packages/nose/importer.py:12: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
from imp import find_module, load_module, acquire_lock, release_lock
../../../../../../../../../../../.venvs/singularity/returnn_test_native_op/lib/python3.10/site-packages/numpy/__config__.py:155
/home/nbeneitez/.venvs/singularity/returnn_test_native_op/lib/python3.10/site-packages/numpy/__config__.py:155: UserWarning: Install `pyyaml` for better output
warnings.warn("Install `pyyaml` for better output", stacklevel=1)
tests/test_TFNativeOp.py::test_py_viterbi
/home/nbeneitez/work/returnn/native_op_issue/work/i6_core/tools/git/CloneGitRepositoryJob.nH5B7CKRCU89/output/repository/tests/test_TFNativeOp.py:2224: RuntimeWarning: divide by zero encountered in log
am_scores = numpy.log(am_scores) # in +log space
tests/test_TFNativeOp.py::test_fast_viterbi
/home/nbeneitez/work/returnn/native_op_issue/work/i6_core/tools/git/CloneGitRepositoryJob.nH5B7CKRCU89/output/repository/tests/test_TFNativeOp.py:2277: RuntimeWarning: divide by zero encountered in log
am_scores = numpy.log(am_scores) # in +log space
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================================================== 54 passed, 7 skipped, 5 warnings in 193.13s (0:03:13) ==========================================================
Should test_TFNativeOp.py fail for me? As I said, I might be doing something wrong.
python3 -m pytest test_TFNativeOp.py also works

I assume you tested that with export DISABLE_CUDA=1, i.e. only for CPU? Can you also try with CUDA?
Note, the main error is error: name followed by "::" must be a class or namespace name on perftools:
/home/nbeneitez/work/returnn/native_op_issue/work/i6_core/tools/git/CloneGitRepositoryJob.nH5B7CKRCU89/output/repository/returnn/native_op.cpp(240): error: name followed by "::" must be a class or namespace name
perftools::gputools::DeviceMemory<T> AsDeviceMemory(const T* cuda_memory) {
^
I guess they moved/renamed that. I see in other TF code that it is se::DeviceMemory<T> (or maybe tensorflow::se::DeviceMemory<T> or stream_executor::DeviceMemory<T> or so) now.
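If that is the case, a minimal sketch of how the AsDeviceMemory helper could look under the newer namespace (this assumes stream_executor is the right namespace and that DeviceMemoryBase/DeviceMemory still accept a raw pointer the same way; untested):

// Hypothetical sketch, not tested: same pattern as the old
// perftools::gputools helper, just with the stream_executor namespace
// (which TF code often aliases as ::se).
template <typename T>
stream_executor::DeviceMemory<T> AsDeviceMemory(const T* cuda_memory) {
  // Wrap the raw CUDA pointer into an untyped DeviceMemoryBase,
  // then promote it to the typed DeviceMemory<T>.
  stream_executor::DeviceMemoryBase wrapped(const_cast<T*>(cuda_memory));
  stream_executor::DeviceMemory<T> typed(wrapped);
  return typed;
}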
Similarly, in our static perftools::gputools::blas::Transpose get_transpose, I think it is stream_executor::blas::Transpose or so now.
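Analogously, a rough sketch of what get_transpose might look like after the rename (assuming the enum values kNoTranspose/kTranspose/kConjugateTranspose are unchanged; untested):

// Hypothetical sketch, not tested: map the BLAS transpose character to the
// StreamExecutor enum, using the stream_executor namespace instead of
// perftools::gputools.
static stream_executor::blas::Transpose get_transpose(char t) {
  switch (t) {
    case 'N': return stream_executor::blas::Transpose::kNoTranspose;
    case 'T': return stream_executor::blas::Transpose::kTranspose;
    case 'C': return stream_executor::blas::Transpose::kConjugateTranspose;
    default:
      assert("invalid transpose option" && 0);
      return stream_executor::blas::Transpose::kNoTranspose;
  }
}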