DeepRec-AI/DeepRec

[BUILD] build failed with GPU configuration

cyberkillor opened this issue · 1 comment

System information

  • OS CPU: AMD EPYC 7V12 64-Core Processor
  • Build image: alideeprec/deeprec-build:deeprec-dev-gpu-py38-cu116-ubuntu20.04, and use nvidia-docker
  • OS Platform and Distribution (e.g., Linux Ubuntu 20.04): CentOS Linux release 7.9.2009 (Core)
  • DeepRec version or commit id: 29ecde4
  • Python version: 3.8.10
  • Bazel version (if compiling from source): 5.3.1 (build from source)
  • GCC/Compiler version (if compiling from source): 9.4
  • CUDA/cuDNN version: 11.6
  • GPU: Tesla T4
  • GPU Driver version: 470.161.03

.tf_configure.bazelrc:

build --python_path="/usr/bin/python"  # python 3.8.10
build:xla --define with_xla_support=true
build --config=xla
build:star --define with_star_support=true
build --config=star
build:pmem --define with_pmem_support=true
build:parquet_dataset --define with_parquet_dataset_support=true
build --config=parquet_dataset
build:api_compatible --define with_api_compatible=true
build --action_env TF_USE_CCACHE="0"
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda"
build --action_env TF_CUDA_COMPUTE_CAPABILITIES="7.5"
build --action_env LD_LIBRARY_PATH="/usr/local/cuda/compat:/usr/local/nvidia/lib:/usr/local/nvidia/lib64"
build --action_env GCC_HOST_COMPILER_PATH="/usr/bin/gcc"
build --config=cuda
build:opt --copt=-march=native
build:opt --copt=-Wno-sign-compare
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true
build:v2 --define=tf_api_version=2
test --flaky_test_attempts=3
test --test_size_filters=small,medium
test --test_tag_filters=-benchmark-test,-no_oss,-oss_serial
test --build_tag_filters=-benchmark-test,-no_oss
test --test_tag_filters=-gpu
test --build_tag_filters=-gpu
build --action_env TF_CONFIGURE_IOS="0"

build --config=noaws
build --config=nogcp
build --config=noignite
build --config=nokafka
build --config=numa

Describe the problem

Building with bazel build -c opt --config=opt //tensorflow/tools/pip_package:build_pip_package fails with this error:

[Screenshot: Xnip2023-10-18_21-55-13]

The error mentions std::__cxx11::basic_string, so I tried building with the pre-C++11 libstdc++ ABI: bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --host_cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" -c opt --config=opt //tensorflow/tools/pip_package:build_pip_package. That build also fails, with this error:

[Screenshot: Xnip2023-10-18_21-13-51]

However, if I comment out these lines in .tf_configure.bazelrc:

#build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda"
#build --action_env TF_CUDA_COMPUTE_CAPABILITIES="7.5"
#build --action_env LD_LIBRARY_PATH="/usr/local/cuda/compat:/usr/local/nvidia/lib:/usr/local/nvidia/lib64"
#build --action_env GCC_HOST_COMPILER_PATH="/usr/bin/gcc"
#build --config=cuda

and build the CPU version with either bazel build -c opt --config=opt //tensorflow/tools/pip_package:build_pip_package or bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --host_cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" -c opt --config=opt //tensorflow/tools/pip_package:build_pip_package, the build succeeds.

Fixed by adding --config=monolithic to the GPU build: bazel build --config=monolithic --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --host_cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" -c opt --config=opt //tensorflow/tools/pip_package:build_pip_package
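
For anyone who does not want to retype the flags from the working command above, they could be grouped under a named config in a bazelrc file that Bazel loads for this workspace. This is only a sketch: the config name gpu_monolithic_abi0 is made up for illustration and is not part of DeepRec, and it assumes the monolithic config is already defined by the repository's own .bazelrc, as the working command implies.

```
# Hypothetical config name; it only bundles the flags from the working command.
# Link the framework into a single binary (assumes build:monolithic is defined
# in the repository's .bazelrc).
build:gpu_monolithic_abi0 --config=monolithic
# Compile target and host code against the pre-C++11 libstdc++ ABI.
build:gpu_monolithic_abi0 --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0
build:gpu_monolithic_abi0 --host_cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0
```

With that in place, the build would be invoked as bazel build -c opt --config=opt --config=gpu_monolithic_abi0 //tensorflow/tools/pip_package:build_pip_package.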