openxla/xla

[xla:gpu] Build with GPU support fails with linker error

pxanthopoulos opened this issue · 7 comments

I am trying to build XLA from source following the instructions found below, with Docker & GPU support:

https://openxla.org/xla/build_from_source

More specifically, I cloned the XLA repo and executed the following commands from the directory containing the clone:

  1. docker run --gpus all --name xla_gpu -w /xla -it -d --rm -v ./xla:/xla tensorflow/build:latest-python3.9 bash

(I added the --gpus all flag because the configure script failed as it could not find nvidia-smi.)

  2. docker exec -it xla_gpu bash

  3. ./configure.py --backend=CUDA, with output:

INFO:root:Found path to clang at /usr/lib/llvm-17/bin/clang
INFO:root:Running echo __clang_major__ | /usr/lib/llvm-17/bin/clang -E -P -
INFO:root:/usr/lib/llvm-17/bin/clang reports major version 17.
INFO:root:Trying to find path to nvidia-smi...
INFO:root:Found path to nvidia-smi at /usr/bin/nvidia-smi
INFO:root:Found CUDA compute capabilities: ['7.0', '8.0']
INFO:root:Some CUDA config versions and paths were not provided, so trying to find them using find_cuda_config.py
INFO:root:Writing bazelrc to /xla/xla_configure.bazelrc...

  4. bazel build --test_output=all --spawn_strategy=sandboxed //xla/...

This step failed with the following error message:

ERROR: /xla/xla/tsl/cuda/BUILD.bazel:277:11: no such target '@local_config_nccl//:nccl_headers': target 'nccl_headers' not declared in package '' defined by /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/local_config_nccl/BUILD (Tip: use `query "@local_config_nccl//:*"` to see all the targets in that package) and referenced by '//xla/tsl/cuda:nccl_stub'
INFO: Repository double_conversion instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:111:19: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:622:21: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:506:20: in _tf_repositories
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:89:35: in <toplevel>
INFO: Repository com_google_benchmark instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:111:19: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:615:28: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:46:14: in _initialize_third_party
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/benchmark/workspace.bzl:9:20: in repo
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:89:35: in <toplevel>
INFO: Repository nccl_archive instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:111:19: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:622:21: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:402:20: in _tf_repositories
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:89:35: in <toplevel>
INFO: Repository cutlass_archive instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:125:21: in workspace
  /xla/workspace2.bzl:46:20: in _tf_repositories
  /xla/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /xla/third_party/repo.bzl:89:35: in <toplevel>
INFO: Repository zlib instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:111:19: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:622:21: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:384:20: in _tf_repositories
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:89:35: in <toplevel>
INFO: Repository jsoncpp_git instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:111:19: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:622:21: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:376:20: in _tf_repositories
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:89:35: in <toplevel>
INFO: Repository nvtx_archive instantiated at:
  /xla/WORKSPACE:19:15: in <toplevel>
  /xla/workspace2.bzl:111:19: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:622:21: in workspace
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/workspace2.bzl:412:20: in _tf_repositories
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/tsl/third_party/repo.bzl:89:35: in <toplevel>
ERROR: Analysis of target '//xla/tsl/cuda:nccl_stub' failed; build aborted: Analysis failed
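
The tip in the error message can be followed to check which targets the generated package actually declares. Below is a minimal shell sketch, assuming bazel is on PATH inside the container. `has_target` is a hypothetical helper written for this illustration, not part of Bazel or XLA, and the exact label format printed by `bazel query` may differ between Bazel versions, so it only compares the `//:name` suffix.

```shell
# Hypothetical helper: reads a list of Bazel labels on stdin (one per line)
# and succeeds if any of them ends in "//:<name>".
has_target() {
  want="//:$1"
  while IFS= read -r line; do
    case "$line" in
      *"$want") return 0 ;;
    esac
  done
  return 1
}

# Usage (not run here, requires bazel and a configured workspace):
#   bazel query "@local_config_nccl//:*" | has_target nccl_headers \
#     && echo "nccl_headers is declared" \
#     || echo "nccl_headers is missing"
```

In my case the second branch would be the expected one, since the analysis error reports that `nccl_headers` is not declared in that package.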

I worked around this error by editing the file /root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/local_config_nccl/BUILD referenced in the error message. I added the following to the end of this file:

alias(
    name = "nccl_headers",
    actual = "@nccl_archive//:nccl_headers",
    visibility = ["//visibility:public"],
)
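
For anyone wanting to reproduce the workaround, here is a minimal shell sketch that appends the alias only once. `append_nccl_headers_alias` is a hypothetical helper name. The hash in the cache path is specific to my machine, so the sketch takes the BUILD file path as an argument. Note that this file is generated, so Bazel may overwrite the edit if it re-creates the local_config_nccl repository.

```shell
# Hypothetical helper: append the nccl_headers alias to the generated
# local_config_nccl BUILD file, unless a target with that name already exists.
# $1: path to the BUILD file (machine-specific).
append_nccl_headers_alias() {
  build_file="$1"
  if grep -q 'name = "nccl_headers"' "$build_file"; then
    return 0  # already patched, nothing to do
  fi
  cat >> "$build_file" <<'EOF'

alias(
    name = "nccl_headers",
    actual = "@nccl_archive//:nccl_headers",
    visibility = ["//visibility:public"],
)
EOF
}

# Usage (not run here; run from the XLA workspace inside the container):
#   append_nccl_headers_alias "$(bazel info output_base)/external/local_config_nccl/BUILD"
```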

Then, I reran the 4th step (the build command). After building ~39000 of the ~45000 targets, it then failed with the following error message:

ERROR: /xla/xla/tools/BUILD:124:14: Linking xla/tools/convert_computation failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target //xla/tools:convert_computation) external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/xla/tools/convert_computation-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::~CudaPlatform()':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatformD2Ev+0x18): undefined reference to `stream_executor::ExecutorCache::~ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::~CudaPlatform()':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatformD0Ev+0x18): undefined reference to `stream_executor::ExecutorCache::~ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::GetExecutor(stream_executor::StreamExecutorConfig const&)':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatform11GetExecutorERKNS_20StreamExecutorConfigE+0x1d): undefined reference to `stream_executor::ExecutorCache::Get(stream_executor::StreamExecutorConfig const&)'
/usr/bin/ld: cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatform11GetExecutorERKNS_20StreamExecutorConfigE+0x49): undefined reference to `stream_executor::ExecutorCache::GetOrCreate(stream_executor::StreamExecutorConfig const&, std::function<absl::lts_20230802::StatusOr<std::unique_ptr<stream_executor::StreamExecutor, std::default_delete<stream_executor::StreamExecutor> > > ()> const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `_GLOBAL__sub_I_cuda_platform.cc':
cuda_platform.cc:(.text.startup+0x6b): undefined reference to `stream_executor::ExecutorCache::ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_executor_cuda_only.lo(cuda_executor.o): in function `stream_executor::gpu::GpuExecutor::GetKernel(stream_executor::MultiKernelLoaderSpec const&, stream_executor::Kernel*)':
cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x6c2): undefined reference to `stream_executor::KernelMetadata::set_registers_per_thread(int)'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x6f9): undefined reference to `stream_executor::KernelMetadata::set_shared_memory_bytes(int)'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x727): undefined reference to `stream_executor::Kernel::set_name(std::basic_string_view<char, std::char_traits<char> >)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_executor_cuda_only.lo(cuda_executor.o): in function `stream_executor::gpu::GpuExecutor::VlogOccupancyInfo(stream_executor::DeviceDescription const&, stream_executor::Kernel const&, stream_executor::ThreadDim const&, stream_executor::BlockDim const&)':
cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor17VlogOccupancyInfoERKNS_17DeviceDescriptionERKNS_6KernelERKNS_9ThreadDimERKNS_8BlockDimE+0x65): undefined reference to `stream_executor::KernelMetadata::registers_per_thread() const'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor17VlogOccupancyInfoERKNS_17DeviceDescriptionERKNS_6KernelERKNS_9ThreadDimERKNS_8BlockDimE+0x70): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, stream_executor::DeviceMemory<bool> >::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, unsigned long, stream_executor::DeviceMemory<bool> >::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, stream_executor::DeviceMemory<int>, int>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmmmmmmmmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmmmmmmmmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, stream_executor::DeviceMemory<int>, int>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::If(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<bool>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer2IfES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIbEESt8functionIFS2_PS7_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x76): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::IfElse(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<bool>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer6IfElseES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIbEESt8functionIFS2_PS7_EESN_E3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x76): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::Case(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<int>, std::vector<std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>, std::allocator<std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)> > >)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer4CaseES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIiEESt6vectorISt8functionIFS2_PS7_EESaISO_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x150): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (stream_executor::CommandBuffer*, unsigned long), stream_executor::gpu::GpuCommandBuffer::For(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, int, stream_executor::DeviceMemory<int>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_1>::_M_invoke(std::_Any_data const&, stream_executor::CommandBuffer*&&, unsigned long&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEPN15stream_executor13CommandBufferEmEZNS3_3gpu16GpuCommandBuffer3ForEN3tsl3gtl7IntTypeINS4_21ExecutionScopeId_tag_ElEEPNS3_14StreamExecutorEiNS3_12DeviceMemoryIiEESt8functionIFS2_S5_EEE3$_1E9_M_invokeERKSt9_Any_dataOS5_Om+0xb6): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::For(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, int, stream_executor::DeviceMemory<int>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer3ForES9_PNS6_14StreamExecutorEiNS6_12DeviceMemoryIiEESt8functionIFS2_PS7_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x81): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o):gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEPN15stream_executor13CommandBufferEmEZNS3_3gpu16GpuCommandBuffer5WhileEN3tsl3gtl7IntTypeINS4_21ExecutionScopeId_tag_ElEEPNS3_14StreamExecutorENS3_12DeviceMemoryIbEESt8functionIFS2_SD_S5_EESI_IFS2_S5_EEE3$_1E9_M_invokeERKSt9_Any_dataOS5_Om+0xff): more undefined references to `stream_executor::KernelMetadata::shared_memory_bytes() const' follow
clang: error: linker command failed with exit code 1 (use -v to see invocation)
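
Since the linker output is long and repetitive, it may help to reduce it to the distinct missing symbols. Below is a minimal shell sketch, assuming the build output has been saved to a file (the name build.log is an assumption). `undefined_symbols` is a hypothetical helper that relies on the `undefined reference to` wording that /usr/bin/ld prints above.

```shell
# Hypothetical helper: read linker output on stdin and print each distinct
# undefined symbol with the number of times it was reported, most frequent first.
undefined_symbols() {
  # Matches both "undefined reference to `X'" and
  # "more undefined references to `X' follow".
  sed -n "s/.*undefined references\{0,1\} to \`\([^']*\)'.*/\1/p" \
    | sort | uniq -c | sort -rn
}

# Usage (not run here; build.log is an assumed file name):
#   bazel build --test_output=all --spawn_strategy=sandboxed //xla/... 2> build.log
#   undefined_symbols < build.log
```

For the output above, the distinct symbols all belong to stream_executor::ExecutorCache, stream_executor::KernelMetadata and stream_executor::Kernel.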
ERROR: /xla/xla/tools/BUILD:53:14: Linking xla/tools/hex_floats_to_packed_literal failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target //xla/tools:hex_floats_to_packed_literal) external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/xla/tools/hex_floats_to_packed_literal-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::~CudaPlatform()':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatformD2Ev+0x18): undefined reference to `stream_executor::ExecutorCache::~ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::~CudaPlatform()':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatformD0Ev+0x18): undefined reference to `stream_executor::ExecutorCache::~ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::GetExecutor(stream_executor::StreamExecutorConfig const&)':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatform11GetExecutorERKNS_20StreamExecutorConfigE+0x1d): undefined reference to `stream_executor::ExecutorCache::Get(stream_executor::StreamExecutorConfig const&)'
/usr/bin/ld: cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatform11GetExecutorERKNS_20StreamExecutorConfigE+0x49): undefined reference to `stream_executor::ExecutorCache::GetOrCreate(stream_executor::StreamExecutorConfig const&, std::function<absl::lts_20230802::StatusOr<std::unique_ptr<stream_executor::StreamExecutor, std::default_delete<stream_executor::StreamExecutor> > > ()> const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `_GLOBAL__sub_I_cuda_platform.cc':
cuda_platform.cc:(.text.startup+0x6b): undefined reference to `stream_executor::ExecutorCache::ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_executor_cuda_only.lo(cuda_executor.o): in function `stream_executor::gpu::GpuExecutor::GetKernel(stream_executor::MultiKernelLoaderSpec const&, stream_executor::Kernel*)':
cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x6c2): undefined reference to `stream_executor::KernelMetadata::set_registers_per_thread(int)'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x6f9): undefined reference to `stream_executor::KernelMetadata::set_shared_memory_bytes(int)'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x727): undefined reference to `stream_executor::Kernel::set_name(std::basic_string_view<char, std::char_traits<char> >)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_executor_cuda_only.lo(cuda_executor.o): in function `stream_executor::gpu::GpuExecutor::VlogOccupancyInfo(stream_executor::DeviceDescription const&, stream_executor::Kernel const&, stream_executor::ThreadDim const&, stream_executor::BlockDim const&)':
cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor17VlogOccupancyInfoERKNS_17DeviceDescriptionERKNS_6KernelERKNS_9ThreadDimERKNS_8BlockDimE+0x65): undefined reference to `stream_executor::KernelMetadata::registers_per_thread() const'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor17VlogOccupancyInfoERKNS_17DeviceDescriptionERKNS_6KernelERKNS_9ThreadDimERKNS_8BlockDimE+0x70): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, stream_executor::DeviceMemory<bool> >::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, unsigned long, stream_executor::DeviceMemory<bool> >::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, stream_executor::DeviceMemory<int>, int>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmmmmmmmmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmmmmmmmmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, stream_executor::DeviceMemory<int>, int>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::If(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<bool>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer2IfES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIbEESt8functionIFS2_PS7_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x76): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::IfElse(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<bool>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer6IfElseES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIbEESt8functionIFS2_PS7_EESN_E3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x76): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::Case(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<int>, std::vector<std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>, std::allocator<std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)> > >)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer4CaseES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIiEESt6vectorISt8functionIFS2_PS7_EESaISO_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x150): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (stream_executor::CommandBuffer*, unsigned long), stream_executor::gpu::GpuCommandBuffer::For(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, int, stream_executor::DeviceMemory<int>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_1>::_M_invoke(std::_Any_data const&, stream_executor::CommandBuffer*&&, unsigned long&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEPN15stream_executor13CommandBufferEmEZNS3_3gpu16GpuCommandBuffer3ForEN3tsl3gtl7IntTypeINS4_21ExecutionScopeId_tag_ElEEPNS3_14StreamExecutorEiNS3_12DeviceMemoryIiEESt8functionIFS2_S5_EEE3$_1E9_M_invokeERKSt9_Any_dataOS5_Om+0xb6): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::For(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>, stream_executor::StreamExecutor*, int, stream_executor::DeviceMemory<int>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_ElEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer3ForES9_PNS6_14StreamExecutorEiNS6_12DeviceMemoryIiEESt8functionIFS2_PS7_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x81): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o):gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEPN15stream_executor13CommandBufferEmEZNS3_3gpu16GpuCommandBuffer5WhileEN3tsl3gtl7IntTypeINS4_21ExecutionScopeId_tag_ElEEPNS3_14StreamExecutorENS3_12DeviceMemoryIbEESt8functionIFS2_SD_S5_EESI_IFS2_S5_EEE3$_1E9_M_invokeERKSt9_Any_dataOS5_Om+0xff): more undefined references to `stream_executor::KernelMetadata::shared_memory_bytes() const' follow
clang: error: linker command failed with exit code 1 (use -v to see invocation)
INFO: Elapsed time: 2349.613s, Critical Path: 522.55s
INFO: 39625 processes: 16490 internal, 1 local, 23134 processwrapper-sandbox.
FAILED: Build did NOT complete successfully

I hit the same error (see #10592) and am still waiting for a response.

Same problem...
If I had to guess, I would say there's a dependency declaration missing somewhere... but Bazel is black magic I dare not look at...

Something like

diff --git a/xla/stream_executor/cuda/BUILD b/xla/stream_executor/cuda/BUILD
index 2212fb622..bea1e01b9 100644
--- a/xla/stream_executor/cuda/BUILD
+++ b/xla/stream_executor/cuda/BUILD
@@ -75,6 +75,8 @@ cuda_only_cc_library(
             "//xla/stream_executor",
             "//xla/stream_executor:platform_manager",
             "//xla/stream_executor:stream_executor_interface",
+            "//xla/stream_executor:executor_cache",
+            "//xla/stream_executor:kernel",
             "//xla/stream_executor/gpu:gpu_driver_header",
             "//xla/stream_executor/gpu:gpu_executor_header",
             "//xla/stream_executor/platform",
diff --git a/xla/stream_executor/gpu/BUILD b/xla/stream_executor/gpu/BUILD
index f0843969d..348f89528 100644
--- a/xla/stream_executor/gpu/BUILD
+++ b/xla/stream_executor/gpu/BUILD
@@ -153,6 +153,7 @@ gpu_only_cc_library(
         ":gpu_types_header",
         "//xla/stream_executor",
         "//xla/stream_executor:stream_executor_interface",
+        "//xla/stream_executor:kernel",
         "@com_google_absl//absl/container:flat_hash_map",
         "@com_google_absl//absl/container:inlined_vector",
         "@com_google_absl//absl/functional:any_invocable",

gets pretty far, but it eventually fails at link time as well, with:

ERROR: /xla/xla/tests/BUILD:2472:12: Linking xla/tests/local_client_aot_test failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target //xla/tests:local_client_aot_test) external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/xla/tests/local_client_aot_test-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/usr/bin/ld: bazel-out/k8-opt/bin/external/tsl/tsl/profiler/backends/cpu/libtraceme_recorder_impl.lo(traceme_recorder.o): in function `void __gnu_cxx::new_allocator<tsl::profiler::TraceMeRecorder::ThreadLocalRecorder>::construct<tsl::profiler::TraceMeRecorder::ThreadLocalRecorder>(tsl::profiler::TraceMeRecorder::ThreadLocalRecorder*)':
traceme_recorder.cc:(.text._ZN9__gnu_cxx13new_allocatorIN3tsl8profiler15TraceMeRecorder19ThreadLocalRecorderEE9constructIS4_JEEEvPT_DpOT0_[_ZN9__gnu_cxx13new_allocatorIN3tsl8profiler15TraceMeRecorder19ThreadLocalRecorderEE9constructIS4_JEEEvPT_DpOT0_]+0x6a): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::gpu::GpuCommandBuffer::Trace(stream_executor::Stream*, absl::lts_20230802::AnyInvocable<absl::lts_20230802::Status ()>)':
gpu_command_buffer.cc:(.text._ZN15stream_executor3gpu16GpuCommandBuffer5TraceEPNS_6StreamEN4absl12lts_2023080212AnyInvocableIFNS5_6StatusEvEEE+0x82): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: gpu_command_buffer.cc:(.text._ZN15stream_executor3gpu16GpuCommandBuffer5TraceEPNS_6StreamEN4absl12lts_2023080212AnyInvocableIFNS5_6StatusEvEEE+0x10d): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::gpu::GpuCommandBuffer::Finalize()':
gpu_command_buffer.cc:(.text._ZN15stream_executor3gpu16GpuCommandBuffer8FinalizeEv+0x273): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: gpu_command_buffer.cc:(.text._ZN15stream_executor3gpu16GpuCommandBuffer8FinalizeEv+0x2be): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_driver_cuda_only.a(cuda_driver.o):cuda_driver.cc:(.text._ZN15stream_executor3gpu9GpuDriver18GraphDebugDotPrintB5cxx11EP10CUgraph_stPKcb+0x93): more undefined references to `tsl::Env::Default()' follow

In other words, was tsl/platform/default/* not compiled?
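To narrow down which dependency might be missing, it can help to collapse the linker output into one entry per undefined symbol, listing the archives that reference it. The following is a minimal sketch, not part of XLA or Bazel: the function name `summarize_undefined_references` and the `SAMPLE` text are hypothetical, and the sample lines are abbreviated from the log above (section names shortened to `.text`). It assumes GNU ld's usual message format.

```python
import re
from collections import defaultdict

# Matches GNU ld diagnostics such as:
#   ...: undefined reference to `tsl::Env::Default()'
#   ...: more undefined references to `tsl::Env::Default()' follow
UNDEFINED_REF = re.compile(r"undefined references? to `(?P<symbol>[^']+)'")

# Matches the archive(object) pair that ld prints, e.g. libfoo.a(bar.o)
ARCHIVE_OBJECT = re.compile(r"[\w.+-]+\.(?:a|lo)\([\w.+-]+\.o\)")

# Abbreviated sample (hypothetical shortening of the log in this thread).
SAMPLE = """\
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::gpu::GpuCommandBuffer::Finalize()':
gpu_command_buffer.cc:(.text+0x273): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_driver_cuda_only.a(cuda_driver.o):cuda_driver.cc:(.text+0x93): more undefined references to `tsl::Env::Default()' follow
"""


def summarize_undefined_references(linker_output):
    """Map each undefined symbol to the set of archive(object) pairs using it."""
    referrers = defaultdict(set)
    current = "<unknown>"  # most recent archive(object) seen in the log
    for line in linker_output.splitlines():
        archive = ARCHIVE_OBJECT.search(line)
        if archive:
            current = archive.group(0)
        undefined = UNDEFINED_REF.search(line)
        if undefined:
            referrers[undefined.group("symbol")].add(current)
    return dict(referrers)


if __name__ == "__main__":
    for symbol, users in summarize_undefined_references(SAMPLE).items():
        print(symbol)
        for user in sorted(users):
            print("  referenced from", user)
```

Each archive in the summary corresponds to a Bazel target whose `deps` may be missing the library that defines the symbol. That is only a starting point for investigation, not a confirmed diagnosis.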

@pxanthopoulos were you able to find a solution? I am facing the same error when trying to build XLA from source for GPU:

Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Reading 'startup' options from /users/neeld2/xla/.bazelrc: --windows_enable_symlinks
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=198
INFO: Reading rc options for 'build' from /users/neeld2/xla/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /users/neeld2/xla/.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Reading rc options for 'build' from /users/neeld2/xla/xla_configure.bazelrc:
  'build' options: --action_env CLANG_COMPILER_PATH=/usr/lib/llvm-17/bin/clang --repo_env CC=/usr/lib/llvm-17/bin/clang --repo_env BAZEL_COMPILER=/usr/lib/llvm-17/bin/clang --config nvcc_clang --action_env CLANG_CUDA_COMPILER_PATH=/usr/lib/llvm-17/bin/clang --action_env CUDA_TOOLKIT_PATH=/usr/local/cuda-12.3 --action_env TF_CUBLAS_VERSION=12.3.2 --action_env TF_CUDA_COMPUTE_CAPABILITIES=6.0 --action_env TF_CUDNN_VERSION=8 --repo_env TF_NEED_TENSORRT=0 --config nonccl --action_env LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64:/usr/local/cuda-12.3/lib64 --action_env PYTHON_BIN_PATH=/usr/bin/python --python_path /usr/bin/python --copt -Wno-sign-compare --copt -Wno-error=unused-command-line-argument --copt -Wno-gnu-offsetof-extensions --build_tag_filters -no_oss --test_tag_filters -no_oss
INFO: Found applicable config definition build:short_logs in file /users/neeld2/xla/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /users/neeld2/xla/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:nvcc_clang in file /users/neeld2/xla/.bazelrc: --config=cuda --action_env=TF_CUDA_CLANG=1 --action_env=TF_NVCC_CLANG=1 --@local_config_cuda//:cuda_compiler=nvcc
INFO: Found applicable config definition build:cuda in file /users/neeld2/xla/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:nonccl in file /users/neeld2/xla/.bazelrc: --define=no_nccl_support=true
INFO: Found applicable config definition build:monolithic in file /users/neeld2/xla/.bazelrc: --define framework_shared_object=false --define tsl_protobuf_header_only=false --experimental_link_static_libraries_once=false
INFO: Found applicable config definition build:linux in file /users/neeld2/xla/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --copt=-Wswitch --copt=-Werror=switch --copt=-Wno-error=unused-but-set-variable --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /users/neeld2/xla/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
DEBUG: /users/neeld2/xla/third_party/py/python_repo.bzl:98:14: 
HERMETIC_PYTHON_VERSION variable was not set correctly, using default version.
Python 3.11 will be used.
To select Python version, either set HERMETIC_PYTHON_VERSION env variable in
your shell:
  export HERMETIC_PYTHON_VERSION=3.12
OR pass it as an argument to bazel command directly or inside your .bazelrc
file:
  --repo_env=HERMETIC_PYTHON_VERSION=3.12
DEBUG: /users/neeld2/xla/third_party/py/python_repo.bzl:109:10: Using hermetic Python 3.11
DEBUG: /users/neeld2/xla/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'llvm-raw' because it already exists.
DEBUG: /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/tsl/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'nvtx_archive' because it already exists.
DEBUG: /users/neeld2/xla/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'jsoncpp_git' because it already exists.
DEBUG: /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:10: 
Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
DEBUG: /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:10: 
Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
ERROR: /users/neeld2/xla/xla/tsl/cuda/BUILD.bazel:278:11: no such target '@local_config_nccl//:nccl_headers': target 'nccl_headers' not declared in package '' defined by /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/local_config_nccl/BUILD (Tip: use `query "@local_config_nccl//:*"` to see all the targets in that package) and referenced by '//xla/tsl/cuda:nccl_stub'
INFO: Repository boringssl instantiated at:
  /users/neeld2/xla/WORKSPACE:46:15: in <toplevel>
  /users/neeld2/xla/workspace2.bzl:135:21: in workspace
  /users/neeld2/xla/workspace2.bzl:64:20: in _tf_repositories
  /users/neeld2/xla/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
  /users/neeld2/xla/third_party/repo.bzl:89:35: in <toplevel>
ERROR: Analysis of target '//xla/tsl/cuda:nccl_stub' failed; build aborted: Analysis failed
INFO: Elapsed time: 51.639s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (283 packages loaded, 18469 targets configured)
    currently loading: @upb//
    Fetching repository @pypi_lit; starting 11s
    Fetching repository @double_conversion; starting
    Fetching https://storage.googleapis.com/mirror.tensorflow.org/github.com/google/boringssl/archive/c00d7ca810e93780bd0c8ee4eea28f4f2ea4bcdc.tar.gz; 11.5 MiB (27.8%)
    Fetching /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/double_conversion; Extracting v3.2.0.tar.gz
    Fetching repository @curl; starting
    Fetching /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/curl; Extracting curl-8.4.0.tar.gz
    Fetching repository @scip; Restarting.

I tried passing the --config monolithic option, but it didn't work.

@neeldani What does your configure step look like? It should look like ./configure.py --backend=CUDA --nccl

This worked, thank you!
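For anyone hitting the same `nccl_headers` error: in the failing log above, the generated xla_configure.bazelrc contained `--config nonccl`, and re-running configure with `--nccl` resolved it. A quick way to check your own generated file is to list the `--config` values it sets. This is a minimal sketch under assumptions: the helper name `configs_in_bazelrc` is hypothetical, and the sample assumes the file holds `build <options>` lines like those echoed in the log. Whether `nonccl` is the sole cause of the missing target is inferred from this thread, not confirmed.

```python
def configs_in_bazelrc(text):
    """Return every value passed via --config in the text of a bazelrc file."""
    configs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token == "--config" and i + 1 < len(tokens):
                configs.append(tokens[i + 1])          # form: --config name
            elif token.startswith("--config="):
                configs.append(token.split("=", 1)[1])  # form: --config=name
    return configs


# Hypothetical sample, abbreviated from the options echoed in the log above.
SAMPLE_BAZELRC = """\
build --config nvcc_clang
build --action_env TF_CUDA_COMPUTE_CAPABILITIES=6.0
build --config nonccl
"""

if __name__ == "__main__":
    found = configs_in_bazelrc(SAMPLE_BAZELRC)
    print("configs:", found)
    if "nonccl" in found:
        print("NCCL is disabled; consider re-running ./configure.py --backend=CUDA --nccl")
```

To run it against a real checkout, read the file first, for example `configs_in_bazelrc(open("xla_configure.bazelrc").read())`.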

I get the same linker error: undefined reference to `tsl::Env::Default()'. How can it be resolved?