openxla/xla

Build from source fails

Building from source as documented fails, for both the CPU and GPU versions.

The CPU build eventually dies with:

[20,435 / 22,255] Compiling xla/service/gpu/all_reduce_blueconnect_test.cc; 27s processwrapper-sandbox ... (20 actions, 19 running)
ERROR: /xla/xla/service/gpu/BUILD:1629:9: Linking xla/service/gpu/autotuner_compile_util_test_gpu failed: (Exit 1): clang failed: error executing command (from target //xla/service/gpu:autotuner_compile_util_test_gpu) /usr/lib/llvm-17/bin/clang @bazel-out/k8-opt/bin/xla/service/gpu/autotuner_compile_util_test_gpu-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
ld.lld: error: undefined symbol: main
>>> referenced by /lib/x86_64-linux-gnu/Scrt1.o:(_start)
clang: error: linker command failed with exit code 1 (use -v to see invocation)
INFO: Elapsed time: 3012.752s, Critical Path: 191.92s
INFO: 20461 processes: 6106 internal, 1 local, 14354 processwrapper-sandbox.
FAILED: Build did NOT complete successfully

The GPU build fails with:

[33,575 / 44,041] Compiling xla/mlir_hlo/mhlo/IR/hlo_ops.cc [for tool]; 80s processwrapper-sandbox ... (20 actions, 19 running)
ERROR: /xla/xla/tools/BUILD:104:14: Linking xla/tools/show_literal failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target //xla/tools:show_literal) external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/xla/tools/show_literal-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::~CudaPlatform()':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatformD2Ev+0x18): undefined reference to `stream_executor::ExecutorCache::~ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::~CudaPlatform()':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatformD0Ev+0x18): undefined reference to `stream_executor::ExecutorCache::~ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::GetExecutor(stream_executor::StreamExecutorConfig const&)':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatform11GetExecutorERKNS_20StreamExecutorConfigE+0x1d): undefined reference to `stream_executor::ExecutorCache::Get(stream_executor::StreamExecutorConfig const&)'
/usr/bin/ld: cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatform11GetExecutorERKNS_20StreamExecutorConfigE+0x49): undefined reference to `stream_executor::ExecutorCache::GetOrCreate(stream_executor::StreamExecutorConfig const&, std::function<absl::lts_20230802::StatusOr<std::unique_ptr<stream_executor::StreamExecutor, std::default_delete<stream_executor::StreamExecutor> > > ()> const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform_cuda_only.lo(cuda_platform.o): in function `_GLOBAL__sub_I_cuda_platform.cc':
cuda_platform.cc:(.text.startup+0x6b): undefined reference to `stream_executor::ExecutorCache::ExecutorCache()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_executor_cuda_only.lo(cuda_executor.o): in function `stream_executor::gpu::GpuExecutor::GetKernel(stream_executor::MultiKernelLoaderSpec const&, stream_executor::Kernel*)':
cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x6f2): undefined reference to `stream_executor::KernelMetadata::set_registers_per_thread(int)'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x729): undefined reference to `stream_executor::KernelMetadata::set_shared_memory_bytes(int)'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor9GetKernelERKNS_21MultiKernelLoaderSpecEPNS_6KernelE+0x75e): undefined reference to `stream_executor::Kernel::set_name(std::basic_string_view<char, std::char_traits<char> >)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_executor_cuda_only.lo(cuda_executor.o): in function `stream_executor::gpu::GpuExecutor::VlogOccupancyInfo(stream_executor::DeviceDescription const&, stream_executor::Kernel const&, stream_executor::ThreadDim const&, stream_executor::BlockDim const&)':
cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor17VlogOccupancyInfoERKNS_17DeviceDescriptionERKNS_6KernelERKNS_9ThreadDimERKNS_8BlockDimE+0x65): undefined reference to `stream_executor::KernelMetadata::registers_per_thread() const'
/usr/bin/ld: cuda_executor.cc:(.text._ZN15stream_executor3gpu11GpuExecutor17VlogOccupancyInfoERKNS_17DeviceDescriptionERKNS_6KernelERKNS_9ThreadDimERKNS_8BlockDimE+0x70): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, stream_executor::DeviceMemory<bool> >::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, unsigned long, stream_executor::DeviceMemory<bool> >::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmmNS_12DeviceMemoryIbEEEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, stream_executor::DeviceMemory<int>, int>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmmmmmmmmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmmmmmmmmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<unsigned long, stream_executor::DeviceMemory<int>, int>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJmNS_12DeviceMemoryIiEEiEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::TypedKernel<>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_command_buffer.cc:(.text._ZN15stream_executor11TypedKernelIJEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJEE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, unsigned long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::If(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, unsigned long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<bool>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, unsigned long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_EmEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer2IfES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIbEESt8functionIFS2_PS7_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x76): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, unsigned long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::IfElse(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, unsigned long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<bool>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, unsigned long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_EmEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer6IfElseES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIbEESt8functionIFS2_PS7_EESN_E3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x76): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, unsigned long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::Case(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, unsigned long>, stream_executor::StreamExecutor*, stream_executor::DeviceMemory<int>, std::vector<std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>, std::allocator<std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)> > >)::$_0>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, unsigned long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_EmEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer4CaseES9_PNS6_14StreamExecutorENS6_12DeviceMemoryIiEESt6vectorISt8functionIFS2_PS7_EESaISO_EEE3$_0E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x164): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (stream_executor::CommandBuffer*, unsigned long), stream_executor::gpu::GpuCommandBuffer::For(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, unsigned long>, stream_executor::StreamExecutor*, int, stream_executor::DeviceMemory<int>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_0>::_M_invoke(std::_Any_data const&, stream_executor::CommandBuffer*&&, unsigned long&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEPN15stream_executor13CommandBufferEmEZNS3_3gpu16GpuCommandBuffer3ForEN3tsl3gtl7IntTypeINS4_21ExecutionScopeId_tag_EmEEPNS3_14StreamExecutorEiNS3_12DeviceMemoryIiEESt8functionIFS2_S5_EEE3$_0E9_M_invokeERKSt9_Any_dataOS5_Om+0xb6): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `std::_Function_handler<absl::lts_20230802::Status (tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, unsigned long>, absl::lts_20230802::Span<unsigned long const>), stream_executor::gpu::GpuCommandBuffer::For(tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, unsigned long>, stream_executor::StreamExecutor*, int, stream_executor::DeviceMemory<int>, std::function<absl::lts_20230802::Status (stream_executor::CommandBuffer*)>)::$_1>::_M_invoke(std::_Any_data const&, tsl::gtl::IntType<stream_executor::CommandBuffer::ExecutionScopeId_tag_, unsigned long>&&, absl::lts_20230802::Span<unsigned long const>&&)':
gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEN3tsl3gtl7IntTypeIN15stream_executor13CommandBuffer21ExecutionScopeId_tag_EmEENS1_4SpanIKmEEEZNS6_3gpu16GpuCommandBuffer3ForES9_PNS6_14StreamExecutorEiNS6_12DeviceMemoryIiEESt8functionIFS2_PS7_EEE3$_1E9_M_invokeERKSt9_Any_dataOS9_OSC_+0x81): undefined reference to `stream_executor::KernelMetadata::shared_memory_bytes() const'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o):gpu_command_buffer.cc:(.text._ZNSt17_Function_handlerIFN4absl12lts_202308026StatusEPN15stream_executor13CommandBufferEmEZNS3_3gpu16GpuCommandBuffer5WhileEN3tsl3gtl7IntTypeINS4_21ExecutionScopeId_tag_EmEEPNS3_14StreamExecutorENS3_12DeviceMemoryIbEESt8functionIFS2_SD_S5_EESI_IFS2_S5_EEE3$_0E9_M_invokeERKSt9_Any_dataOS5_Om+0xff): more undefined references to `stream_executor::KernelMetadata::shared_memory_bytes() const' follow
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_timer_gpu_only.a(gpu_timer.o): in function `stream_executor::TypedKernel<stream_executor::DeviceMemory<stream_executor::gpu::GpuSemaphoreState>, stream_executor::gpu::GpuSemaphoreState>::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)':
gpu_timer.cc:(.text._ZN15stream_executor11TypedKernelIJNS_12DeviceMemoryINS_3gpu17GpuSemaphoreStateEEES3_EE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE[_ZN15stream_executor11TypedKernelIJNS_12DeviceMemoryINS_3gpu17GpuSemaphoreStateEEES3_EE6CreateEPNS_14StreamExecutorERKNS_21MultiKernelLoaderSpecE]+0x11): undefined reference to `stream_executor::Kernel::Create(stream_executor::StreamExecutor*, stream_executor::MultiKernelLoaderSpec const&)'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
INFO: Elapsed time: 2898.757s, Critical Path: 170.72s
INFO: 33598 processes: 16165 internal, 1 local, 17432 processwrapper-sandbox.
FAILED: Build did NOT complete successfully

...or other variants of "symbol not found" during linking.

Also see #10592 and #10616 for GPU build failures.

I tried an earlier version, 12eee88 (the one included with jax 0.4.24), and that does compile (for CPU).

Looks like something broke along the way?

May I suggest updating the docs with:

docker run --name xla --workdir $PWD -it --rm --detach --volume $PWD:$PWD tensorflow/build:latest-python3.9 bash

so that the source directory tree inside the container matches the outside.

Combine this with:

docker exec xla ./configure.py --backend=...
docker exec xla bazel --output_user_root=bazel-build build //xla/... --spawn_strategy=sandboxed --test_output=all

because Bazel creates a jungle of symlinks with absolute paths. With this change, the symlinks match the paths on the outside of the container, and all the targets are inside bazel-build (which is git-ignored already), so accessible from the outside.
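
To see whether the symlinks Bazel leaves in the source tree (such as bazel-bin) actually resolve from outside the container, a small check along these lines may help. This is only a sketch: check_link is a hypothetical helper name, not something shipped with Bazel or XLA.

```shell
# check_link is a hypothetical helper, not part of Bazel or XLA.
# It reports whether a path is a symlink and, if so, whether its
# target exists on the machine where the check runs.
check_link() {
  link="$1"
  if [ ! -L "$link" ]; then
    # Not a symlink at all (a regular file, a directory, or missing).
    echo "not-a-symlink"
  elif [ -e "$link" ]; then
    # -e follows the symlink, so this branch means the target exists.
    echo "ok"
  else
    # The symlink points at an absolute path that does not exist here.
    echo "dangling -> $(readlink "$link")"
  fi
}
```

Running `check_link bazel-bin` on the host after a build inside the container should print "ok" when the paths inside and outside match, and a "dangling" line when they do not.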

Bazel's symlink behaviour is not very intuitive for someone not familiar with it. I ended up wasting a few hours/days recompiling XLA over and over again to get at its output.

The earlier version 12eee88 (the one included with jax 0.4.24) compiles for CPU, but fails to build for GPU with:

/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_platform.lo(cuda_platform.o): in function `stream_executor::gpu::CudaPlatform::~CudaPlatform()':
cuda_platform.cc:(.text._ZN15stream_executor3gpu12CudaPlatformD2Ev+0x18): undefined reference to `stream_executor::ExecutorCache::~ExecutorCache()'

and more of those.

Looks like the magic parameter might be --config=monolithic

@jtotzid do you know of a hash to roll back to get something that builds for CUDA?

The builds I've successfully run for CUDA were top of tree, or commits that correspond to a particular JAX version.

There's a magic parameter that makes it work: --config=monolithic. Just append that to your bazel command line:

bazel --output_user_root=$PWD/bazel-build build //xla/... --spawn_strategy=sandboxed --test_output=all --config=monolithic
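
Putting this together with the container setup suggested earlier in the thread, the full invocation could be assembled as below. This is a dry-run sketch: xla_build_cmd is a hypothetical helper that only prints the command, and the default container name xla is taken from the docker run example above.

```shell
# xla_build_cmd is a hypothetical helper; it prints the command instead of
# running it, so the result can be inspected before anything is executed.
# $1: container name (defaults to "xla", as in the docker run example).
xla_build_cmd() {
  container="${1:-xla}"
  # $PWD is expanded on the host; this relies on the container being
  # started with --workdir $PWD and --volume $PWD:$PWD so paths match.
  printf 'docker exec %s bazel --output_user_root=%s/bazel-build build //xla/... --spawn_strategy=sandboxed --test_output=all --config=monolithic\n' "$container" "$PWD"
}
```

Calling `xla_build_cmd` from the XLA source directory prints the line to run; piping it to `sh` would execute it.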