CHIP-SPV/chipStar

Undefined reference to __hip_fatbin when linking object files compiled with relocatable device code


Linking object files that were compiled with relocatable device code enabled fails with an undefined reference to __hip_fatbin. The -rdc=true (or --relocatable-device-code=true) flag has to be added explicitly at link time for the link to succeed, whereas CUDA's nvcc does not require it (it links successfully without -rdc=true). Please see the reproducer below for more details:

[Reproducer]
Create the following three files, main.cu, function.hpp, and function.cu:

main.cu

#include "function.hpp"

int main() {
  test();
}

function.hpp

void test();

function.cu

#include <cstdio>
#include "function.hpp"

void test() {
  printf("hello\n");
}

Compile main.cu and function.cu with the following commands (these should compile fine with the newly added -dc flag support):
nvcc -dc -o main.o main.cu
nvcc -dc -o function.o function.cu

Then, link the object files:
nvcc -o main main.o function.o

The following error should appear:

/usr/bin/ld: main.o:(.hipFatBinSegment+0x8): undefined reference to `__hip_fatbin'
/usr/bin/ld: function.o:(.hipFatBinSegment+0x8): undefined reference to `__hip_fatbin'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

failed to execute:/home/tsaini.chen/install/llvm/17.0/bin/clang++ -include /home/tsaini.chen/install/chipStar/add_flag/include/hip/spirv_fixups.h -I//home/tsaini.chen/install/chipStar/add_flag/include -isystem /home/tsaini.chen/install/chipStar/add_flag/include/cuspv -include /home/tsaini.chen/install/chipStar/add_flag/include/cuspv/cuda_runtime.h -D__NVCC__ -D__CHIP_CUDA_COMPATIBILITY__ main.o function.o -o main -L/home/tsaini.chen/install/chipStar/add_flag/lib -lCHIP -no-hip-rt -Wl,-rpath,/home/tsaini.chen/install/chipStar/add_flag/lib
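As noted in the description, passing -rdc=true at link time avoids the error. A minimal sketch of that link step follows. The NVCC and DRY_RUN variables and the link_rdc helper are hypothetical names used only for illustration, not part of chipStar or nvcc:

```shell
#!/bin/sh
# Sketch only: NVCC, DRY_RUN, and link_rdc are illustrative names (assumptions).
NVCC="${NVCC:-nvcc}"

# link_rdc OUTPUT OBJECT...
# Links the given object files with -rdc=true passed explicitly.
link_rdc() {
  out="$1"
  shift
  # Rebuild the argument list as the full link command.
  set -- "$NVCC" -rdc=true -o "$out" "$@"
  if [ -n "${DRY_RUN:-}" ]; then
    # Dry run: print the command instead of executing it.
    echo "$*"
  else
    "$@"
  fi
}
```

For the reproducer above this would be invoked as `link_rdc main main.o function.o`. Setting DRY_RUN=1 beforehand prints the command without running it.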
test1: # /usr/bin/ld: main.o:(.hipFatBinSegment+0x8): undefined reference to `__hip_fatbin'
	/space/pvelesko/chipStar/fix-893/build/bin/cucc -dc main.cu
	/space/pvelesko/chipStar/fix-893/build/bin/cucc -c function.cu
	/space/pvelesko/chipStar/fix-893/build/bin/cucc main.o function.o -o driver

test2: #/usr/bin/ld: function.o:(.hipFatBinSegment+0x8): undefined reference to `__hip_fatbin'
	/space/pvelesko/chipStar/fix-893/build/bin/cucc -c main.cu
	/space/pvelesko/chipStar/fix-893/build/bin/cucc -dc function.cu
	/space/pvelesko/chipStar/fix-893/build/bin/cucc main.o function.o -o driver	

# InvalidInstruction: Can't translate llvm instruction:
#  Global variable cannot have Function storage class. Consider setting a proper address space.
#  Original LLVM value:
# @__clang_gpu_used_external = internal global [1 x ptr] [ptr @_Z4testv]
test3:
	/space/pvelesko/chipStar/fix-893/build/bin/cucc -dc main.cu
	/space/pvelesko/chipStar/fix-893/build/bin/cucc -dc function.cu
	/space/pvelesko/chipStar/fix-893/build/bin/cucc -fgpu-rdc main.o function.o -o driver	

#InvalidTargetTriple: Expects spir-unknown-unknown or spir64-unknown-unknown. Actual target triple is <BLANK?>
test4:
	/space/pvelesko/chipStar/fix-893/build/bin/cucc -c main.cu
	/space/pvelesko/chipStar/fix-893/build/bin/cucc -c function.cu
	/space/pvelesko/chipStar/fix-893/build/bin/cucc -fgpu-rdc main.o function.o -o driver	
	
# correct
test5:
	/space/pvelesko/chipStar/fix-893/build/bin/cucc -c main.cu
	/space/pvelesko/chipStar/fix-893/build/bin/cucc -dc function.cu
	/space/pvelesko/chipStar/fix-893/build/bin/cucc -fgpu-rdc main.o function.o -o driver	

clean:
	rm -f *.o driver

@jjennychen Can you please change your makefile to follow test5 for now as a workaround?
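For reference, the test5 sequence can also be run outside of make. This is only a sketch: CUCC, DRY_RUN, run, and build_test5 are hypothetical names, and CUCC would need to point at the cucc binary from the chipStar build being tested:

```shell
#!/bin/sh
# Sketch only: CUCC, DRY_RUN, run, and build_test5 are illustrative names (assumptions).
CUCC="${CUCC:-cucc}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same three steps as the test5 target: main.cu is compiled with -c,
# function.cu with -dc, and the link step passes -fgpu-rdc.
build_test5() {
  run "$CUCC" -c main.cu &&
  run "$CUCC" -dc function.cu &&
  run "$CUCC" -fgpu-rdc main.o function.o -o driver
}
```

Calling `build_test5` with DRY_RUN=1 set prints the three commands, which makes it easy to compare them against the failing variants (test1 to test4).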

@pjaaskel @Kerilk

What do you propose we do here?