- device_function: A device function in HIP/Terra, called from a HIP device kernel/host code
- device_kernel: A device kernel in HIP/Terra, via __hipRegisterFatBinary, called from HIP host code
- device_kernel_module: A device kernel in HIP/Terra, via hipModuleLoadData, called from HIP host code
- local_memory: A demonstration of local memory, atomics, and barriers, via a simple histogram code
- device_function: Working
- device_kernel: Works with the following workarounds:
  - Need to manually modify the host code fatbin to reference device code via:
    @__hip_fatbin = external constant i8, section ".hip_fatbin"
    (And then modify the fatbin wrapper to use this instead of embedding a string.)
- device_kernel_module: Working
- local_memory: Working
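
To make the device_kernel workaround more concrete, here is a minimal C++ sketch of the fatbin wrapper that the host code hands to __hipRegisterFatBinary. The layout (two 32-bit integers followed by two pointers) and the magic value 0x48495046 mirror what Clang emits for HIP host code, but the names FatBinaryWrapper, make_wrapper, and wrapper_looks_valid are hypothetical and exist only for this illustration. The point is that the third field holds a pointer to the bundle (the external __hip_fatbin symbol) rather than an embedded string.

```cpp
#include <cstdint>

// Magic number stored in the wrapper; the bytes spell "HIPF" from high to low.
constexpr std::uint32_t kHipFatMagic = 0x48495046;
// Wrapper version assumed here, based on what Clang currently emits.
constexpr std::uint32_t kHipFatVersion = 1;

// Hypothetical mirror of the { i32, i32, i8*, i8* } wrapper constant.
struct FatBinaryWrapper {
  std::uint32_t magic;    // must equal kHipFatMagic
  std::uint32_t version;  // must equal kHipFatVersion
  const void* binary;     // points at the offload bundle (e.g. __hip_fatbin)
  const void* unused;     // left null
};

// Builds a wrapper that references an externally provided bundle
// instead of embedding the device code as a string.
inline FatBinaryWrapper make_wrapper(const void* bundle) {
  return FatBinaryWrapper{kHipFatMagic, kHipFatVersion, bundle, nullptr};
}

// Sanity check similar in spirit to what a runtime would do before
// trusting the wrapper (an assumption, not the HIP runtime's actual code).
inline bool wrapper_looks_valid(const FatBinaryWrapper& w) {
  return w.magic == kHipFatMagic && w.version == kHipFatVersion &&
         w.binary != nullptr;
}
```

In the real workaround the argument to make_wrapper would be the address of the __hip_fatbin symbol resolved at link time from the .hip_fatbin section.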
source crusher_env.sh
./build.sh
make -C device_function
make -C device_kernel_module
salloc -N 1 -A $PROJECT_ID -t 01:00:00 -p batch
srun device_function/saxpy_hip
srun device_function/saxpy_terra
srun device_kernel_module/saxpy_hip
srun device_kernel_module/saxpy_terra
- Target triples: https://llvm.org/docs/AMDGPUUsage.html#target-triples
- Processors (look for gfx90a): https://llvm.org/docs/AMDGPUUsage.html#processors
- Clang offload bundler: https://clang.llvm.org/docs/ClangOffloadBundler.html
- __hipRegisterFatBinary docs
- For comparison, NVIDIA's fatbin format (note the magic number)
- module API example code
- note the use of --genco to generate this output file
- note the use of
- Logging levels
The HIP .o file seems to have been produced by a tool called clang-offload-bundler:
$ clang-offload-bundler --list --inputs=test_hip.o --type=o
hip-amdgcn-amd-amdhsa-gfx90a
host-x86_64-unknown-linux-gnu
You can use it to unpack the bundle too. Note that the device file is
LLVM bitcode, while the host file is object code. You can compile with
the -emit-llvm flag in order to have both be LLVM bitcode.
clang-offload-bundler --unbundle --inputs=test_hip.o --type=o --outputs=test_hip.unbundle_device.bc --targets=hip-amdgcn-amd-amdhsa-gfx90a
clang-offload-bundler --unbundle --inputs=test_hip.o --type=o --outputs=test_hip.unbundle_host.o --targets=host-x86_64-unknown-linux-gnu
If you do use bitcode, the llvm-dis command is useful to convert this
back into textual LLVM IR.
llvm-dis test_hip.unbundle_device.bc
- __hipRegisterFatBinary (for comparison, hipModuleLoadData)
  - calls PlatformState::addFatBinary (for comparison, PlatformState::loadModule)
    - calls statCO_.addFatBinary
      - this seems to go through hip::StatCO
        - defined here
      - calls StatCO::addFatBinary
        - calls digestFatBinary
          - defined here
          - calls programs->ExtractFatBinary
            - FatBinaryInfo is defined here
            - calls FatBinaryInfo::ExtractFatBinary
              - calls CodeObject::ExtractCodeObjectFromFile
                - also CodeObject::ExtractCodeObjectFromMemory
                - CodeObject defined here
              - CodeObject::ExtractCodeObjectFromFile calls extractCodeObjectFromFatBinary
                - defined here
                - THIS SEEMS TO BE THE PLACE WHERE THEY PARSE THE CLANG OFFLOAD BUNDLER API
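
As a rough illustration of what the parsing step at the bottom of this call chain has to do, here is a small C++ sketch that lists the entries of an uncompressed clang-offload-bundler binary bundle. It assumes the layout described in the Clang offload bundler documentation: the 24-byte magic string __CLANG_OFFLOAD_BUNDLE__, a 64-bit little-endian entry count, then for each entry a 64-bit offset, 64-bit size, 64-bit ID length, and the ID bytes. BundleEntry, read_u64, and list_bundle_entries are hypothetical names for this sketch and are not taken from the HIP runtime; compressed bundles are not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Magic string at the start of an uncompressed binary bundle.
constexpr char kBundleMagic[] = "__CLANG_OFFLOAD_BUNDLE__";
constexpr std::size_t kMagicLen = sizeof(kBundleMagic) - 1;  // 24, no NUL

// Hypothetical record type for this sketch.
struct BundleEntry {
  std::uint64_t offset;  // byte offset of the code object from bundle start
  std::uint64_t size;    // byte size of the code object
  std::string id;        // entry ID, e.g. "hip-amdgcn-amd-amdhsa-gfx90a"
};

// Reads a little-endian 64-bit integer at pos and advances pos.
// Returns false if fewer than 8 bytes remain.
static bool read_u64(const std::vector<std::uint8_t>& buf, std::size_t& pos,
                     std::uint64_t& out) {
  if (buf.size() < 8 || pos > buf.size() - 8) return false;
  out = 0;
  for (int i = 0; i < 8; ++i) out |= std::uint64_t(buf[pos + i]) << (8 * i);
  pos += 8;
  return true;
}

// Fills `entries` with the bundle's entry table. Returns false if the
// magic string is missing or any field points outside the buffer.
bool list_bundle_entries(const std::vector<std::uint8_t>& buf,
                         std::vector<BundleEntry>& entries) {
  entries.clear();
  if (buf.size() < kMagicLen ||
      std::memcmp(buf.data(), kBundleMagic, kMagicLen) != 0)
    return false;
  std::size_t pos = kMagicLen;
  std::uint64_t count = 0;
  if (!read_u64(buf, pos, count)) return false;
  for (std::uint64_t i = 0; i < count; ++i) {
    BundleEntry e;
    std::uint64_t id_len = 0;
    if (!read_u64(buf, pos, e.offset) || !read_u64(buf, pos, e.size) ||
        !read_u64(buf, pos, id_len))
      return false;
    if (id_len > buf.size() - pos) return false;  // ID must fit in buffer
    e.id.assign(reinterpret_cast<const char*>(buf.data()) + pos,
                static_cast<std::size_t>(id_len));
    pos += static_cast<std::size_t>(id_len);
    // The code object itself must also lie inside the buffer.
    if (e.offset > buf.size() || e.size > buf.size() - e.offset) return false;
    entries.push_back(std::move(e));
  }
  return true;
}
```

Given the bytes of a fatbin blob, a caller would pick the entry whose ID matches the device (for example one ending in gfx90a) and treat the bytes at [offset, offset + size) as the code object.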