[AMDGPU] Dynamic Shared/LDS
DiamondLovesYou opened this issue · 0 comments
DiamondLovesYou commented
Just found this:
__device__
inline
void* __get_dynamicgroupbaseptr()
{
// Get group segment base pointer.
return (char*)__local_to_generic((void*)__to_local(__llvm_amdgcn_groupstaticsize()));
}
In HIP's source.
llvm.amdgcn.groupstaticsize
returns an i32
, which as you'd expect returns the size of the statically allocated WG storage (ie via a global in the LLVM IR), with any dynamic LDS placed after (and thus we see the i32
-> i8*
cast).
As it is now, Geobacter/Rust prevents statics from being generic, which prevents e.g. GEMM kernels from being generic over the type of the matrix element (without severe performance degradation). But this Rust limitation can be avoided by allowing kernels to allocate some LDS dynamically. Geobacter's shared source will then ensure that the host and device sides match.
TODO:
- Add a type and memory safe way to allocate host side and use device side.