StanfordLegion/legion

Replicated Heap for the HIP module

Opened this issue · 6 comments

hipHostRegister cannot guarantee that the host and the GPU see registered memory at the same virtual address. However, the current ReplicatedHeap implementation in the CUDA module relies on unified virtual addressing (UVA), so we cannot simply copy that implementation into the HIP module. For now, this affects the MultiAffineAccessor.

muraj commented

@eddy16112 please keep in mind that NVIDIA CUDA cannot guarantee the same CPU and GPU address either; it just happens to be the case on the majority of Linux systems. See CU_DEVICE_ATTRIBUTE_CAN_USE_HOST_POINTER_FOR_REGISTERED_MEM for more information. For portability reasons, we should not assume the pointer passed to cuMemHostRegister is the same as the GPU one.

Yeah, I think we need to check if the addresses are identical, and if not, we will need to raise an error.

muraj commented

But neither you nor the user can control this, so you could get intermittent errors. Why do these need to be the same address? Why not just translate the address before using it on the GPU? You cannot assume the same address across non-local CPU processors, so can we not use the same logic between local GPU and CPU processors?
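
To illustrate the translation idea, here is a minimal sketch in plain C++. It is not Legion or Realm code: `RegisteredRange` and its members are hypothetical names, and the addresses are plain integers. In a real build the device base would presumably come from hipHostGetDevicePointer or cuMemHostGetDevicePointer after registration, which is an assumption here rather than something the sketch does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record of one registered host range (illustration only).
// device_base stands in for whatever the driver reports as the GPU-visible
// address of the same bytes; here it is just a number.
struct RegisteredRange {
  std::uintptr_t host_base;    // address the CPU uses
  std::uintptr_t device_base;  // address the GPU uses for the same bytes
  std::size_t size;            // length of the range in bytes

  // True when no translation is needed (the unified-address case).
  bool is_unified() const { return host_base == device_base; }

  // True when host_addr lies inside [host_base, host_base + size).
  bool contains(std::uintptr_t host_addr) const {
    return host_addr >= host_base && host_addr - host_base < size;
  }

  // Translate a host address into the matching device address by
  // preserving its offset from the start of the range.
  std::uintptr_t to_device(std::uintptr_t host_addr) const {
    assert(contains(host_addr));
    return device_base + (host_addr - host_base);
  }
};
```

With something like this, pointers into the heap would be stored as host addresses (or as offsets) and passed through `to_device` before being handed to the GPU. When `is_unified()` holds, the translation is the identity, so the same path covers both cases.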

That will be a question for @streichler. AFAIK, without unified addressing, it will be much more complicated to implement the replicated heap.

See also my test results at #1646 (comment)

It appears that unified addresses are supported with ROCm 5.6.0 and above.

Is this resolved or is there more to do here?