Mapping between HIP device ID and rocm_smi
al42and opened this issue · 1 comments
I have a HIP app that uses hipSetDevice and related APIs to do its work. It might be run with ROCR_VISIBLE_DEVICES or HIP_VISIBLE_DEVICES set.
For a given HIP device, I want to query some info using rocm_smi_lib, e.g., rsmi_topo_get_numa_node_number.
What is the recommended way to map between a HIP device and a ROCm SMI device index?
Manually looping over results of rsmi_dev_pci_id_get for all devices and comparing with hipDeviceProp_t::pciBusID and friends seems like a possible solution, but I wonder if there's an easier / official way.
Comparing PCI IDs is a decent way. RVS (ROCm Validation Suite) does something similar, except that it uses both hipDeviceProp_t::pciBusID and hipDeviceProp_t::pciDeviceID:
// get GPU device properties
hipDeviceProp_t props;
hipGetDeviceProperties(&props, hip_index);
uint16_t hip_dev_location_id =
((((uint16_t) (props.pciBusID)) << 8) | (((uint16_t)(props.pciDeviceID)) << 3));
See this patch in RVS ROCm 6.0 for an example. But you're right, hipDeviceProp_t::pciBusID is a great place to start. RVS just wants to validate that both sides actually refer to the same full PCIe BDF (Bus, Device, Function). Just don't forget to compare the device part and, in some cases, the function part too, depending on which part of the physical device you are looking at.
The full PCIe path is BUS ID:DEVICE ID.Function.
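To make the comparison concrete, here is a minimal C++ sketch of the matching step. It assumes the 64-bit value returned by rsmi_dev_pci_id_get packs the domain in the upper 32 bits, the bus in bits 15:8, the device in bits 7:3, and the function in bits 2:0. Please check the comments in rocm_smi.h for your ROCm release, since other bits may carry extra information. PciLocation, decode_smi_bdfid, and find_smi_index are hypothetical names used for illustration only; they are not part of HIP or ROCm SMI. On the HIP side, the values would come from hipDeviceProp_t::pciDomainID, pciBusID, and pciDeviceID.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// PCI location of one GPU as seen from HIP
// (hipDeviceProp_t::pciDomainID, pciBusID, pciDeviceID).
struct PciLocation {
    std::uint32_t domain;
    std::uint8_t bus;
    std::uint8_t device;
};

// Decode a 64-bit BDF ID under the ASSUMED layout:
// domain [63:32], bus [15:8], device [7:3], function [2:0].
// The function bits and any other bits are ignored here.
inline PciLocation decode_smi_bdfid(std::uint64_t bdfid) {
    return PciLocation{
        static_cast<std::uint32_t>(bdfid >> 32),
        static_cast<std::uint8_t>((bdfid >> 8) & 0xff),
        static_cast<std::uint8_t>((bdfid >> 3) & 0x1f),
    };
}

// Hypothetical helper: return the position in smi_bdfids whose domain,
// bus, and device match the HIP-side location, or -1 if none matches.
// smi_bdfids[i] is expected to hold the BDF ID of ROCm SMI device i.
inline int find_smi_index(const PciLocation& hip,
                          const std::vector<std::uint64_t>& smi_bdfids) {
    for (std::size_t i = 0; i < smi_bdfids.size(); ++i) {
        const PciLocation smi = decode_smi_bdfid(smi_bdfids[i]);
        if (smi.domain == hip.domain && smi.bus == hip.bus &&
            smi.device == hip.device) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

In a real program, smi_bdfids would be filled by calling rsmi_dev_pci_id_get for every index below the count reported by rsmi_num_monitor_devices, after rsmi_init. Since the lookup goes through PCI IDs rather than indices, it should not depend on how ROCR_VISIBLE_DEVICES or HIP_VISIBLE_DEVICES reorders or hides HIP devices.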
You can double-check this on Linux by running
readlink -f /sys/class/drm/card*/device/
or by using lspci.
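If you want to do the same cross-check from code, the resolved sysfs path typically ends in a component of the form domain:bus:device.function in hexadecimal, for example 0000:03:00.0. The sketch below parses that last component. PciAddress and parse_pci_address are hypothetical names, and the assumption that the path ends in such a component should be verified on your system.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// One PCI address split into its four parts.
struct PciAddress {
    unsigned domain;
    unsigned bus;
    unsigned device;
    unsigned function;
};

// Hypothetical helper: parse the last path component of `path`, assumed
// to look like "0000:03:00.0" (hex fields), into `out`.
// Returns false and leaves `out` untouched if the text does not match.
inline bool parse_pci_address(const std::string& path, PciAddress& out) {
    std::string s = path;
    // Drop trailing slashes, then keep only the last path component.
    while (!s.empty() && s.back() == '/') s.pop_back();
    const std::size_t slash = s.rfind('/');
    if (slash != std::string::npos) s = s.substr(slash + 1);

    PciAddress tmp{};
    int consumed = 0;
    const int fields = std::sscanf(s.c_str(), "%x:%x:%x.%x%n", &tmp.domain,
                                   &tmp.bus, &tmp.device, &tmp.function,
                                   &consumed);
    // All four fields must be present and nothing may follow them.
    if (fields != 4 || static_cast<std::size_t>(consumed) != s.size()) {
        return false;
    }
    out = tmp;
    return true;
}
```

The bus and device values parsed this way can then be compared against hipDeviceProp_t::pciBusID and hipDeviceProp_t::pciDeviceID for the HIP device in question.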
Hope this helps.