NVIDIA/go-dcgm

Question related to GPU device attributes

starry91 opened this issue · 3 comments

Hi, I am looking for a programmatic way to get the Streaming Multiprocessor (SM) count for T4 and A100 cards (with MIG both enabled and disabled). Is there an API in go-dcgm or go-nvml that I can use?

To get the number of SMs, you must use the CUDA driver API function cuDeviceGetAttribute with the CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT attribute.

Neither DCGM nor NVML exposes this API. The closest information you could get from NVML is the number of slices in the enabled MIG profile, but that only gives you the number of GPCs. You would then have to multiply it by the number of SMs per GPC, which depends on the GPU architecture, so this is not helpful.

@nikkon-dev Is there a plan to add this API to DCGM or NVML anytime soon? It should support both T4 and A100 cards. For more context: DCGM exposes the SM Activity field, which is computed across all SMs in the GPU or MIG instance, but there is no way to get the actual number of SMs in the device. Hence, we cannot interpret the SM Activity value as precisely as we would like.
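To illustrate why the SM count matters: assuming SM Activity is the fraction of time at least one warp was active on an SM, averaged over all SMs of the GPU or MIG instance, multiplying it by the SM count gives the average number of SMs that were busy. The same activity value then means very different absolute usage on a small versus a large instance. The function busySMEquivalent and the readings below are hypothetical, a sketch of the arithmetic rather than anything DCGM provides.

```go
package main

import (
	"errors"
	"fmt"
)

// busySMEquivalent converts an SM Activity ratio (0.0 to 1.0, averaged over
// all SMs of a GPU or MIG instance) into the average number of busy SMs.
// smCount must come from elsewhere, e.g. a CUDA driver query.
func busySMEquivalent(smActivity float64, smCount int) (float64, error) {
	if smActivity < 0 || smActivity > 1 {
		return 0, errors.New("smActivity must be in [0, 1]")
	}
	if smCount <= 0 {
		return 0, errors.New("smCount must be positive")
	}
	return smActivity * float64(smCount), nil
}

func main() {
	// Hypothetical readings: identical activity on devices of different size.
	readings := []struct {
		name     string
		activity float64
		smCount  int
	}{
		{"device A", 0.5, 14},
		{"device B", 0.5, 98},
	}
	for _, r := range readings {
		busy, err := busySMEquivalent(r.activity, r.smCount)
		if err != nil {
			fmt.Println(r.name, "error:", err)
			continue
		}
		fmt.Printf("%s: ~%.1f of %d SMs busy on average\n", r.name, busy, r.smCount)
	}
}
```

Without the SM count, both devices above would report the same 0.5 and be indistinguishable.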

@nikkon-dev Any update on this?