`zesMemoryGetState` only works under root user
notsyncing opened this issue · 4 comments
Hello, I'm trying the free global memory query described here: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/supported/sycl_ext_intel_device_info.md, which calls `zesMemoryGetState` under the hood. Under a non-root user (already added to the `video` and `render` groups), it always returns the total memory (16225243136 bytes, which is all my VRAM) as free memory, even when something is occupying some VRAM. Under the root user, it correctly returns the free memory (10125414400 bytes).
btw, `xpu-smi` also always reports `0 MB` of used memory under a non-root user, while reporting the correct `6632 MB` under the root user.

Is this behavior by design, or is it a bug? Thanks!
Environment:
- Fedora Silverblue 39
- Linux kernel 6.6.13-200.fc39.x86_64
- oneapi-basekit 2024.0
- oneapi-level-zero 1.15.8-1.fc39.x86_64
Please file a bug against the Level Zero (L0) backend you're using. Intel's is here: https://github.com/intel/compute-runtime/
Also list which GPU kernel module you're using (upstream i915, i915 backport, or Xe), because access rights are arbitrated by your kernel, not by the user-space driver.
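One way to find the kernel module, sketched under the assumption of the common sysfs layout where `/sys/class/drm/card0/device/driver` is a symlink to the bound PCI driver: read that link and take its last path component. The helper names and the `card0` default are illustrative only. This distinguishes `i915` from `xe`, but telling upstream i915 apart from an i915 backport needs other information, such as where the loaded module came from.

```python
import os


def driver_from_link(target: str) -> str:
    """Extract the driver name from a sysfs 'driver' symlink target.

    e.g. "../../../bus/pci/drivers/i915" -> "i915"
    """
    return os.path.basename(target.rstrip("/"))


def drm_driver_name(card: str = "card0", sysfs_root: str = "/sys/class/drm") -> str:
    """Return the kernel driver bound to a DRM card (assumes the usual sysfs layout)."""
    link = os.path.join(sysfs_root, card, "device", "driver")
    return driver_from_link(os.readlink(link))
```

On a machine with an Intel GPU, `drm_driver_name()` would be expected to return `"i915"` or `"xe"`; the card number may differ from `card0` on multi-GPU systems.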
Upstream kernel documentation does not mention memory info being root-only: https://docs.kernel.org/gpu/driver-uapi.html#c.drm_i915_query_memory_regions
But the kernel requires the PERFMON capability to access some of the metrics. I don't think it should be needed for memory, but you could try whether that's enough instead of needing full root.
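To check whether a process actually holds one of these capabilities, its effective set can be read from the `CapEff:` line of `/proc/self/status`. This sketch assumes the bit numbers from `<linux/capability.h>` (`CAP_SYS_ADMIN` = 21, `CAP_PERFMON` = 38, the latter introduced in Linux 5.8). It treats either capability as sufficient, following the i915 uAPI documentation for `unallocated_size`. The function names are illustrative. It inspects the calling process only: a binary given capabilities with `setcap` gains them when it is executed, but the shell or script that launches it does not.

```python
# Capability bit numbers from <linux/capability.h>.
CAP_SYS_ADMIN = 21
CAP_PERFMON = 38  # introduced in Linux 5.8


def parse_cap_eff(status_text: str) -> int:
    """Return the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # Value is a hex bitmask, e.g. "000001ffffffffff" for root.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask: int, cap: int) -> bool:
    return bool((mask >> cap) & 1)


def can_see_vram_usage(mask: int) -> bool:
    # Either capability is documented as enough for reliable accounting.
    return has_cap(mask, CAP_PERFMON) or has_cap(mask, CAP_SYS_ADMIN)


def current_process_can_see_vram_usage() -> bool:
    with open("/proc/self/status") as f:
        return can_see_vram_usage(parse_cap_eff(f.read()))
```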
Are you doing this testing directly on host, or within a container (in which case UID mapping could be a problem)?
> Are you doing this testing directly on host, or within a container (in which case UID mapping could be a problem)?

This happens both on the host and in a container.
> But kernel requires PERFMON capability for accessing some of the metrics. I don't think it should be needed for memory, but you could try whether that's enough instead of needing full root.

After `setcap "cap_perfmon=ep" xpu-smi`, it reports memory info correctly under a non-root user.
> Upstream kernel documentation does not mention memory info being root-only: https://docs.kernel.org/gpu/driver-uapi.html#c.drm_i915_query_memory_regions

Interesting, I found this in the link you posted, in `struct drm_i915_memory_region_info`:

> `unallocated_size`: Estimate of memory remaining
>
> Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. Without this (or if this is an older kernel) the value here will always equal the probed_size. Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will always equal the probed_size).
It matches my observations perfectly. So it is actually by design. Thanks for your help!
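Put differently, an unprivileged reader cannot tell "nothing allocated" apart from "not accounted", because it always sees `unallocated_size == probed_size`. Below is a minimal sketch of how a tool might interpret the two fields under that rule. `RegionInfo` and `used_bytes` are hypothetical names, not part of the kernel uAPI or of Level Zero. The `privileged` flag stands for holding `CAP_PERFMON` or `CAP_SYS_ADMIN`. The example inputs reuse the total and free byte counts reported earlier in this issue as stand-in values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionInfo:
    """Hypothetical mirror of two drm_i915_memory_region_info fields, in bytes."""
    probed_size: int
    unallocated_size: int


def used_bytes(region: RegionInfo, privileged: bool) -> Optional[int]:
    """Return used device memory, or None when the kernel value is not meaningful.

    Without CAP_PERFMON/CAP_SYS_ADMIN the kernel reports
    unallocated_size == probed_size, so "0 bytes used" would be misleading.
    (On older kernels the value equals probed_size even for privileged callers.)
    """
    if not privileged:
        return None
    return region.probed_size - region.unallocated_size


# Stand-in figures from this issue: 16225243136 bytes total, 10125414400 free as root.
as_root = used_bytes(RegionInfo(16225243136, 10125414400), privileged=True)
as_user = used_bytes(RegionInfo(16225243136, 16225243136), privileged=False)
```

Returning `None` instead of `0` is the design choice here. It lets a caller such as a monitoring tool show "unknown" rather than a misleading "0 MB used".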
Note on `PERFMON` capability use in containers: practically any kernel shipped with a supported distro version is new enough to support it. However, some (enterprise) setups may still run a Docker version so old that it does not support `PERFMON`, only the older (and much broader) `SYS_ADMIN` capability.