Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
moophlo commented
Describe the bug
Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
To Reproduce
Steps to reproduce the behavior:
Just start the pod
Expected behavior
Expect the file to be there
System (please complete the following information):
- OS version: Mint 22
- Kernel version: 6.8.0-47-generic
- Device plugins version: intel/intel-gpu-plugin:0.31.0
- Hardware info: [e.g. SPR with QAT]
Additional context
I1015 17:50:35.074780 1 gpu_plugin_resource_manager.go:174] GPU device plugin resource manager enabled
W1015 17:50:40.075999 1 gpu_plugin_resource_manager.go:315] Failed to read pods from kubelet API: Get "https://192.168.10.15:10250/pods": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
W1015 17:50:40.082039 1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 17:55:40.327845 1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 17:56:17.135634 1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
W1015 17:56:17.135662 1 gpu_plugin_resource_manager.go:461] retrying POD resolving after sleeping
W1015 17:56:19.431164 1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
E1015 17:56:19.431252 1 gpu_plugin_resource_manager.go:469] allocation candidate not found, perhaps the GPU scheduler extender is not called, err:things didn't work out, but perhaps a retry will help
W1015 17:56:20.529522 1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
W1015 17:56:20.529585 1 gpu_plugin_resource_manager.go:398] retrying POD resolving after sleeping
W1015 17:56:22.831799 1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
E1015 17:56:22.831862 1 gpu_plugin_resource_manager.go:406] allocation candidate not found, perhaps the GPU scheduler extender is not called, err:things didn't work out, but perhaps a retry will help
W1015 18:00:40.329390 1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:05:26.528397 1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
W1015 18:05:26.528495 1 gpu_plugin_resource_manager.go:461] retrying POD resolving after sleeping
W1015 18:05:28.634033 1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
E1015 18:05:28.724962 1 gpu_plugin_resource_manager.go:469] allocation candidate not found, perhaps the GPU scheduler extender is not called, err:things didn't work out, but perhaps a retry will help
W1015 18:05:29.229192 1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
W1015 18:05:29.229208 1 gpu_plugin_resource_manager.go:398] retrying POD resolving after sleeping
W1015 18:05:30.901631 1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
E1015 18:05:30.901664 1 gpu_plugin_resource_manager.go:406] allocation candidate not found, perhaps the GPU scheduler extender is not called, err:things didn't work out, but perhaps a retry will help
W1015 18:05:40.331411 1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:10:40.332615 1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:15:40.335025 1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:20:40.337430 1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:25:40.338627 1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
eero-t commented
Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
Depending on which GPU HW you have, and which kernel driver you use for it, this message is expected:
- Only discrete Intel GPUs include device local memory
- That info is provided through sysfs only with the out-of-tree Intel DKMS driver (prelim uAPI), not with the GPU kernel driver in the upstream kernel. See: