intel/intel-device-plugins-for-kubernetes

Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory


Describe the bug
The GPU plugin repeatedly logs this warning:
Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory

To Reproduce
Steps to reproduce the behavior:
Just start the pod

Expected behavior
Expect the file to be there

System (please complete the following information):

  • OS version: Mint 22
  • Kernel version: 6.8.0-47-generic
  • Device plugins version: intel/intel-gpu-plugin:0.31.0
  • Hardware info: [e.g. SPR with QAT]

Additional context

I1015 17:50:35.074780       1 gpu_plugin_resource_manager.go:174] GPU device plugin resource manager enabled
W1015 17:50:40.075999       1 gpu_plugin_resource_manager.go:315] Failed to read pods from kubelet API: Get "https://192.168.10.15:10250/pods": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
W1015 17:50:40.082039       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 17:55:40.327845       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 17:56:17.135634       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
W1015 17:56:17.135662       1 gpu_plugin_resource_manager.go:461] retrying POD resolving after sleeping
W1015 17:56:19.431164       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
E1015 17:56:19.431252       1 gpu_plugin_resource_manager.go:469] allocation candidate not found, perhaps the GPU scheduler extender is not called, err:things didn't work out, but perhaps a retry will help
W1015 17:56:20.529522       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
W1015 17:56:20.529585       1 gpu_plugin_resource_manager.go:398] retrying POD resolving after sleeping
W1015 17:56:22.831799       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
E1015 17:56:22.831862       1 gpu_plugin_resource_manager.go:406] allocation candidate not found, perhaps the GPU scheduler extender is not called, err:things didn't work out, but perhaps a retry will help
W1015 18:00:40.329390       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:05:26.528397       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
W1015 18:05:26.528495       1 gpu_plugin_resource_manager.go:461] retrying POD resolving after sleeping
W1015 18:05:28.634033       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
E1015 18:05:28.724962       1 gpu_plugin_resource_manager.go:469] allocation candidate not found, perhaps the GPU scheduler extender is not called, err:things didn't work out, but perhaps a retry will help
W1015 18:05:29.229192       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
W1015 18:05:29.229208       1 gpu_plugin_resource_manager.go:398] retrying POD resolving after sleeping
W1015 18:05:30.901631       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
E1015 18:05:30.901664       1 gpu_plugin_resource_manager.go:406] allocation candidate not found, perhaps the GPU scheduler extender is not called, err:things didn't work out, but perhaps a retry will help
W1015 18:05:40.331411       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:10:40.332615       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:15:40.335025       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:20:40.337430       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:25:40.338627       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
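
To narrow this down, it may help to check which DRM cards on the node actually expose the file named in the warning. The following is a minimal diagnostic sketch, not part of the plugin: the function name `lmem_report` and the card-name pattern are assumptions made for illustration, and only the `/sys/class/drm/cardN/lmem_total_bytes` path comes from the log above.

```python
import re
from pathlib import Path

# Matches card directories such as card0 or card3, and skips
# connector entries such as card0-DP-1 and render nodes such as renderD128.
CARD_DIR = re.compile(r"^card\d+$")


def lmem_report(drm_root="/sys/class/drm"):
    """Map each DRM card name to True if it has an lmem_total_bytes file.

    Hypothetical helper: it only checks for the file's presence and
    does not interpret its contents.
    """
    report = {}
    root = Path(drm_root)
    if not root.is_dir():
        return report  # no DRM sysfs tree visible from here
    for entry in sorted(root.iterdir()):
        if CARD_DIR.match(entry.name):
            report[entry.name] = (entry / "lmem_total_bytes").is_file()
    return report


if __name__ == "__main__":
    for card, present in lmem_report().items():
        status = "present" if present else "missing"
        print(f"{card}: lmem_total_bytes {status}")
```

Run on the node itself (or in a container that mounts the host's `/sys`), this lists each card and whether the file is present, which shows whether card3 is the only device without it.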

Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory

Depending on which GPU HW you have, and which kernel driver you use for it, this message is expected: