AliyunContainerService/gpushare-scheduler-extender

Reported GPU memory does not match the actual GPU memory

SakuraAxy opened this issue · 1 comment

Device plugin log:

I0216 10:38:19.632781 1 main.go:18] Start gpushare device plugin
I0216 10:38:19.632882 1 gpumanager.go:28] Loading NVML
I0216 10:38:19.637532 1 gpumanager.go:37] Fetching devices.
I0216 10:38:19.637566 1 gpumanager.go:43] Starting FS watcher.
I0216 10:38:19.637648 1 gpumanager.go:51] Starting OS watcher.
I0216 10:38:19.644471 1 nvidia.go:64] Deivce GPU-639a39d4-883f-c4a3-a85c-88b51bf64612's Path is /dev/nvidia0
I0216 10:38:19.644538 1 nvidia.go:69] # device Memory: 12282
I0216 10:38:19.644547 1 nvidia.go:40] set gpu memory: 11
I0216 10:38:19.644554 1 nvidia.go:76] # Add first device ID: GPU-639a39d4-883f-c4a3-a85c-88b51bf64612--0
I0216 10:38:19.644572 1 nvidia.go:79] # Add last device ID: GPU-639a39d4-883f-c4a3-a85c-88b51bf64612--10
I0216 10:38:19.651635 1 nvidia.go:64] Deivce GPU-0c67a35e-3f02-4b3c-0388-445f1911a816's Path is /dev/nvidia1
I0216 10:38:19.651674 1 nvidia.go:69] # device Memory: 16376
I0216 10:38:19.651681 1 nvidia.go:76] # Add first device ID: GPU-0c67a35e-3f02-4b3c-0388-445f1911a816--0
I0216 10:38:19.651709 1 nvidia.go:79] # Add last device ID: GPU-0c67a35e-3f02-4b3c-0388-445f1911a816--10
I0216 10:38:19.651717 1 server.go:43] Device Map: map[GPU-639a39d4-883f-c4a3-a85c-88b51bf64612:0 GPU-0c67a35e-3f02-4b3c-0388-445f1911a816:1]
I0216 10:38:19.651740 1 server.go:44] Device List: [GPU-639a39d4-883f-c4a3-a85c-88b51bf64612 GPU-0c67a35e-3f02-4b3c-0388-445f1911a816]
I0216 10:38:19.678024 1 podmanager.go:68] No need to update Capacity aliyun.com/gpu-count
I0216 10:38:19.678528 1 server.go:222] Starting to serve on /var/lib/kubelet/device-plugins/aliyungpushare.sock
I0216 10:38:19.680665 1 server.go:230] Registered device plugin with Kubelet
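
The numbers in the log above can be checked with a small sketch. It assumes the plugin converts each `# device Memory` value from MiB to whole GiB units, and that the `set gpu memory: 11` value derived from the first GPU is reused for the second one. That second assumption is only an inference from both GPUs getting device IDs `--0` through `--10`; it is not confirmed plugin behavior. The helper names `gibUnits`, `sumPerDevice` and `sumFirstOnly` are hypothetical.

```go
package main

import "fmt"

// gibUnits converts a memory size in MiB to whole GiB units (integer division).
func gibUnits(memMiB uint) uint {
	return memMiB / 1024
}

// sumPerDevice is the total if every GPU were sized from its own memory.
func sumPerDevice(devicesMiB []uint) uint {
	var total uint
	for _, m := range devicesMiB {
		total += gibUnits(m)
	}
	return total
}

// sumFirstOnly is the total if every GPU reuses the first GPU's unit count.
// Assumption: this mirrors what the log suggests, not verified plugin code.
func sumFirstOnly(devicesMiB []uint) uint {
	if len(devicesMiB) == 0 {
		return 0
	}
	return gibUnits(devicesMiB[0]) * uint(len(devicesMiB))
}

func main() {
	// Values in MiB, copied from the "# device Memory" lines in the log.
	devices := []uint{12282, 16376}

	fmt.Println("sized per device:", sumPerDevice(devices))   // 11 + 15
	fmt.Println("sized from first GPU:", sumFirstOnly(devices)) // 11 * 2
}
```

Under these assumptions the node would advertise 22 units instead of 26, which matches the kind of mismatch described in this issue.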

Kubernetes Version: K3S v1.25.3+k3s1
Version: 0.1.0

@cheyang, could you help me with this? Thanks!