tkestack/gpu-manager

What do the fields in the metrics API /usage mean? Is there any documentation for them?

Opened this issue · 4 comments

{"usage":{"0e8c1dc8-2137-404d-8b3a-f3aa12838448":{"stat":{"executor":{"dev":[{"id":"00000000:41:00.0","card_idx":"0","mem":1765,"pids":[1094],"device_mem":32510}]}},"project":"task-16605-8b22c7909a7f3afb-namespace","spec":{"executor":{"gpu":0.3,"mem":7936},"isoland-agent":{}}},"969f3df1-5f0b-4ccb-867c-029bc40c80a3":{"stat":{"executor":{"dev":[{"id":"00000000:41:00.0","card_idx":"0","mem":1765,"pids":[22349],"device_mem":32510}]}},"project":"task-16585-d4dbffe6c4b3adba-namespace","spec":{"executor":{"gpu":0.3,"mem":7936},"isoland-agent":{}}},"9de2edcc-0b9b-4cd5-a9fa-61eaeac189a0":{"stat":{"executor":{"dev":[{"id":"00000000:41:00.0","card_idx":"0","device_mem":32510}]}},"project":"task-16626-b19ea904b9ba9b88-namespace","spec":{"executor":{"gpu":0.3,"mem":7936},"isoland-agent":{}}}}}

These are the metrics I retrieved, but I am unsure what some of the fields mean. I currently have only one machine with a single card.

What does the field id mean? If two entries have the same id, does that mean they are on the same physical machine?

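@mikev4 When the cgroup is mounted into a container and cgroup.procs is read, the file turns out to be out of sync. Could you tell me which k8s version your environment runs, and whether you have seen cgroup.procs go out of sync?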

mikev4 commented

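@mikev4 When the cgroup is mounted into a container and cgroup.procs is read, the file turns out to be out of sync. Could you tell me which k8s version your environment runs, and whether you have seen cgroup.procs go out of sync?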

k8s v1.21.13. I have not seen any problems.

k8s 1.19.2

The 'id' looks like the 'busId' of the graphics card device.
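If 'id' is indeed a per-device bus ID, entries that share the same 'id' would be sharing the same card. Here is a minimal sketch for checking that against a /usage response, assuming the payload shape shown in the question. The function name `group_by_device` and the trimmed `SAMPLE` are illustrative only, and no meaning is assumed for 'mem' or 'device_mem' beyond copying the values through.

```python
import json
from collections import defaultdict

# Trimmed copy of two entries from the payload in the question.
SAMPLE = """
{"usage": {
  "0e8c1dc8-2137-404d-8b3a-f3aa12838448": {
    "stat": {"executor": {"dev": [
      {"id": "00000000:41:00.0", "card_idx": "0", "mem": 1765,
       "pids": [1094], "device_mem": 32510}]}},
    "spec": {"executor": {"gpu": 0.3, "mem": 7936}}},
  "9de2edcc-0b9b-4cd5-a9fa-61eaeac189a0": {
    "stat": {"executor": {"dev": [
      {"id": "00000000:41:00.0", "card_idx": "0", "device_mem": 32510}]}},
    "spec": {"executor": {"gpu": 0.3, "mem": 7936}}}
}}
"""


def group_by_device(payload: str) -> dict:
    """Group the 'dev' records of a /usage payload by their 'id' value."""
    usage = json.loads(payload).get("usage", {})
    by_device = defaultdict(list)
    for entry_key, entry in usage.items():
        for container, stat in entry.get("stat", {}).items():
            for dev in stat.get("dev", []):
                by_device[dev["id"]].append({
                    "entry": entry_key,
                    "container": container,
                    # 'mem' and 'pids' are missing from some records in the sample.
                    "mem": dev.get("mem"),
                    "pids": dev.get("pids", []),
                })
    return dict(by_device)


if __name__ == "__main__":
    for dev_id, records in group_by_device(SAMPLE).items():
        print(dev_id, len(records), "record(s)")
        for record in records:
            print("  ", record)
```

With the sample from the question, every record lands under the single id 00000000:41:00.0, which is consistent with a machine that has one card.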