GPU memory is oversubscribed after a node restart
zlingqu opened this issue · 0 comments
zlingqu commented
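After I restarted a GPU node and then deployed a few more services, I found that GPU memory on some cards was oversubscribed (Allocated exceeds Total). The output looks like this: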
[root@jenkins app-deploy-platform]# kubectl-inspect-gpushare
NAME IPADDRESS GPU0(Allocated/Total) GPU1(Allocated/Total) GPU2(Allocated/Total) GPU3(Allocated/Total) GPU4(Allocated/Total) GPU5(Allocated/Total) GPU6(Allocated/Total) GPU7(Allocated/Total) GPU Memory(GiB)
192.168.3.4 192.168.3.4 18/11 8/11 9/11 11/11 17/11 8/11 8/11 4/11 83/88
192.168.68.4 192.168.68.4 14/10 10/10 6/10 14/10 10/10 10/10 9/10 0/10 73/80
192.168.68.68 192.168.68.68 9/10 8/10 4/10 0/10 0/10 0/10 0/10 0/10 21/80
---------------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
177/248 (71%)
I suspect this is a bug in the plugin itself.
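
To make the affected cards easier to spot, here is a minimal Python sketch that scans a saved copy of the `kubectl-inspect-gpushare` report and lists every GPU whose Allocated value exceeds its Total. The function name `find_oversubscribed` and the `SAMPLE` constant are hypothetical names for this illustration. It assumes the whitespace-separated layout shown above, where the last `Allocated/Total` pair on each node row is the per-node summary. It only reads the report text and does not diagnose why the oversubscription happened.

```python
import re

# Node rows copied from the report above (header and summary lines omitted).
SAMPLE = """\
192.168.3.4 192.168.3.4 18/11 8/11 9/11 11/11 17/11 8/11 8/11 4/11 83/88
192.168.68.4 192.168.68.4 14/10 10/10 6/10 14/10 10/10 10/10 9/10 0/10 73/80
192.168.68.68 192.168.68.68 9/10 8/10 4/10 0/10 0/10 0/10 0/10 0/10 21/80
"""


def find_oversubscribed(report):
    """Return (node, gpu_index, allocated, total) for each GPU with allocated > total.

    Assumption: each node row is NAME IPADDRESS followed by one
    Allocated/Total pair per GPU, then one final pair for the whole node.
    """
    findings = []
    for line in report.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # separator, blank, or cluster summary line
        node = fields[0]
        # Keep only fields shaped like "18/11"; header text never matches.
        pairs = [f for f in fields[2:] if re.fullmatch(r"\d+/\d+", f)]
        # Drop the trailing per-node summary pair before comparing.
        for gpu_index, pair in enumerate(pairs[:-1]):
            allocated, total = (int(x) for x in pair.split("/"))
            if allocated > total:
                findings.append((node, gpu_index, allocated, total))
    return findings


if __name__ == "__main__":
    for node, gpu, allocated, total in find_oversubscribed(SAMPLE):
        print(f"{node} GPU{gpu}: {allocated}/{total} GiB")
```

On the report above this flags GPU0 and GPU4 on 192.168.3.4, and GPU0 and GPU3 on 192.168.68.4.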