some problem about auto allocate GPU card
guobingithub opened this issue · 1 comment
Hello, my GPU server has 4 GPU cards (7611MiB each).
Right now three containers are running on gpu0, using 7601MiB in total.
Then I ran a new container; I expected it to be scheduled on gpu1, gpu2, or gpu3.
But it does not run on gpu1/gpu2/gpu3 at all! It actually fails to run (CrashLoopBackOff)!
```
root@server:~# kubectl get po
NAME                         READY   STATUS             RESTARTS   AGE
binpack-1-5cb847f945-7dp5g   1/1     Running            0          3h33m
binpack-2-7fb6b969f-s2fmh    1/1     Running            0          64m
binpack-3-84d8979f89-d6929   1/1     Running            0          59m
binpack-4-669844dd5f-q9wvm   0/1     CrashLoopBackOff   15         56m
ngx-dep1-69c964c4b5-9d7cp    1/1     Running            0          102m
```
My GPU server info:
```
root@server:~# nvidia-smi
Wed May 20 18:18:17 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:18:00.0 Off |                    0 |
| N/A   65C    P0    25W /  75W |   7601MiB /  7611MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P4            Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   35C    P8     6W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P4            Off  | 00000000:5E:00.0 Off |                    0 |
| N/A   32C    P8     6W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P4            Off  | 00000000:86:00.0 Off |                    0 |
| N/A   38C    P8     7W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     24689      C   python                                     7227MiB  |
|    0     45236      C   python                                      151MiB  |
|    0     47646      C   python                                      213MiB  |
+-----------------------------------------------------------------------------+
root@server:~#
```
And my binpack-4.yaml is below:

```yaml
root@server:/home/guobin/gpu-repo# cat binpack-4.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: binpack-4
  labels:
    app: binpack-4
spec:
  replicas: 1
  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app: binpack-4
  template: # define the pod's specification
    metadata:
      labels:
        app: binpack-4
    spec:
      containers:
      - name: binpack-4
        image: cheyang/gpu-player:v2
        resources:
          limits:
            # MiB
            aliyun.com/gpu-mem: 200
```
As you can see, the `aliyun.com/gpu-mem` request is 200MiB.
OK, that is all the important info. Why can this plugin not automatically allocate a GPU card?
Or is there something I need to modify?
Thanks for your help!
@cheyang can you give me some help? Thank you very much.
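My expectation of the allocation behavior, written out as a small sketch (this is my own assumption about the binpack policy, not the plugin's actual code): the scheduler should pick the fullest card that still has enough free memory for the request, so with gpu0 having only 10MiB free, a 200MiB request should land on one of the empty cards.

```python
# Sketch (my assumption, NOT the plugin's real implementation) of a
# binpack GPU allocation: choose the card with the least free memory
# that can still hold the requested amount.

def pick_gpu(free_mem_per_gpu, request_mib):
    """Return the index of the chosen GPU, or None if no card fits."""
    candidates = [(free, idx) for idx, free in enumerate(free_mem_per_gpu)
                  if free >= request_mib]
    if not candidates:
        return None
    # binpack: prefer the fullest card that still fits the request
    return min(candidates)[1]

# Free memory per card from the nvidia-smi output above:
# gpu0 has only 10MiB free (7611 - 7601), the other three are empty.
free = [10, 7611, 7611, 7611]
print(pick_gpu(free, 200))  # prints 1 (gpu0 is excluded, 10 < 200)
```

So by this logic binpack-4 should be placed on gpu1 (or any other empty card), which is exactly what I expected but did not get.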