CentaurusInfra/alnair

Kubeshare test cases:


environment:
10.175.20.128, 8 × RTX 2080 Ti
case 1: created
8 SharePods (the pods run 'sleep infinity'; a manifest sketch follows the results below) with
"kubeshare/gpu_request": "0.5" # required if allocating GPU
"kubeshare/gpu_limit": "1.0" # required if allocating GPU
"kubeshare/gpu_mem": "1073741824" # required if allocating GPU # 1Gi, in bytes
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue" # optional
1 working pod with
"kubeshare/gpu_request": "0.4"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue" # optional
1 working pod with
"kubeshare/gpu_request": "0.6"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
results:
the working pods are not scheduled. Scheduler log: 'No enough resources for SharePod: default/pod1\n'
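For reference, a minimal SharePod manifest sketch for this configuration. The annotations are the ones listed above; the apiVersion/kind follow the upstream KubeShare CRD, and the pod/container names, image, and command are assumptions:

apiVersion: kubeshare.nthu/v1                # assumed CRD group/version (upstream KubeShare); the alnair fork may differ
kind: SharePod
metadata:
  name: sharepod1                            # illustrative name
  annotations:
    "kubeshare/gpu_request": "0.5"           # required if allocating GPU
    "kubeshare/gpu_limit": "1.0"             # required if allocating GPU
    "kubeshare/gpu_mem": "1073741824"        # required if allocating GPU; 1Gi, in bytes
    "kubeshare/sched_affinity": "red"        # optional
    "kubeshare/sched_anti-affinity": "green" # optional
    "kubeshare/sched_exclusion": "blue"      # optional
spec:
  containers:
  - name: sleeper                            # illustrative container name
    image: nvidia/cuda:10.0-base             # placeholder image
    command: ["sh", "-c", "sleep infinity"]  # keeps the pod (and its GPU share) occupied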

environment:
10.175.20.128, 8 × RTX 2080 Ti
case 2: created
2 SharePods (the pods do not run 'sleep infinity') with
"kubeshare/gpu_request": "0.5" # required if allocating GPU
"kubeshare/gpu_limit": "1.0" # required if allocating GPU
"kubeshare/gpu_mem": "1073741824" # required if allocating GPU # 1Gi, in bytes
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue" # optional
1 working pod with
"kubeshare/gpu_request": "0.4"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue" # optional
1 working pod with
"kubeshare/gpu_request": "0.6"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
results:
not all of the SharePods complete properly.

environment:
10.175.20.128, 8 × RTX 2080 Ti
case 3: created
2 SharePods (the pods do not run 'sleep infinity') with
"kubeshare/gpu_request": "0.5" # required if allocating GPU
"kubeshare/gpu_limit": "1.0" # required if allocating GPU
"kubeshare/gpu_mem": "1073741824" # required if allocating GPU # 1Gi, in bytes
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue#" # optional, each sharepod has its own label.
1 working pod with
"kubeshare/gpu_request": "0.4"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue" # optional
1 working pod with
"kubeshare/gpu_request": "0.6"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue1" # optional
results:
the SharePods run properly; the working pods are not created.
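To make the "blue#" notation concrete, a sketch of the metadata for two of the SharePods in this case, each carrying its own exclusion label (names and exact label values are illustrative):

metadata:
  name: sharepod1
  annotations:
    "kubeshare/sched_exclusion": "blue1"   # unique per SharePod
---
metadata:
  name: sharepod2
  annotations:
    "kubeshare/sched_exclusion": "blue2"   # unique per SharePod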

environment:
10.175.20.128, 8 × RTX 2080 Ti
case 4: created
7 SharePods (the pods do not run 'sleep infinity') with
"kubeshare/gpu_request": "0.5" # required if allocating GPU
"kubeshare/gpu_limit": "1.0" # required if allocating GPU
"kubeshare/gpu_mem": "1073741824" # required if allocating GPU # 1Gi, in bytes
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue#" # optional, each sharepod has its own label.
1 working pod with
"kubeshare/gpu_request": "0.4"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue" # optional
1 working pod with
"kubeshare/gpu_request": "0.6"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue1" # optional
results:
all SharePods are fine; the two working pods run sequentially (one finishes before the other starts).

environment:
10.175.20.128, 8 × RTX 2080 Ti
case 5: created
7 SharePods (the pods do not run 'sleep infinity') with
"kubeshare/gpu_request": "0.5" # required if allocating GPU
"kubeshare/gpu_limit": "1.0" # required if allocating GPU
"kubeshare/gpu_mem": "1073741824" # required if allocating GPU # 1Gi, in bytes
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue#" # optional, each sharepod has its own label.
1 working pod with
"kubeshare/gpu_request": "0.4"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue" # optional
1 working pod with
"kubeshare/gpu_request": "0.6"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green1" # optional
"kubeshare/sched_exclusion": "blue1" # optional
results:
all SharePods are fine; the two working pods run sequentially (one finishes before the other starts).

environment:
10.175.20.128, 8 × RTX 2080 Ti
case 6: created
7 SharePods (the pods do not run 'sleep infinity') with
"kubeshare/gpu_request": "0.5" # required if allocating GPU
"kubeshare/gpu_limit": "1.0" # required if allocating GPU
"kubeshare/gpu_mem": "1073741824" # required if allocating GPU # 1Gi, in bytes
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue#" # optional, each sharepod has its own label.
1 working pod with
"kubeshare/gpu_request": "0.4"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue" # optional
1 working pod with
"kubeshare/gpu_request": "0.6"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green1" # optional
"kubeshare/sched_exclusion": "blue" # optional
results:
pod2:
Started: Mon, 24 Jan 2022 13:51:58 -0800
Finished: Mon, 24 Jan 2022 13:52:03 -0800
pod1:
Started: Mon, 24 Jan 2022 13:52:09 -0800
Finished: Mon, 24 Jan 2022 13:52:14 -0800

environment:
10.175.20.128, 8 × RTX 2080 Ti
case 6 (repeated, with the sched_exclusion annotations removed from the working pods): created
7 SharePods (the pods do not run 'sleep infinity') with
"kubeshare/gpu_request": "0.5" # required if allocating GPU
"kubeshare/gpu_limit": "1.0" # required if allocating GPU
"kubeshare/gpu_mem": "1073741824" # required if allocating GPU # 1Gi, in bytes
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue#" # optional, each sharepod has its own label.
1 working pod with
"kubeshare/gpu_request": "0.4"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
1 working pod with
"kubeshare/gpu_request": "0.6"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green1" # optional
results: "kubeshare/sched_exclusion": "blue7" # optional are removed from the config
pod1:
NVIDIA_VISIBLE_DEVICES: GPU-79164d27-79a9-12a4-1693-ebb12163b8dc
Started: Mon, 24 Jan 2022 14:00:02 -0800
Finished: Mon, 24 Jan 2022 14:00:06 -0800
pod2:
NVIDIA_VISIBLE_DEVICES: GPU-79164d27-79a9-12a4-1693-ebb12163b8dc
Started: Mon, 24 Jan 2022 14:00:02 -0800
Finished: Mon, 24 Jan 2022 14:00:07 -0800
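A condensed view of the two working pods' GPU annotations in this run, with packing arithmetic that is consistent with the observed co-location; the 11 GB figure is the 2080 Ti's memory size, not something stated in the logs:

# pod1 (no "kubeshare/sched_exclusion")
"kubeshare/gpu_request": "0.4"
"kubeshare/gpu_mem": "3145728000"   # 3000Mi
# pod2 (no "kubeshare/sched_exclusion")
"kubeshare/gpu_request": "0.6"
"kubeshare/gpu_mem": "3145728000"   # 3000Mi
# Combined: 0.4 + 0.6 = 1.0 GPU request and 2 x 3000Mi = 6000Mi of GPU memory,
# which fits a single 11 GB 2080 Ti, so once the exclusion labels are gone both
# pods can share one device, matching the identical NVIDIA_VISIBLE_DEVICES and
# overlapping Started/Finished times above.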

environment:
10.175.20.128, 8 × RTX 2080 Ti
case 7: docker image: centaurusinfra/tensorflow:nightly-gpu-jupyter-tfds-profiler; cmd: python mnist.py. created
7 SharePods (the pods do not run 'sleep infinity') with
"kubeshare/gpu_request": "0.5" # required if allocating GPU
"kubeshare/gpu_limit": "1.0" # required if allocating GPU
"kubeshare/gpu_mem": "1073741824" # required if allocating GPU # 1Gi, in bytes
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
"kubeshare/sched_exclusion": "blue#" # optional, each sharepod has its own label.
localmnist1 working pod with
"kubeshare/gpu_request": "0.4"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
"kubeshare/sched_affinity": "red" # optional
"kubeshare/sched_anti-affinity": "green" # optional
localmnist2 working pod with
"kubeshare/gpu_request": "0.6"
"kubeshare/gpu_limit": "1.0"
"kubeshare/gpu_mem": "3145728000"
results: "kubeshare/sched_exclusion": "blue7" # optional are removed from the config
localmnist1: (39s)
NVIDIA_VISIBLE_DEVICES: GPU-79164d27-79a9-12a4-1693-ebb12163b8dc
Started: Tue, 25 Jan 2022 16:44:30 -0800
Finished: Tue, 25 Jan 2022 16:45:09 -0800

localmnist2: (35s)
NVIDIA_VISIBLE_DEVICES: GPU-79164d27-79a9-12a4-1693-ebb12163b8dc
Started: Tue, 25 Jan 2022 16:44:31 -0800
Finished: Tue, 25 Jan 2022 16:45:06 -0800

localmnist: (single task, 24s)
NVIDIA_VISIBLE_DEVICES: GPU-79164d27-79a9-12a4-1693-ebb12163b8dc
Started: Tue, 25 Jan 2022 16:34:01 -0800
Finished: Tue, 25 Jan 2022 16:34:25 -0800
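A minimal sketch of the localmnist1 working pod for this case. The image, command, and annotations come from the case description above; the CRD apiVersion/kind, container name, and restartPolicy are assumptions:

apiVersion: kubeshare.nthu/v1              # assumed, as in the earlier sketch
kind: SharePod
metadata:
  name: localmnist1
  annotations:
    "kubeshare/gpu_request": "0.4"
    "kubeshare/gpu_limit": "1.0"
    "kubeshare/gpu_mem": "3145728000"      # 3000Mi, in bytes
    "kubeshare/sched_affinity": "red"
    "kubeshare/sched_anti-affinity": "green"
spec:
  restartPolicy: Never                     # run-to-completion training job (assumption)
  containers:
  - name: mnist
    image: centaurusinfra/tensorflow:nightly-gpu-jupyter-tfds-profiler
    command: ["python", "mnist.py"]

The timings above (39 s and 35 s when sharing the GPU vs 24 s for the single task) show the expected slowdown from two training jobs time-sharing one device.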

environment:
10.175.20.128, 8 × RTX 2080 Ti
case 8: one task with gpu_request: 1.0 (25s)
Node: 128/10.175.20.128
Start Time: Wed, 26 Jan 2022 10:31:12 -0800
Labels:
Annotations: kubeshare/GPUID: grwhr
kubeshare/gpu_limit: 1.0
kubeshare/gpu_mem: 3145728000
kubeshare/gpu_request: 1.0
kubeshare/sched_affinity: red
kubeshare/sched_anti-affinity: green
centaurusinfra:
NVIDIA_VISIBLE_DEVICES: GPU-79164d27-79a9-12a4-1693-ebb12163b8dc
Started: Wed, 26 Jan 2022 10:31:14 -0800
Finished: Wed, 26 Jan 2022 10:31:38 -0800
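The annotation set for this exclusive case, reconstructed from the describe output above; kubeshare/GPUID appears in that output but looks like it was filled in at scheduling time, so it is omitted here:

metadata:
  annotations:
    "kubeshare/gpu_request": "1.0"         # a whole GPU
    "kubeshare/gpu_limit": "1.0"
    "kubeshare/gpu_mem": "3145728000"      # 3000Mi, in bytes
    "kubeshare/sched_affinity": "red"
    "kubeshare/sched_anti-affinity": "green"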