DI Orchestrator is designed to manage DI (Decision Intelligence) jobs using Kubernetes Custom Resource and Operator.
- A well-prepared kubernetes cluster. Follow the instructions to create a kubernetes cluster, or create a local kubernetes node referring to kind or minikube
DI Orchestrator consists of two components: di-operator
and di-server
. Install them with the following command.
kubectl create -f ./config/di-manager.yaml
di-operator
and di-server
will be installed in di-system
namespace.
$ kubectl get pod -n di-system
NAME READY STATUS RESTARTS AGE
di-operator-57cc65d5c9-5vnvn 1/1 Running 0 59s
di-server-7b86ff8df4-jfgmp 1/1 Running 0 59s
# submit DIJob
$ kubectl create -f config/samples/atari-dqn-tasks.yaml
# get pod and you will see coordinator is created by di-operator
# a few seconds later, you will see collectors and learners created by di-server
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
job-with-tasks-collector-0 1/1 Running 0 2s
job-with-tasks-collector-1 1/1 Running 0 2s
job-with-tasks-evaluator-0 1/1 Running 0 2s
job-with-tasks-learner-0 1/1 Running 0 2s
# get logs of tasks
$ kubectl logs job-with-tasks-evaluator-0
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0
[06-28 08:25:29] INFO Evaluator running on node 1 func.py:58
A.L.E: Arcade Learning Environment (version +a54a328)
[Powered by Stella]
/opt/conda/lib/python3.8/site-packages/ale_py/roms/__init__.py:44: UserWarning: ale_py.roms contains unsupported ROMs: /opt/conda/lib/python3.8/site-packages/AutoROM/roms/{joust.bin, warlords.bin, maze_craze.bin, combat.bin}
warnings.warn(
[06-28 08:25:46] INFO Evaluation: Train Iter(0) Env Step(0) Eval Reward(-21.000) func.py:58
[06-28 08:25:46] WARNING You have not installed memcache package! DI-engine has changed to some alternatives.
$ kubectl logs job-with-tasks-learner-0
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0
[06-28 08:25:27] INFO Learner running on node 0
Refers to user-guide. For Chinese version, please refer to 中文手册
Refers to developer-guide.
Contact us throw opendilab.contact@gmail.com