OpenDILab RL Kubernetes Custom Resource and Operator Lib

Primary language: Go. License: Apache License 2.0.

DI Orchestrator

DI Orchestrator manages DI (Decision Intelligence) jobs on Kubernetes using a Custom Resource and an Operator.

Prerequisites

  • A working Kubernetes cluster. Follow the instructions to create a Kubernetes cluster, or create a local Kubernetes node with kind or minikube.

Install DI Orchestrator

DI Orchestrator consists of two components: di-operator and di-server. Install them with the following command.

kubectl create -f ./config/di-manager.yaml

di-operator and di-server will be installed in the di-system namespace. Check that both pods are running:

$ kubectl get pod -n di-system
NAME                               READY   STATUS    RESTARTS   AGE
di-operator-57cc65d5c9-5vnvn       1/1     Running   0          59s
di-server-7b86ff8df4-jfgmp         1/1     Running   0          59s
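To make the check above scriptable, here is a minimal sketch that reads the table printed by `kubectl get pod` and lists any pod whose STATUS is not `Running`. The function name `pods_not_running` is hypothetical and not part of DI Orchestrator. It assumes the default column layout shown above, with STATUS as the third column.

```shell
# Hypothetical helper, not part of DI Orchestrator.
# Reads `kubectl get pod` table output on stdin and prints the NAME of every
# pod whose STATUS column (assumed to be column 3) is not "Running".
# Prints nothing when all listed pods are running.
pods_not_running() {
  # NR > 1 skips the header row; NF >= 3 skips blank or truncated lines.
  awk 'NR > 1 && NF >= 3 && $3 != "Running" { print $1 }'
}

# Example usage against the namespace used in this README:
#   kubectl get pod -n di-system | pods_not_running
```

An empty result means both di-operator and di-server report `Running`. If you would rather block until the pods are ready, `kubectl wait --for=condition=Ready pod --all -n di-system --timeout=120s` is an alternative worth trying.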

Submit DIJob

# submit DIJob
$ kubectl create -f config/samples/atari-dqn-tasks.yaml

# list the job's pods; in this sample, the collector, evaluator, and learner pods
# appear within a few seconds
$ kubectl get pod
NAME                         READY   STATUS    RESTARTS      AGE
job-with-tasks-collector-0   1/1     Running   0             2s
job-with-tasks-collector-1   1/1     Running   0             2s
job-with-tasks-evaluator-0   1/1     Running   0             2s
job-with-tasks-learner-0     1/1     Running   0             2s

# get logs of tasks
$ kubectl logs job-with-tasks-evaluator-0 
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
[06-28 08:25:29] INFO     Evaluator running on node 1                                                                                                           func.py:58
A.L.E: Arcade Learning Environment (version +a54a328)
[Powered by Stella]
/opt/conda/lib/python3.8/site-packages/ale_py/roms/__init__.py:44: UserWarning: ale_py.roms contains unsupported ROMs: /opt/conda/lib/python3.8/site-packages/AutoROM/roms/{joust.bin, warlords.bin, maze_craze.bin, combat.bin}
  warnings.warn(
[06-28 08:25:46] INFO     Evaluation: Train Iter(0)       Env Step(0)     Eval Reward(-21.000)                                                                  func.py:58
[06-28 08:25:46] WARNING  You have not installed memcache package! DI-engine has changed to some alternatives. 

$ kubectl logs job-with-tasks-learner-0
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370172916/work/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
[06-28 08:25:27] INFO     Learner running on node 0
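The evaluator log above mixes warnings with progress lines. As a rough sketch, the filter below keeps only the evaluation lines and prints the training iteration, environment step, and evaluation reward as three space-separated fields. The function name `eval_rewards` is hypothetical, and the pattern assumes the exact `Train Iter(...) Env Step(...) Eval Reward(...)` wording shown in the sample log, which may differ between DI-engine versions.

```shell
# Hypothetical helper, not part of DI Orchestrator or DI-engine.
# Reads evaluator logs on stdin and, for each line of the form
#   ... Train Iter(N) ... Env Step(M) ... Eval Reward(R) ...
# prints "N M R". All other lines are dropped.
eval_rewards() {
  # POSIX basic regex: bare parentheses are literal, \( \) capture groups.
  sed -n 's/.*Train Iter(\([0-9]*\)).*Env Step(\([0-9]*\)).*Eval Reward(\(-\{0,1\}[0-9.]*\)).*/\1 \2 \3/p'
}

# Example usage with the evaluator pod from the sample job:
#   kubectl logs job-with-tasks-evaluator-0 | eval_rewards
```

For the sample log shown above, the only evaluation line reports iteration 0, step 0, and a reward of -21.000.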

User Guide

Refer to the user-guide. For the Chinese version, see 中文手册 (Chinese manual).

Contributing

Refer to the developer-guide.

Contact us through opendilab.contact@gmail.com.