Instructions for using the container cluster (Kubernetes, k8s)
You can refer to this repository for more information. Below are the steps for the most common setups.
- Requesting access
- Installing Kubernetes
- Setting up Kubernetes
- Using Kubernetes
- Note on Storage across icclusters
- Other resources and deployment templates
Use this form to request access (use Accréditation=MLO).
Once you have been approved, you will get an email from the IC with a zip file named <your-name>.zip.
Unzip it and put the files into the .kube folder of your home directory. Then rename <your-name>.config to config.
cd ~
mkdir -p .kube
# Save your-name.zip in the .kube folder, then unzip it there
unzip .kube/*.zip -d .kube
mv .kube/*.config .kube/config
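Before moving on, it can help to confirm that the file ended up where kubectl looks by default (~/.kube/config). The following is a minimal sketch: the function name check_kubeconfig is made up for this example, and restricting the file permissions to the owner is a suggestion, not something the IC setup is known to require.

```shell
# Hypothetical helper: verify that a kubeconfig exists in the given directory
# (default: ~/.kube) and make it readable and writable by the owner only.
check_kubeconfig() {
    kube_dir="${1:-$HOME/.kube}"
    if [ ! -f "$kube_dir/config" ]; then
        echo "missing: $kube_dir/config" >&2
        return 1
    fi
    chmod 600 "$kube_dir/config"
    echo "ok: $kube_dir/config"
}

# Usage: check_kubeconfig            # checks ~/.kube/config
#        check_kubeconfig /some/dir  # checks /some/dir/config
```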
sudo apt-get update && sudo apt-get install -y apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl
Check that it is working by running:
kubectl get pods
On macOS with Homebrew:
brew install kubernetes-cli
Otherwise, follow the installation instructions in the Kubernetes docs.
To use a kubernetes pod, you need to:
- Create a Dockerfile to describe the experimental environment
- Build a Docker image from it
- Push the docker image to ic-registry.epfl.ch/mlo/
- Create a kubernetes config file
You can build your own Dockerfile based on this basic example.
You can specify the software and Python packages you need in there.
To make sure you can access data on the network storage mlodata1 and mloraw, the main user in your Docker image must have the same user ID as you have on EPFL's system.
To achieve this, put your Gaspar ID after NB_USER= and your UID after NB_UID=.
You can get your UID by running the id command on an iccluster node.
The FROM line allows you to choose an image to start from. You can choose from images on Docker Hub (or elsewhere).
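The two values can be read off directly on an iccluster node. This is a small sketch, and it assumes that your login name on the node is your Gaspar ID:

```shell
# Run on an iccluster node. Prints the lines to copy into the Dockerfile.
# Assumption: your login name on the node is your Gaspar ID.
nb_user="$(id -un)"   # user name
nb_uid="$(id -u)"     # numeric user ID
echo "NB_USER=${nb_user}"
echo "NB_UID=${nb_uid}"
```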
Once you are happy with the Dockerfile, go to the directory of the Dockerfile and run:
docker build . -t <your-tag>
Replace <your-tag> by the name you want to give to this Docker image.
It is good practice to put your username first, for example jaggi-base.
When you start a pod on the Kubernetes cluster, you tell it to run your Docker image.
The cluster will search for your image on EPFL's internal Docker registry, Harbor.
You should upload the image you created to this registry.
Have a look at https://ic-registry.epfl.ch and use your Gaspar credentials to log in.
There already is a group project named mlo. Please ask someone in the lab already using Kubernetes to add you to the mlo group so that you can push your Docker image to that repository.
Now take the following steps:
Log in to the registry by running the following command
docker login ic-registry.epfl.ch
and enter the credentials:
Username: robot$mlo-image-publisher
Password:
eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE1ODg5MjIwNjEsImlhdCI6MTU4NjMzMDA2MSwiaXNzIjoiaGFyYm9yLXRva2VuLWlzc3VlciIsImlkIjozLCJwaWQiOjYsImFjY2VzcyI6W3siUmVzb3VyY2UiOiIvcHJvamVjdC82L3JlcG9zaXRvcnkiLCJBY3Rpb24iOiJwdXNoIiwiRWZmZWN0IjoiIn1dfQ.RI9AhLg94Y1piWKZ4cR-UWOFw39fX3NnuoPM93Ux6T2BR6azMYNUDGSSD8-p17dX82UkaJ4jAYfa19b6e1VudT-YM21QWjbWjWnnnpLfe0wEpL8Y9ddP8kbgfWzhZKxt_RNe8Wl7QjTxQYjp-bdKw1CY1v8NGoltf3nILLYbr8g4RiODQsB-XBCVfCxFUZcQkhy39hq9ckPbEs3jh2HuN7s3IRQGfRAbkQ5DKo1wp967Zkf1LYopF6-W8hZyWq69XzsqixX6UaF8izaZVCGkqPqw1DlgKp6Lropwnb8GT9TjV_kUfX6A6Ju3yEgBtcFEOWCKDYeRlFgPuu1DW3Sy7dnHeDOUNvSeB8ANgY-QesSasQ8LSrFjIcsG9fZ8I_NBCx0CEQCenoRMQHpUqDFZoyFbMaq8zrEO9yshEhHggLoTTd6GHDByxqmWN15dfCZrHHGKmSwW34t5q_a6fsELuAtCmy8j-FvdbB3zQiJF8dj58DKDbIya4R8GdoFq0hOUopZsetUpHAhMwnJ3TRJrJVo7IXzzjT6i5q85qoOHEwPpr0UJHK05zGSXsjoKzTMG26togEnd6GlApuzWpEF21f0eYHib-pkJY1oCcQpobiFKrSwvcYVyjUMaxFMLm1le16Lpk83CEXgstSgSPx_lB1qwMK7zauqpFhrHI-Fc7mQ
This is a one-time step.
To push an image to a private registry (and not the central Docker registry) you must tag it with the registry hostname.
Then you can push it:
docker tag <your-tag> ic-registry.epfl.ch/mlo/<your-tag>
docker push ic-registry.epfl.ch/mlo/<your-tag>
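The build, tag and push steps can be chained in a small script. This is a sketch, not the lab's official workflow: the function names image_ref and publish_image are invented here, and jaggi-base is just the example tag used above.

```shell
REGISTRY="ic-registry.epfl.ch"
PROJECT="mlo"

# Compose the full registry reference for a local tag, e.g. jaggi-base.
image_ref() {
    echo "${REGISTRY}/${PROJECT}/$1"
}

# Build the Dockerfile in the current directory, tag the image for the
# registry, and push it. Stops at the first step that fails.
publish_image() {
    tag="$1"
    docker build . -t "$tag" &&
    docker tag "$tag" "$(image_ref "$tag")" &&
    docker push "$(image_ref "$tag")"
}

image_ref jaggi-base   # prints ic-registry.epfl.ch/mlo/jaggi-base
```

After docker login, calling publish_image jaggi-base from the directory of the Dockerfile runs all three steps.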
Have a look at (and download) this simple Kubernetes config file.
Fill in all elements that are in <brackets>.
<your-pod-name> need not be the same as <your-docker-image-tag>, but again it is good practice to put your name first for the pod name, for example jaggi-pod.
In this config file,
- you can change nvidia.com/gpu: 1 to request more or fewer GPUs
- you can see at the end that mlodata1 is mounted. You can remove it or change it for mloscratch
- you specify which command is run when launching the pod. Here it will sleep for 60 seconds and then stop
- To have a container run forever, you can use:
    command: [sleep, infinity]
  and then you can connect to the pod through ssh and run your jobs from there.
  If you do this, make sure to delete the pod once you are done to free the resource!
- To run more complex or multiple commands, you can do:
    command: ["/bin/bash", "-c"]
    args: ["command1; command2 && command3"]
  For example:
    command: ["/bin/bash", "-c"]
    args: ["cd /mlodata1/jaggi/ml && python automl.py"]
  The resource will be automatically freed once the command has run. The pod gets status Completed but is not deleted.
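To make the options above concrete, the following sketch writes a minimal pod config from the shell. It is an illustration, not a copy of the lab's template: the file name, the label key user, the restartPolicy, the image tag jaggi-base and the pod name jaggi-pod are assumptions pieced together from the examples in this README, and the container and volume names follow the mlodata1 snippet further down.

```shell
POD_NAME="jaggi-pod"
IMAGE="ic-registry.epfl.ch/mlo/jaggi-base"

# Write the pod description to <pod-name>.yaml in the current directory.
cat > "${POD_NAME}.yaml" <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${POD_NAME}
  labels:
    user: jaggi
spec:
  restartPolicy: Never
  containers:
    - name: ubuntu
      image: ${IMAGE}
      command: ["/bin/bash", "-c"]
      args: ["cd /mlodata1/jaggi/ml && python automl.py"]
      resources:
        limits:
          nvidia.com/gpu: 1
      volumeMounts:
        - mountPath: /mlodata1
          name: mlodata1
  volumes:
    - name: mlodata1
      persistentVolumeClaim:
        claimName: pv-mlodata1
EOF
```

Generating the file from variables keeps the pod name and image reference in one place, which is convenient when several people adapt the same template.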
Go to the directory where your kubernetes config file is and run:
kubectl create -f <your-configfile-name>.yaml
kubectl get pods # get all pods
kubectl get pods -l user=jaggi # filter by label (defined in the config file)
kubectl get pod jaggi-pod # get by pod name
kubectl exec -it jaggi-pod -- /bin/bash
kubectl delete pod jaggi-pod
Useful for debugging
kubectl describe pod jaggi-pod
kubectl get pod jaggi-pod -o yaml
kubectl logs jaggi-pod
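When scripting around these commands, it is often handy to wait until the pod is actually running before opening a shell in it. The following is a sketch under assumptions: the function names are invented, and the retry count and polling interval are arbitrary. It relies on kubectl get pod -o jsonpath to print the pod's .status.phase (Pending, Running, Succeeded, Failed or Unknown).

```shell
# Print the current phase of a pod.
pod_phase() {
    kubectl get pod "$1" -o jsonpath='{.status.phase}'
}

# Poll until the pod reaches the Running phase; give up after N tries
# (default 30), sleeping POLL_SECONDS (default 2) between tries.
wait_for_running() {
    pod="$1"
    tries="${2:-30}"
    while [ "$tries" -gt 0 ]; do
        [ "$(pod_phase "$pod")" = "Running" ] && return 0
        tries=$((tries - 1))
        sleep "${POLL_SECONDS:-2}"
    done
    echo "pod $pod did not reach Running" >&2
    return 1
}

# Usage: wait_for_running jaggi-pod && kubectl exec -it jaggi-pod -- /bin/bash
```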
Follow the instructions in Kubernetes basics, and use
volumeMounts:
  - mountPath: /scratch
    name: mlo-scratch
    subPath: YOUR_USERNAME
and
volumes:
  - name: mlo-scratch
    persistentVolumeClaim:
      claimName: mlo-scratch
For mlodata1:
spec:
  volumes:
    - name: mlodata1
      persistentVolumeClaim:
        claimName: pv-mlodata1
  containers:
    - name: ubuntu
      volumeMounts:
        - mountPath: /mlodata1
          name: mlodata1
Here you can find some Kubernetes templates, some personalized Dockerfiles, and some more documentation from Tao's GitHub.
- By default, a Docker container will run as root. This means that the files you write in the shared storage are owned by root. This is solved by changing the default user in Docker, which is already done in the simple Dockerfile (here is another example from Tao).
- To avoid the error "sudo: no tty present and no askpass program specified", please use sudo -S xxx.
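A quick way to catch the root-ownership problem early is to check the user ID inside the container before writing to shared storage. This is a minimal sketch; the function name ensure_not_root is hypothetical.

```shell
# Hypothetical helper: refuse to continue when the given UID (default: the
# current user's) is 0, i.e. when files would be written as root.
ensure_not_root() {
    uid="${1:-$(id -u)}"
    if [ "$uid" -eq 0 ]; then
        echo "running as root (UID 0): files on shared storage would be owned by root" >&2
        return 1
    fi
    echo "running as UID $uid"
}

# Usage inside the pod: ensure_not_root && python automl.py
```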