kubernetes-setup

MLO group setup for the Kubernetes cluster

Instructions for using the container cluster (Kubernetes, k8s)

You can refer to this repository for more information. Below are the steps for the most common setups.


Requesting access

Use this form to request access (use Accréditation=MLO).

Once you have been approved, you will get an email from the IC with a zip file named <your-name>.zip. Unzip it and put these files into the .kube folder of your home directory. Then rename <your-name>.config to config.

cd ~
mkdir -p .kube
# Save your-name.zip in the .kube folder, then extract it there
unzip .kube/your-name.zip -d .kube
mv .kube/*.config .kube/config
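A slightly more defensive variant of the rename step refuses to guess when ~/.kube contains zero or several *.config files, and makes the result readable only by you. This is a sketch assuming the zip has already been extracted into ~/.kube; install_kubeconfig is a hypothetical helper name, not something shipped with this repository.

```shell
# install_kubeconfig [DIR]
# Hypothetical helper: renames the single "<your-name>.config" file in DIR
# (default: ~/.kube) to "config" and makes it readable only by you.
install_kubeconfig() {
  kube_dir="${1:-$HOME/.kube}"
  mkdir -p "$kube_dir" || return 1

  # Count the *.config files; a file named plain "config" does not match.
  count=0
  found=""
  for f in "$kube_dir"/*.config; do
    [ -e "$f" ] || continue   # the glob matched nothing
    count=$((count + 1))
    found="$f"
  done

  if [ "$count" -ne 1 ]; then
    echo "expected exactly one *.config file in $kube_dir, found $count" >&2
    return 1
  fi

  mv "$found" "$kube_dir/config" && chmod 600 "$kube_dir/config"
}
```

After running install_kubeconfig, kubectl config current-context should print the context defined in the file you received.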

Installing the Kubernetes client on your personal machine

Ubuntu/Debian

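sudo apt-get update && sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
# apt.kubernetes.io is frozen; use pkgs.k8s.io. kubectl should be within one minor version of the cluster (v1.31 is only an example).
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl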

Check that it is working by running:

kubectl get pods

macOS

brew install kubernetes-cli

Others

Follow the instructions in the Kubernetes documentation.

Setting up Kubernetes

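To use a Kubernetes pod, you need to:

  • create a Dockerfile
  • build a Docker image from it
  • push the image to the IC registry
  • write a Kubernetes pod config file that uses this image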

Creating a Dockerfile

You can build your own Dockerfile based on this basic example.
You can specify the software and Python packages you need in there.

To access data on the network storage mlodata1 and mloraw, the main user in your Docker image must have the same user ID (UID) as your account on EPFL's systems.
To achieve this, put your Gaspar username after NB_USER= and your UID after NB_UID= in the Dockerfile.
You can get your UID by running the id command on an iccluster node.
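If your Dockerfile follows the basic example and contains lines such as ARG NB_USER=... and ARG NB_UID=... (an assumption: check your copy, it may use ENV instead, which the sketch below also handles), both values can be filled in with sed. Run it on an iccluster node so that id reports your EPFL account, or pass your Gaspar username and UID explicitly. fill_nb_ids is a hypothetical helper name.

```shell
# fill_nb_ids DOCKERFILE [USER] [UID]
# Hypothetical helper: rewrites the value after "NB_USER=" and "NB_UID="
# in DOCKERFILE. USER and UID default to the output of `id`, so either run it
# on an iccluster node or pass your Gaspar username and UID explicitly.
fill_nb_ids() {
  dockerfile="$1"
  nb_user="${2:-$(id -un)}"
  nb_uid="${3:-$(id -u)}"

  # Replace only the value (up to the next whitespace) after each "=".
  sed -e "s/NB_USER=[^[:space:]]*/NB_USER=$nb_user/" \
      -e "s/NB_UID=[^[:space:]]*/NB_UID=$nb_uid/" \
      "$dockerfile" > "$dockerfile.tmp" &&
    mv "$dockerfile.tmp" "$dockerfile"
}
```

For example, fill_nb_ids Dockerfile jaggi 123456, where 123456 stands for the number printed by id -u on an iccluster node.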

The FROM line lets you choose a base image to start from. You can pick any image from Docker Hub (or another registry).

Building a Docker image

Once you are happy with the Dockerfile, go to the directory of the Dockerfile and run:

docker build . -t <your-tag>

Replace <your-tag> with the name you want to give to this Docker image.
It is good practice to put your username first, for example jaggi-base.
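Before pushing, you can check that the default user of the image really has your UID, since that is what determines access to mlodata1 and mloraw. A minimal sketch, assuming the image contains the standard id command; check_image_uid is a hypothetical helper name.

```shell
# check_image_uid IMAGE EXPECTED_UID
# Hypothetical helper: starts a throwaway container from IMAGE and compares
# the UID of its default user with EXPECTED_UID.
check_image_uid() {
  image="$1"
  expected="$2"
  actual=$(docker run --rm "$image" id -u) || return 1
  if [ "$actual" != "$expected" ]; then
    echo "image $image runs as UID $actual, expected $expected" >&2
    return 1
  fi
  echo "image $image runs as UID $actual"
}
```

For example, check_image_uid jaggi-base 123456, using the UID you obtained on an iccluster node.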

Pushing the Docker image

When you start a pod on the Kubernetes cluster, you tell it which Docker image to run.
The cluster looks for this image in EPFL's internal Docker registry, Harbor.
You therefore need to upload the image you created to this registry.

Go to https://ic-registry.epfl.ch and log in with your Gaspar credentials.
There is already a project named mlo. Ask someone in the lab who already uses Kubernetes to add you to the mlo project so that you can push your Docker images to it.

Now take the following steps:

Log in to Harbor on your personal machine

Log in to the registry by running the following command:

docker login ic-registry.epfl.ch

and enter the following credentials:

Username: robot$mlo-image-publisher
Password:

eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE1ODg5MjIwNjEsImlhdCI6MTU4NjMzMDA2MSwiaXNzIjoiaGFyYm9yLXRva2VuLWlzc3VlciIsImlkIjozLCJwaWQiOjYsImFjY2VzcyI6W3siUmVzb3VyY2UiOiIvcHJvamVjdC82L3JlcG9zaXRvcnkiLCJBY3Rpb24iOiJwdXNoIiwiRWZmZWN0IjoiIn1dfQ.RI9AhLg94Y1piWKZ4cR-UWOFw39fX3NnuoPM93Ux6T2BR6azMYNUDGSSD8-p17dX82UkaJ4jAYfa19b6e1VudT-YM21QWjbWjWnnnpLfe0wEpL8Y9ddP8kbgfWzhZKxt_RNe8Wl7QjTxQYjp-bdKw1CY1v8NGoltf3nILLYbr8g4RiODQsB-XBCVfCxFUZcQkhy39hq9ckPbEs3jh2HuN7s3IRQGfRAbkQ5DKo1wp967Zkf1LYopF6-W8hZyWq69XzsqixX6UaF8izaZVCGkqPqw1DlgKp6Lropwnb8GT9TjV_kUfX6A6Ju3yEgBtcFEOWCKDYeRlFgPuu1DW3Sy7dnHeDOUNvSeB8ANgY-QesSasQ8LSrFjIcsG9fZ8I_NBCx0CEQCenoRMQHpUqDFZoyFbMaq8zrEO9yshEhHggLoTTd6GHDByxqmWN15dfCZrHHGKmSwW34t5q_a6fsELuAtCmy8j-FvdbB3zQiJF8dj58DKDbIya4R8GdoFq0hOUopZsetUpHAhMwnJ3TRJrJVo7IXzzjT6i5q85qoOHEwPpr0UJHK05zGSXsjoKzTMG26togEnd6GlApuzWpEF21f0eYHib-pkJY1oCcQpobiFKrSwvcYVyjUMaxFMLm1le16Lpk83CEXgstSgSPx_lB1qwMK7zauqpFhrHI-Fc7mQ

You only need to do this once: Docker stores the login on your machine.
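Because the username contains a $, it must be single-quoted if you type it on the command line, otherwise the shell expands $mlo. The sketch below logs in non-interactively with docker login's --username and --password-stdin options, reading the password from a file instead of the command line. harbor_login and the token file are hypothetical.

```shell
# harbor_login TOKEN_FILE
# Hypothetical helper: logs in to the IC registry as the shared robot account.
# Single quotes stop the shell from expanding "$mlo" in the username, and
# --password-stdin keeps the token out of your shell history and `ps` output.
harbor_login() {
  token_file="$1"
  docker login --username 'robot$mlo-image-publisher' --password-stdin \
    ic-registry.epfl.ch < "$token_file"
}
```

For example, harbor_login ~/.harbor-token, where ~/.harbor-token is a file you created containing only the password.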

Actually pushing the Docker image

To push an image to a private registry (rather than Docker Hub), you must tag it with the registry hostname and project, here ic-registry.epfl.ch/mlo.
Then you can push it:

docker tag <your-tag> ic-registry.epfl.ch/mlo/<your-tag>
docker push ic-registry.epfl.ch/mlo/<your-tag>
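The two commands can be wrapped in a function that also adds an explicit version tag. When the tag is not latest, Kubernetes defaults to imagePullPolicy: IfNotPresent, so a node that already has, say, jaggi-base:v1 will not fetch a re-pushed v1; bumping the version on every push avoids that. A sketch; push_image and the v2 version scheme are hypothetical.

```shell
# push_image LOCAL_TAG VERSION
# Hypothetical helper: tags the local image for the mlo project on the IC
# registry and pushes it, e.g. jaggi-base -> ic-registry.epfl.ch/mlo/jaggi-base:v2
push_image() {
  local_tag="$1"
  version="$2"
  remote="ic-registry.epfl.ch/mlo/${local_tag}:${version}"
  docker tag "$local_tag" "$remote" &&
    docker push "$remote" &&
    echo "pushed $remote"
}
```

After push_image jaggi-base v2, the image line of the pod config would be ic-registry.epfl.ch/mlo/jaggi-base:v2.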

Creating a Kubernetes pod config file

Have a look at (and download) this simple Kubernetes config file. Fill in all elements that are in <brackets>.
<your-pod-name> does not need to be the same as <your-docker-image-tag>, but again it is good practice to put your name first in the pod name, for example jaggi-pod.

In this config file,

  • you can change nvidia.com/gpu: 1 to request more or fewer GPUs
  • you can see at the end that mlodata1 is mounted. You can remove it or replace it with mloscratch
  • you specify which command is run when launching the pod. Here it will sleep for 60 seconds and then stop
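The simple config file itself is not reproduced here. As a rough sketch of what such a file contains, the function below writes a pod spec with a user label, a GPU limit, a 60-second sleep and the mlodata1 volume (using the pv-mlodata1 claim from the storage section below). The function name, the container name and restartPolicy: Never are assumptions; compare the result with the lab's template before relying on it.

```shell
# write_pod_yaml POD_NAME USER IMAGE GPUS OUTPUT_FILE
# Hypothetical helper: writes a minimal pod spec to OUTPUT_FILE.
# Field values are assumptions; compare with the lab's template before use.
write_pod_yaml() {
  pod_name="$1"
  user="$2"
  image="$3"
  gpus="$4"
  out="$5"
  cat > "$out" <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${pod_name}
  labels:
    user: ${user}
spec:
  restartPolicy: Never        # a finished pod stays Completed instead of restarting
  containers:
  - name: ${pod_name}
    image: ${image}
    command: [sleep, "60"]
    resources:
      limits:
        nvidia.com/gpu: ${gpus}
    volumeMounts:
    - mountPath: /mlodata1
      name: mlodata1
  volumes:
  - name: mlodata1
    persistentVolumeClaim:
      claimName: pv-mlodata1
EOF
}
```

For example, write_pod_yaml jaggi-pod jaggi ic-registry.epfl.ch/mlo/jaggi-base:v2 1 pod.yaml. Running kubectl apply --dry-run=client -f pod.yaml then checks that the file parses without creating the pod.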

Commands

  • To have a container run forever, you can use:

    command: [sleep, infinity]

    and then you can connect to the pod with kubectl exec (see "SSH to a pod") and run your jobs from there.

    If you do this, make sure to delete the pod once you are done to free the resources!

  • To run more complex or multiple commands, you can do:

     command: ["/bin/bash", "-c"]
     args: ["command1; command2 && command3"]

    For example:

     command: ["/bin/bash", "-c"]
     args: ["cd /mlodata1/jaggi/ml && python automl.py"]

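    The resources are freed automatically once the command has finished. The pod gets status Completed but is not deleted. This requires restartPolicy: Never or OnFailure in the pod spec; with the Kubernetes default (Always), the container is restarted when the command exits.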

Using Kubernetes

Creating a pod

Go to the directory where your kubernetes config file is and run:

kubectl create -f <your-configfile-name>.yaml

Checking pods status

kubectl get pods  # get all pods
kubectl get pods -l user=jaggi  # filter by label (defined in the config file)
kubectl get pod jaggi-pod  # get by pod name

SSH to a pod

kubectl exec -it jaggi-pod -- /bin/bash
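Right after kubectl create, the pod may still be pulling its image, and exec fails until the container is running. A sketch that first waits for the pod to become Ready with kubectl wait; pod_shell and the 5-minute default timeout are arbitrary choices.

```shell
# pod_shell POD_NAME [TIMEOUT]
# Hypothetical helper: waits until the pod reports Ready, then opens bash in it.
pod_shell() {
  pod="$1"
  timeout="${2:-300s}"
  kubectl wait --for=condition=Ready "pod/$pod" --timeout="$timeout" &&
    kubectl exec -it "$pod" -- /bin/bash
}
```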

Deleting a pod

kubectl delete pod jaggi-pod
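Pods that ran a one-off command stay around after they finish. A pod whose containers all exited successfully has phase Succeeded, which kubectl get pods shows as Completed, so all of your finished pods can be removed at once by combining the user label from your config file with a field selector. A sketch; delete_finished_pods is a hypothetical helper, and it assumes your pods carry a user label as in the filtering example above.

```shell
# delete_finished_pods USER
# Hypothetical helper: deletes this user's pods whose phase is Succeeded
# (shown as "Completed" by kubectl get pods). Running pods are left alone.
delete_finished_pods() {
  kubectl delete pod -l "user=$1" --field-selector=status.phase==Succeeded
}
```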

Getting information on a pod

Useful for debugging

kubectl describe pod jaggi-pod
kubectl get pod jaggi-pod -o yaml
kubectl logs jaggi-pod
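When asking someone for help, it is handy to have these three outputs in one place. A sketch that saves them into a folder; save_pod_debug and the debug-<pod-name> folder name are hypothetical.

```shell
# save_pod_debug POD_NAME [OUT_DIR]
# Hypothetical helper: stores describe, YAML and logs of a pod in OUT_DIR
# (default: debug-POD_NAME). Errors are captured in the files as well.
save_pod_debug() {
  pod="$1"
  out="${2:-debug-$1}"
  mkdir -p "$out" || return 1
  kubectl describe pod "$pod" > "$out/describe.txt" 2>&1
  kubectl get pod "$pod" -o yaml > "$out/pod.yaml" 2>&1
  kubectl logs "$pod" > "$out/logs.txt" 2>&1
  echo "saved debug info for $pod in $out"
}
```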

Note on Storage across icclusters

(mounting /mlo-container-scratch)

Follow the instructions in Kubernetes basics, and use

volumeMounts:
- mountPath: /scratch
  name: mlo-scratch
  subPath: YOUR_USERNAME

and

volumes:
- name: mlo-scratch
  persistentVolumeClaim:
    claimName: mlo-scratch

(mounting /mlodata1)

spec:
  volumes:
  - name: mlodata1
    persistentVolumeClaim:
      claimName: pv-mlodata1
  containers:
  - name:  ubuntu
    volumeMounts:
    - mountPath: /mlodata1
      name: mlodata1

Other Resources and deployment templates

Here you can find some Kubernetes templates:

And some personalized Dockerfiles:

And some more documentation on Tao's GitHub.

Some Tips (deprecated)

  • By default, a Docker container runs as root, so the files you write to the shared storage are owned by root. This is solved by changing the default user in the Docker image, which is already done in the simple Dockerfile (here is another example from Tao).
  • To avoid the error sudo: no tty present and no askpass program specified, use sudo -S <command>, which reads the password from standard input.