Instructions for using the container cluster (Kubernetes, k8s)
Use this form to request access (use Accréditation=MLO).
Please refer to this repository for your basic setup.
There are two approaches to running pods on the container cluster:
- Like in the Kubernetes basics, with command: [sleep, infinity], and then connecting to the pod over ssh to run an experiment.
  - This can be convenient for playing around. You can temporarily spin up as many nodes as you want.
  - But you pay for GPU time you don't use.
- Use something like command: [run, my, experiment].
  - This makes debugging slightly harder, but as soon as your job finishes, the pod gets status Completed, and you (Martin) will stop paying for the pod.
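The second approach can be sketched as a pod manifest like the one below. This is a minimal sketch, not the lab's official template: the pod name, the container name, and the `python train.py` command are hypothetical placeholders, and the image is the example tag used elsewhere in this document. `restartPolicy: Never` is set because the Kubernetes default for a pod (`Always`) would restart the container after it exits instead of leaving the pod in `Completed`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-experiment          # hypothetical pod name
spec:
  restartPolicy: Never         # do not restart the container once the command exits
  containers:
    - name: experiment         # hypothetical container name
      image: ic-registry.epfl.ch/mlo/ml:1.0   # example image tag; replace with your own
      # Approach 2: run the experiment directly; the pod ends when it finishes.
      command: ["python", "train.py"]          # hypothetical experiment command
      # Approach 1 would instead keep the pod alive for interactive use:
      # command: ["sleep", "infinity"]
```

With the first approach, remember to delete the pod yourself when you are done, since a sleeping pod never reaches `Completed` on its own.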
Follow the instructions in Kubernetes basics, and use

    volumeMounts:
      - mountPath: /scratch
        name: mlo-scratch
        subPath: YOUR_USERNAME

and

    volumes:
      - name: mlo-scratch
        persistentVolumeClaim:
          claimName: mlo-scratch

To mount mlodata1 at /mlodata1, use:

    spec:
      volumes:
        - name: mlodata1
          persistentVolumeClaim:
            claimName: pv-mlodata1
      containers:
        - name: ubuntu
          volumeMounts:
            - mountPath: /mlodata1
              name: mlodata1

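Putting the fragments above together, a pod that mounts both the scratch space and mlodata1 might look like the following sketch. The claim names, mount paths, and the container name `ubuntu` come from the snippets above; the pod name and the image are assumptions for illustration, and `YOUR_USERNAME` must be replaced with your own username.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: storage-example            # hypothetical pod name
spec:
  containers:
    - name: ubuntu
      image: ic-registry.epfl.ch/mlo/ml:1.0   # assumed image; use your own
      command: ["sleep", "infinity"]
      volumeMounts:
        - mountPath: /scratch
          name: mlo-scratch
          subPath: YOUR_USERNAME   # each user gets a subdirectory of the shared claim
        - mountPath: /mlodata1
          name: mlodata1
  volumes:
    - name: mlo-scratch
      persistentVolumeClaim:
        claimName: mlo-scratch
    - name: mlodata1
      persistentVolumeClaim:
        claimName: pv-mlodata1
```

Each entry under `volumeMounts` refers by `name` to an entry under `volumes`, so the two lists must stay in sync.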
Go to https://ic-registry.epfl.ch and log in with your Gaspar account.
There is already a group project named mlo. Please ask the owner of the group project to give you the corresponding permission so that you can push your Docker image to that repository.
Once you have the image and the permission, you can push the image to the registry, e.g.,

    docker push ic-registry.epfl.ch/mlo/ml:1.0

You can find some provided templates, e.g.,
- By default, a Docker container will run as root. This means that the files you write in the shared storage are owned by root. You can solve this by changing the default user in Docker (example from Tao).
- To avoid the error "sudo: no tty present and no askpass program specified", please use sudo -S xxx.
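As an alternative to changing the default user inside the Docker image, Kubernetes can also be asked to run the container as a non-root user through the pod's `securityContext`. This is a different technique from the Docker-side example referenced above, shown only as a hedged sketch: the UID and GID values are hypothetical and would need to match your own account on the shared storage, and whether the cluster's policies allow this is an assumption.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: non-root-example       # hypothetical pod name
spec:
  securityContext:
    runAsUser: 12345           # hypothetical UID; use your own numeric user ID
    runAsGroup: 12345          # hypothetical GID; use your own numeric group ID
  containers:
    - name: ubuntu
      image: ic-registry.epfl.ch/mlo/ml:1.0   # example image; replace with your own
      command: ["sleep", "infinity"]
```

Files written to the shared storage from this pod would then be owned by the given UID and GID rather than by root.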