CML and Kubernetes
rickvanveen opened this issue · 5 comments
Hi! I'm following the documentation here about how to set up a self-hosted runner. Instead of AWS, we use a self-hosted Kubernetes cluster.
I got the `launch-runner` job working (although I needed to add setup steps for Node, kubectl (+ plugin), and Terraform to get it to work). My understanding so far is that this job creates, in Kubernetes terminology, a pod running a container that registers itself as a self-hosted GitHub/CML runner.
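For reference, the extra setup steps mentioned above could look roughly like this (a hypothetical sketch, not from the docs; the action versions, the token secret name, and the exact `cml runner` flags are assumptions):

```yaml
launch-runner:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v3
    - uses: actions/setup-node@v3        # Node.js, required by CML
    - uses: azure/setup-kubectl@v3       # kubectl for talking to the cluster
    - uses: hashicorp/setup-terraform@v2 # Terraform, used by the runner provisioning
    - uses: iterative/setup-cml@v1
    - name: Deploy runner on Kubernetes
      env:
        REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
      run: |
        cml runner \
          --cloud=kubernetes \
          --labels=cml-runner
```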
The second job, `train-and-report`, in the documentation uses a container image again. However, when I try to run my simple training job using that same option:

```yaml
container:
  image: docker://iterativeai/cml:0-dvc2-base1-gpu
  options: --gpus all
```

I get an error: `Error: docker: command not found`.
My confusion now is: how do I set up CML/GitHub/Kubernetes in the intended and appropriate way? I can imagine I need to customize the container that registers itself as the GitHub/CML runner to include Docker? Am I missing something in the documentation?
Any explanation and help is appreciated!
Hello @rickvanveen!
As per iterative/terraform-provider-iterative#146 (comment), using `container` in your workflow jobs isn't supported when the runner itself is containerized (e.g. on Kubernetes).
Try installing CML with a `uses: iterative/setup-cml@v1` step instead of using the `iterativeai/cml:0-dvc2-base1-gpu` container image.
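A minimal sketch of what that suggestion might look like in the workflow (the job name and labels follow the question above; the training command and the secret used are assumptions, and `cml send-comment` is the reporting command from CML v0.x):

```yaml
train-and-report:
  needs: launch-runner
  runs-on: [self-hosted, cml-runner]
  # note: no `container:` key; steps run directly on the runner
  steps:
    - uses: actions/checkout@v3
    - uses: iterative/setup-cml@v1   # installs CML instead of pulling the container image
    - name: Train and report
      env:
        REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        python train.py
        cml send-comment report.md
```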
Okay, I think I get it. Does this also mean that when I go from my simple models to more complex ones, I will need to set up all the requirements to train those models through GitHub Actions, and cannot use the provided containers?
Currently there is no way of specifying a custom container image, sorry. Still, partially due to this related use case, it may be implemented relatively soon.
Great! I will look into it as soon as possible. I think it will help.