CML and Kubernetes
rickvanveen opened this issue · 5 comments
Hi! I'm following the documentation here about how to set up a self-hosted runner. Instead of AWS, we use a self-hosted Kubernetes cluster.
I got the `launch-runner` job working (although I needed to add setup steps for Node, kubectl (+ plugin), and Terraform to get it to work). My understanding so far is that this job creates, in Kubernetes terminology, a pod running a container that registers itself as a self-hosted GitHub/CML runner.
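For reference, the extra setup steps mentioned above could look roughly like this (a hypothetical sketch, not from the docs; the action versions, the token secret name, and the exact `cml runner` flags are assumptions):

```yaml
launch-runner:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v3
    - uses: actions/setup-node@v3        # Node.js, required by CML
    - uses: azure/setup-kubectl@v3       # kubectl for talking to the cluster
    - uses: hashicorp/setup-terraform@v2 # Terraform, used by the runner provisioning
    - uses: iterative/setup-cml@v1
    - name: Deploy runner on Kubernetes
      env:
        REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
      run: |
        cml runner \
          --cloud=kubernetes \
          --labels=cml-runner
```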
The second job, `train-and-report`, in the documentation uses a container image again. However, when I try to run my simple training job using that same option:

```yaml
container:
  image: docker://iterativeai/cml:0-dvc2-base1-gpu
  options: --gpus all
```

I get an error: `Error: docker: command not found`.
My confusion now is: how do I set up CML/GitHub/Kubernetes in the intended and appropriate way? I can imagine I need to customize the container that registers itself as the GitHub/CML runner to include Docker? Am I missing something in the documentation?
Any explanation and help is appreciated!
Hello @rickvanveen!
As per iterative/terraform-provider-iterative#146 (comment), using `container` in your workflow jobs isn't supported when the runner itself is containerized (e.g. on Kubernetes).
Try installing CML with a `uses: iterative/setup-cml@v1` step instead of using the `iterativeai/cml:0-dvc2-base1-gpu` container image.
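A minimal sketch of what that suggestion might look like in the workflow (the job name and labels follow the question above; the training command and the secret used are assumptions, and `cml send-comment` is the reporting command from CML v0.x):

```yaml
train-and-report:
  needs: launch-runner
  runs-on: [self-hosted, cml-runner]
  # note: no `container:` key; steps run directly on the runner
  steps:
    - uses: actions/checkout@v3
    - uses: iterative/setup-cml@v1   # installs CML instead of pulling the container image
    - name: Train and report
      env:
        REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        python train.py
        cml send-comment report.md
```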
Okay, I think I get it. Does this also mean that when I go from my simple models to more complex ones, I will need to set up all the requirements to train those models through GitHub Actions, and cannot use the provided containers?
Currently there is no way of specifying a custom container image, sorry. Still, partially due to this related use case, it may be implemented relatively soon.
Great! I will look into it as soon as possible. I think it will help.