This repository contains a Kluctl based deployment project for Kubeflow.
This project started as a PoC to demonstrate the capabilities of Kluctl as a deployment tool for complex Kubernetes deployment projects. Kubeflow turns out to be a much more complex deployment than most Kubernetes projects, requiring many complicated and partially manual steps to install and maintain a Kubeflow instance.
I believe that Kluctl can simplify this process a lot, so I started building this PoC. My motivation is to draw attention to the Kluctl project and, in the best case, find users whose needs it fulfills, ultimately bringing adoption and maintainers to the project.
At the same time, I believe Kubeflow could potentially benefit from this effort. This project might even turn out to be a viable distribution of Kubeflow as it makes installing and long-term maintaining it so much easier.
I have read through a lot of issues in the manifests repo and also looked into other distributions (platform/cloud-specific ones and more generic solutions like deployKF). What I found so far confirms my assumptions.
To make it short: `kluctl deploy -a config=./my-config.yaml` is enough to install a fully functional Kubeflow instance on an existing cluster. With the same command, you'll do upgrades and cleanups, re-deploy with new configuration, and so on. There is no need for ArgoCD, FluxCD, or any other additional tooling, because the CLI is able to perform all necessary configuration management and orchestration.
If you're already familiar with the manifests repo, you can think of Kluctl as a replacement for the top-level `kustomization.yaml` and the top-level `kustomize build | kubectl apply -f -` invocation.
The difference is that Kluctl allows much better control over the deployment process. For example, the `deployment.yaml` files found at every folder level (starting at the root) let you control deployment order by introducing barriers between individual deployment items. The deployment items themselves are either includes of sub-deployments, simple Kustomize deployments, or Helm Charts.
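To make this concrete, here is a minimal sketch of what such a `deployment.yaml` could look like. The paths and item names are illustrative, not taken from this repo:

```yaml
deployments:
  - path: namespaces        # a plain Kustomize deployment
  - barrier: true           # wait until everything above is applied before continuing
  - include: istio          # a sub-deployment with its own deployment.yaml
  - path: example-app       # deployed only after the barrier has been passed
```

The barrier guarantees that, for instance, CRDs or namespaces exist before anything that depends on them is applied.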
Also, whole sub-deployments and individual items can be disabled conditionally, as seen for example in `common/cert-manager/deployment.yaml`.
Ultimately, these features make it possible to remove most (if not all) manual interventions required while installing or upgrading Kubeflow. For example, there is no need to choose different sets of Kustomize overlays based on which type of auth setup you want to install. Instead, this can be set via configuration files, which automatically leads to all required modifications of the deployment process.
This also makes it easy to implement features like "bring your own cert-manager or istio", simply by changing the configuration and using appropriate conditionals and templating in the deployment project.
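Such a conditional item could look roughly like this (the `deploy_cert_manager` variable name is made up for illustration; check the actual `deployment.yaml` files for the real conditions):

```yaml
deployments:
  - path: cert-manager
    # only deployed when the configuration enables it
    when: args.deploy_cert_manager
```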
Another advantage that comes for free is the integration of Helm Charts with proper support for templated values. This can be seen, for example, in `common/dex/helm-values.yaml`.
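As an illustration of what templated Helm values look like in general — the keys and variable names below are hypothetical, not copied from this repo — a `helm-values.yaml` can reference the merged configuration via Jinja2 expressions:

```yaml
# Hypothetical example of templated Helm values
config:
  issuer: https://{{ my_config.hostname }}/dex
  enablePasswordDB: {{ my_config.dex.enable_password_db }}
```

At deploy time, Kluctl renders these expressions against the merged configuration before passing the values to the chart.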
You'll need the following things before you can start with this deployment project.
- Kluctl must be installed. Follow the installation instructions.
- You need a Kubernetes cluster. You can use a naked cluster without anything pre-installed, or a cluster with istio and cert-manager pre-installed. See the next chapter for instructions to set up a local cluster.
- You need to clone this repo to a local directory.
You can skip this if you bring your own cluster. Otherwise, install Kind and create a local cluster:
```shell
cat <<EOF | kind create cluster --name=kubeflow --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        "service-account-issuer": "kubernetes.default.svc"
        "service-account-signing-key-file": "/etc/kubernetes/pki/sa.key"
- role: worker
- role: worker
- role: worker
EOF
```
Please note that the above Kind config creates 3 worker nodes. We need these as you'll otherwise run out of CPU resources. A future version of this deployment project will support running with lower resource quotas to allow easier local testing.
You'll now have the Kubernetes context `kind-kubeflow` set up and configured as the current context. We'll later invoke Kluctl with `--context kind-kubeflow`, which you can skip if your current context is already set up properly. We'll still do this to avoid accidentally messing up another cluster. Future versions of this deployment project will properly support/describe using targets with bound contexts.
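Binding a context to a target happens in the project's `.kluctl.yaml`; a minimal sketch (the target name is illustrative) could look like this:

```yaml
targets:
  - name: local
    context: kind-kubeflow
```

With such a target, invoking `kluctl deploy -t local ...` would use the bound context automatically, so the `--context` flag is no longer needed.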
Now it's time to create your own configuration by copying `sample-config.yaml` to `my-config.yaml` and performing the desired modifications. Check the contents of `sample-config.yaml` and `config/*-defaults.yaml` for all available configuration options.
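Purely as an illustration of the general shape of such a config file — the actual keys live in `sample-config.yaml` and `config/*-defaults.yaml`, and the ones below are hypothetical — a `my-config.yaml` is plain YAML that overrides defaults:

```yaml
# Hypothetical excerpt - consult sample-config.yaml for the real keys
auth:
  auth-service:
    enabled: true
  oauth2-proxy:
    enabled: false
```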
If you look into `deployment.yaml`, you'll see multiple vars sources being loaded. These are merged together and later used in the deployment sources themselves to perform templating. There is also an optional vars source (marked via `when: args.config`) that allows loading additional configuration from an externally provided configuration file. That file is passed via `-a config=my-config.yaml` later.
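The vars sources described above follow this general shape in Kluctl (the paths here are illustrative; see the actual `deployment.yaml` for the real ones):

```yaml
vars:
  - file: ./config/defaults.yaml     # illustrative path
  - file: "{{ args.config }}"        # the externally provided config file
    when: args.config                # only loaded when -a config=... is passed
```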
To actually deploy Kubeflow, run:
```shell
kluctl deploy --context=kind-kubeflow -a config=my-config.yaml
```
This will perform a dry-run first and show a diff before actually deploying. The diff must be approved by pressing `y`. After that, Kluctl shows what actually happened.
This command is basically all you'll need to do to re-deploy with new configuration or later update to newer versions. The dry-run based diff will give you some confidence in what you're doing as you'll always know what's going to happen before it actually happens.
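If you want to skip the interactive confirmation, for example in a script, Kluctl offers a `--yes` flag:

```shell
# Skips the interactive diff approval - use with care,
# since you won't review the diff before it is applied.
kluctl deploy --context=kind-kubeflow -a config=my-config.yaml --yes
```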
Try it out and modify the configuration. A good test is to change the authentication mode from auth-service to oauth2-proxy by switching the `enabled` flags appropriately. After this, re-deploy:
```shell
kluctl deploy --context=kind-kubeflow -a config=my-config.yaml --prune
```
Please note the `--prune` flag that we added. It instructs Kluctl to not just detect orphaned resources, but to actually prune them. If omitted, you can also run `kluctl prune --context=kind-kubeflow -a config=my-config.yaml` afterwards to prune/clean up.
- More configuration
- Integration of manifests repo
- Keeping components up-to-date
- SOPS
- GitOps