
A hub-and-spoke example for multi-cluster continuous delivery with Flux

Primary language: Shell. License: Apache License 2.0 (Apache-2.0).

flux2-hub-spoke-example


This repository showcases how to run Flux on a central Kubernetes cluster and have it manage the GitOps continuous delivery of apps and infrastructure workloads on multiple clusters.

Diagram: Flux Hub and Spoke

Prerequisites

For this example, you need to install the following tools: the Flux CLI, Kubernetes KinD, kubectl and kustomize.

On macOS and Linux, you can install the tools using the following commands:

brew install fluxcd/tap/flux
brew install kind kubectl kustomize
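
Before continuing, you can confirm that the tools are on your PATH. The check_tools function below is a hypothetical helper, not part of this repository; it only relies on the POSIX command -v builtin and returns the number of tools it could not find.

```shell
#!/bin/sh
# check_tools: hypothetical helper (not part of this repository) that reports
# every tool passed as an argument that cannot be found on the PATH.
# The return value is the number of missing tools, so 0 means all are installed.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example: check_tools flux kind kubectl kustomize || echo "install the missing tools first"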

To follow this guide, you'll need a GitHub account and a personal access token that can create repositories (check all permissions under repo).

Repository structure

The Git repository contains the following top directories:

  • deploy dir contains the HelmRelease definitions for the apps and infrastructure workloads
  • clusters dir contains the apps and infrastructure Kustomize overlays for each target cluster
  • hub dir contains the Flux configuration for the central cluster and targets
├── deploy
│   ├── apps
│   │   ├── podinfo.yaml
│   │   └── kustomization.yaml
│   ├── infra-configs
│   │   ├── cluster-issuers.yaml
│   │   └── kustomization.yaml
│   ├── infra-controllers
│   │   ├── cert-manager.yaml
│   │   ├── ingress-nginx.yaml
│   │   └── kustomization.yaml
│   └── tenants
├── clusters
│   ├── production
│   └── staging
│       ├── apps
│       │   ├── kustomization.yaml
│       │   └── podinfo-values.yaml
│       ├── infra-configs
│       ├── infra-controllers
│       └── tenants
└── hub
    ├── flux-system
    ├── production.yaml
    └── staging.yaml

Bootstrap the cluster fleet

To bootstrap the cluster fleet, first you need to create several Kubernetes KinD clusters by running the following command:

make fleet-up

The above command will create the following clusters:

  • flux-hub - the central cluster where Flux will run
  • flux-staging - the target cluster where Flux will deploy the clusters/staging workloads
  • flux-production - the target cluster where Flux will deploy the clusters/production workloads
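
The cluster-creation part of make fleet-up can be approximated by creating the three KinD clusters by name. This is a minimal sketch that assumes only the cluster names listed above; the actual Makefile target may pass extra options (for example a KinD config file) and also takes care of the kubeconfig step described next. The create_fleet name is hypothetical.

```shell
#!/bin/sh
# create_fleet: hypothetical sketch of the cluster-creation step of make fleet-up.
# Assumption: each cluster only needs a name; the real target may use extra flags.
# KinD names the kubectl context "kind-<cluster-name>", e.g. kind-flux-hub.
create_fleet() {
  for name in flux-hub flux-staging flux-production; do
    # Skip clusters that already exist so the function can be re-run safely.
    if kind get clusters 2>/dev/null | grep -qx "$name"; then
      echo "cluster $name already exists"
      continue
    fi
    kind create cluster --name "$name" || return 1
  done
}
```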

After the clusters are created, kubeconfig files for staging and production are generated and persisted in the flux-hub cluster, so that Flux can access the target clusters.

$ kubectl get secrets -A
NAMESPACE     NAME                     TYPE
production    cluster-kubeconfig       Opaque
staging       cluster-kubeconfig       Opaque
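
If you need to recreate one of these secrets by hand, one option is to export the target cluster's kubeconfig with kind get kubeconfig --internal, which points at the API server address on the Docker network shared by the KinD nodes, and store it under the value key. The store_kubeconfig function is a hypothetical sketch that assumes the flux-<env> cluster naming above; it is not the repository's own script.

```shell
#!/bin/sh
# store_kubeconfig: hypothetical sketch that persists a target cluster's
# kubeconfig as the cluster-kubeconfig secret in the hub cluster.
# $1 = environment name (staging or production), used both for the KinD
#      cluster name (flux-$1) and for the namespace in the hub cluster.
store_kubeconfig() {
  env_name="$1"
  kubeconfig_file=$(mktemp) || return 1
  # --internal returns the API server address on the KinD Docker network.
  # Namespace and secret are rendered client-side and applied, so re-runs are safe.
  kind get kubeconfig --internal --name "flux-${env_name}" > "$kubeconfig_file" &&
    kubectl --context kind-flux-hub create namespace "$env_name" \
      --dry-run=client -o yaml | kubectl --context kind-flux-hub apply -f - &&
    kubectl --context kind-flux-hub -n "$env_name" create secret generic cluster-kubeconfig \
      --from-file=value="$kubeconfig_file" \
      --dry-run=client -o yaml | kubectl --context kind-flux-hub apply -f -
  status=$?
  rm -f "$kubeconfig_file"
  return "$status"
}
```

For example: store_kubeconfig staging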

Fork this repository on your personal GitHub account and export your GitHub access token, username and repo name:

export GITHUB_TOKEN=<your-token>
export GITHUB_USER=<your-username>
export GITHUB_REPO=<repository-name>
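
Because flux bootstrap reads these values from the environment, a small guard can fail fast when one of them is missing. The require_env function is a hypothetical helper written for POSIX sh, so it avoids the bash-only ${!var} indirection.

```shell
#!/bin/sh
# require_env: hypothetical helper that fails when any of the named
# environment variables is unset or empty.
require_env() {
  rc=0
  for var_name in "$@"; do
    # Indirect lookup that also works in POSIX sh (no bash ${!var} syntax).
    eval "var_value=\${$var_name:-}"
    if [ -z "$var_value" ]; then
      echo "error: $var_name is not set" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, run require_env GITHUB_TOKEN GITHUB_USER GITHUB_REPO before the bootstrap command.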

Then, bootstrap Flux on the hub cluster:

flux bootstrap github \
    --context=kind-flux-hub \
    --owner=${GITHUB_USER} \
    --repository=${GITHUB_REPO} \
    --branch=main \
    --personal \
    --path=hub

The bootstrap command commits the manifests for the Flux components in the hub/flux-system dir and creates a deploy key with read-only access on GitHub, so Flux can pull changes from inside the cluster.

Wait for Flux to reconcile the infrastructure and apps workloads on the target clusters with:

watch flux --context kind-flux-hub get kustomizations -A
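
In scripts or CI, where watch is not practical, you could instead block until the Flux Kustomizations on the hub report Ready. This sketch uses kubectl wait on the Ready condition and assumes the Kustomizations live in the flux-system, staging and production namespaces of the hub cluster; the wait_for_fleet name and the 10-minute timeout are arbitrary choices.

```shell
#!/bin/sh
# wait_for_fleet: hypothetical helper that blocks until every Flux
# Kustomization in the given hub namespaces has the Ready condition.
# Without arguments it uses the namespaces assumed for this example.
wait_for_fleet() {
  [ "$#" -gt 0 ] || set -- flux-system staging production
  for ns in "$@"; do
    echo "waiting for Kustomizations in namespace $ns"
    kubectl --context kind-flux-hub -n "$ns" wait kustomizations.kustomize.toolkit.fluxcd.io \
      --all --for=condition=Ready --timeout=10m || return 1
  done
}
```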

Once the Flux Kustomizations are ready, you can list the Helm releases deployed in the target clusters. For example, in the staging cluster:

$ helm --kube-context kind-flux-staging ls -A
NAME            NAMESPACE       STATUS     CHART
cert-manager    cert-manager    deployed   cert-manager-v1.14.4
ingress-nginx   ingress-nginx   deployed   ingress-nginx-4.10.0
podinfo         podinfo         deployed   podinfo-6.6.2

For each target cluster, there is a corresponding namespace in the hub cluster that contains the Flux HelmRelease objects for the apps and infrastructure workloads. For example, in the staging namespace:

$ flux --context kind-flux-hub -n staging get hr
NAME            REVISION        SUSPENDED       READY
cert-manager    v1.14.4         False           True
ingress-nginx   4.10.0          False           True
podinfo         6.6.2           False           True

Tip

Note that Flux detects changes made directly in-cluster to the objects managed by a HelmRelease and automatically corrects the drift. During an incident or for debugging purposes, you can manually suspend the reconciliation of a HelmRelease with flux suspend hr <name> -n <namespace>.
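
To make sure a suspended HelmRelease is not forgotten, a debugging session can pair suspend with flux resume hr. The with_suspended_hr function is a hypothetical sketch: it suspends the HelmRelease on the hub cluster, runs an arbitrary command, and resumes the HelmRelease even if that command fails. It does not handle interrupts such as Ctrl-C.

```shell
#!/bin/sh
# with_suspended_hr: hypothetical helper that suspends a HelmRelease on the
# hub cluster, runs a command, and then resumes the HelmRelease.
# $1 = HelmRelease name, $2 = namespace, remaining args = command to run.
with_suspended_hr() {
  hr_name="$1"; hr_ns="$2"; shift 2
  flux --context kind-flux-hub suspend hr "$hr_name" -n "$hr_ns" || return 1
  "$@"
  cmd_status=$?
  # Resume even if the debugging command failed, then report its exit code.
  flux --context kind-flux-hub resume hr "$hr_name" -n "$hr_ns" || return 1
  return "$cmd_status"
}
```

For example, with_suspended_hr podinfo staging sh opens an interactive shell while the staging podinfo release is suspended, and resumes it when you exit.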

Customize the workloads

Assuming you want to ship workloads to the production cluster with a different configuration, you can employ Kustomize patches in the clusters/production overlay and change the Flux HelmRelease values.

For example, to change the number of replicas for ingress-nginx in the production cluster, you can create a patch file in clusters/production/infra-controllers/ingress-nginx-values.yaml:

apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: ingress-nginx
spec:
  chart:
    spec:
      version: ">=4.10.0"
  values:
    controller:
      replicaCount: 2

And then apply the patch to the ingress-nginx HelmRelease in the clusters/production/infra-controllers/kustomization.yaml file with:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../../deploy/infra-controllers
patches:
  - target:
      kind: HelmRelease
      name: ingress-nginx
    path: ingress-nginx-values.yaml

Verify that the patch is correctly applied with:

kustomize build ./clusters/production/infra-controllers/
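
Instead of reading the whole output, a quick check could assert that the rendered overlay contains the patched value. The assert_rendered function is a hypothetical helper; it does a plain fixed-string match, so it assumes the value appears in the rendered YAML as replicaCount: 2.

```shell
#!/bin/sh
# assert_rendered: hypothetical helper that builds a Kustomize overlay and
# fails unless the rendered YAML contains the given fixed string.
# $1 = overlay directory, $2 = expected text, e.g. "replicaCount: 2".
assert_rendered() {
  overlay_dir="$1"; expected="$2"
  rendered=$(kustomize build "$overlay_dir") || return 1
  if printf '%s\n' "$rendered" | grep -qF -- "$expected"; then
    echo "ok: $overlay_dir renders '$expected'"
  else
    echo "error: '$expected' not found in $overlay_dir" >&2
    return 1
  fi
}
```

For example: assert_rendered ./clusters/production/infra-controllers/ "replicaCount: 2"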

After you commit the changes to the repository, Flux will automatically apply the changes.

You can trigger a manual reconciliation with:

flux --context kind-flux-hub -n production reconcile ks infra-controllers --with-source

To verify the number of pods, you can list the deployments in the production cluster:

$ kubectl --context kind-flux-production -n ingress-nginx get deploy
NAME                       READY   UP-TO-DATE
ingress-nginx-controller   2/2     2
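
To script the same check, you can read the ready replica count from the Deployment status with a JSONPath query. The expect_ready_replicas helper is hypothetical; the context, namespace, Deployment name and expected count in the usage line mirror the example above.

```shell
#!/bin/sh
# expect_ready_replicas: hypothetical helper comparing a Deployment's
# .status.readyReplicas on a target cluster with an expected count.
# $1 = kube context, $2 = namespace, $3 = deployment, $4 = expected replicas.
expect_ready_replicas() {
  ready=$(kubectl --context "$1" -n "$2" get deploy "$3" \
    -o jsonpath='{.status.readyReplicas}') || return 1
  # readyReplicas is omitted from the status when no pod is ready.
  ready=${ready:-0}
  if [ "$ready" -ne "$4" ]; then
    echo "error: $3 has $ready ready replicas, expected $4" >&2
    return 1
  fi
}
```

For example: expect_ready_replicas kind-flux-production ingress-nginx ingress-nginx-controller 2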

Important

Note that on production clusters, it is recommended to pin the Helm chart to an exact version and to use a promotion workflow to test new versions on the staging cluster before deploying to production. For more information, see the guide Promote Flux Helm Releases with GitHub Actions.
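
As a rough guard for the pinning recommendation, a script could flag chart versions under clusters/production that look like semver ranges rather than exact versions. find_unpinned_versions is a hypothetical, grep-based heuristic: it assumes each version is written on a single version: line and treats values starting with <, >, =, ~ or ^, or ending in an x or * wildcard, as ranges.

```shell
#!/bin/sh
# find_unpinned_versions: hypothetical heuristic that prints "version:" lines
# that look like semver ranges instead of exact versions.
# Returns 1 if a range is found, 0 if none, 2 on grep errors (e.g. missing dir).
find_unpinned_versions() {
  grep -rnE 'version:[[:space:]]*"?([<>=~^]|[0-9.]*[*xX])' "$1"
  case $? in
    0) return 1 ;;
    1) return 0 ;;
    *) return 2 ;;
  esac
}
```

Run as find_unpinned_versions ./clusters/production, it would report the ">=4.10.0" range used in the ingress-nginx patch above.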

Security considerations and dependency management

In the deploy/tenants dir we provision the tenant namespaces and RBAC resources. There are two types of tenants:

  • cluster admins - have full access to the cluster resources and can deploy HelmReleases that contain CRD controllers
  • app operators - have restricted access to the app namespaces and can't manage cluster-wide resources like CRDs

At bootstrap, Flux provisions the tenant namespaces and RBAC resources in the target clusters. The deploy/apps HelmReleases are deployed using the flux-restricted service account while the deploy/infra-controllers HelmReleases and the deploy/infra-configs custom resources are deployed using the flux-cluster-admin service account.

To enforce the RBAC restrictions, and to provision the controllers before the custom resources, we use the dependsOn feature in the hub/staging.yaml and hub/production.yaml files to order the reconciliation like so:

  1. tenants (namespaces, service accounts and role bindings resources)
  2. infra-controllers (CRD controllers - depends on tenants)
  3. infra-configs (cluster-wide custom resources - depends on infra-controllers)
  4. apps (app workloads - depends on infra-configs)
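
Each dependsOn entry acts as a "must be Ready before" edge, so the order above is a topological sort of three edges. The snippet below is only an illustration: it feeds those edges to the POSIX tsort utility, which prints the same sequence. Flux itself does not run tsort; each Kustomization simply waits until the ones it depends on are Ready.

```shell
#!/bin/sh
# reconcile_order: illustration of the dependsOn chain as "before after" pairs.
# tsort prints an order in which every Kustomization appears after the ones
# it depends on.
reconcile_order() {
  tsort <<'EOF'
tenants infra-controllers
infra-controllers infra-configs
infra-configs apps
EOF
}
```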

Tip

When managing a large number of tenants and clusters, it is recommended to run a dedicated Flux instance for each group of clusters belonging to the same tenant. For more information on how to assign Flux instances to specific clusters, see the Flux sharding and horizontal scaling guide.

Cluster connectivity and access control

For the Flux kustomize-controller and helm-controller to be able to reconcile the remote clusters, the Kubernetes API servers need to be accessible from the central cluster.

The Flux controllers authenticate with the target clusters using kubeconfig files stored as Kubernetes secrets in the central cluster.

Both the Flux Kustomization and HelmRelease objects take a reference to the Kubernetes secret containing the kubeconfig file:

kind: Kustomization | HelmRelease
spec:
  kubeConfig:
    secretRef:
      name: cluster-kubeconfig

The secret defined in the secretRef must exist in the same namespace as the Kustomization or HelmRelease object, and the kubeconfig file must be stored in the value data key.
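
To check that a secret meets both requirements, you could read its value key from the namespace of the Flux objects. has_kubeconfig_value is a hypothetical helper that assumes the kind-flux-hub context; it only verifies that the key exists and is non-empty, not that the kubeconfig works.

```shell
#!/bin/sh
# has_kubeconfig_value: hypothetical helper that checks that the
# cluster-kubeconfig secret exists in the given hub namespace and has a
# non-empty "value" key.
# $1 = namespace of the Flux Kustomization/HelmRelease objects.
has_kubeconfig_value() {
  encoded=$(kubectl --context kind-flux-hub -n "$1" get secret cluster-kubeconfig \
    -o jsonpath='{.data.value}' 2>/dev/null) || return 1
  if [ -z "$encoded" ]; then
    echo "error: secret cluster-kubeconfig in $1 has no value key" >&2
    return 1
  fi
}
```

For example: for ns in staging production; do has_kubeconfig_value "$ns" && echo "$ns ok"; done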

If the target clusters are accessible over a proxy, the proxy address must be set in the kubeconfig file. If the target API servers use self-signed certificates, both controllers can be configured to skip the TLS verification by setting the --insecure-kubeconfig-tls flag in the controller container args.
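
With flux bootstrap, controller flags are commonly added through a Kustomize patch in the bootstrap directory, here hub/flux-system/kustomization.yaml. The hypothetical function below writes such a file. It assumes the default gotk-components.yaml and gotk-sync.yaml resources generated by bootstrap and overwrites the whole file, so merge it by hand if the file already contains other patches.

```shell
#!/bin/sh
# write_flux_system_kustomization: hypothetical sketch that (over)writes the
# bootstrap kustomization.yaml so that kustomize-controller and
# helm-controller start with the --insecure-kubeconfig-tls flag.
# $1 = bootstrap directory, e.g. hub/flux-system
write_flux_system_kustomization() {
  cat > "$1/kustomization.yaml" <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --insecure-kubeconfig-tls
    target:
      kind: Deployment
      name: "(kustomize-controller|helm-controller)"
EOF
}
```

After you commit the file, the flux-system Kustomization applies the patched controller Deployments.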

Important

Note that kubeconfigs that rely on exec-based authentication plugins are not supported by default. You will need to build custom container images with the necessary binaries and configure the controllers with the --insecure-kubeconfig-exec flag. Another option is to generate kubeconfigs with bearer tokens and refresh them periodically with a CronJob that runs e.g. aws eks get-token.
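
For the token-based option, the refresh job would render a kubeconfig that embeds the token instead of an exec plugin. render_token_kubeconfig is a hypothetical sketch of that step; the target and flux names inside the kubeconfig are arbitrary, and obtaining the token (for example from aws eks get-token) is left to the caller.

```shell
#!/bin/sh
# render_token_kubeconfig: hypothetical sketch that prints a kubeconfig
# authenticating with a static bearer token, without any exec plugin.
# $1 = API server URL, $2 = base64-encoded CA bundle, $3 = bearer token.
render_token_kubeconfig() {
  cat <<EOF
apiVersion: v1
kind: Config
clusters:
  - name: target
    cluster:
      server: $1
      certificate-authority-data: $2
users:
  - name: flux
    user:
      token: $3
contexts:
  - name: target
    context:
      cluster: target
      user: flux
current-context: target
EOF
}
```

The job would then store the output under the value key of the cluster-kubeconfig secret. Since the token expires, the CronJob schedule has to be shorter than the token lifetime.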

Monitoring and alerting

To configure Prometheus, Loki and Grafana for monitoring the Flux controllers and the workloads reconciliation, see the monitoring example repository.

To configure Flux to send events to Slack, Teams, Discord, Sentry and other external systems, you can follow the alerting guide.

Testing

After making changes to the manifests, you can validate them locally with kubeconform by running:

make validate
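
make validate may do more than this, but the core idea can be sketched as rendering each overlay and piping the result to kubeconform. validate_overlays is a hypothetical helper; the -ignore-missing-schemas flag skips kinds without an available schema, which is weaker than validating the Flux custom resources against their OpenAPI schemas.

```shell
#!/bin/sh
# validate_overlays: hypothetical sketch, not the repository's make target.
# Renders each overlay with kustomize and validates the output with
# kubeconform, skipping kinds for which no schema is available.
validate_overlays() {
  failed=0
  for overlay in "$@"; do
    echo "validating $overlay"
    if ! rendered=$(kustomize build "$overlay"); then
      echo "error: kustomize build failed for $overlay" >&2
      failed=1
      continue
    fi
    printf '%s\n' "$rendered" | kubeconform -strict -ignore-missing-schemas -summary || failed=1
  done
  return "$failed"
}
```

For example: validate_overlays ./clusters/staging/apps ./clusters/production/infra-controllers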

Any change to the Kubernetes manifests or to the repository structure should be validated in CI before a pull request is merged into the main branch and synced on the cluster.

This repository contains the following GitHub CI workflows:

  • the test workflow validates that the Kubernetes manifests and Kustomize overlays are conformant with the Flux OpenAPI spec
  • the e2e workflow starts the Kubernetes cluster fleet in CI and tests the setup by running Flux in Kubernetes KinD

Teardown

To delete the cluster fleet, run the following command:

make fleet-down
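
If make is not available, deleting the three KinD clusters by name is a rough manual equivalent, assuming the target only removes the clusters created by make fleet-up. The delete_fleet name is hypothetical.

```shell
#!/bin/sh
# delete_fleet: hypothetical manual teardown, assuming make fleet-down only
# deletes the KinD clusters created by make fleet-up.
delete_fleet() {
  for name in flux-hub flux-staging flux-production; do
    kind delete cluster --name "$name" || echo "warning: could not delete $name" >&2
  done
}
```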