/flux-conductr

Flux Conductr - GitOps Everything

Primary language: HCL. License: Apache License 2.0 (Apache-2.0).

The primary goal of this project is to exercise and experiment with Flux-based GitOps deployment, covering the full cycle up to production via promotion, if you want to go that far. Experimentation and production do not have to conflict.

The change process starts at localhost. Hence, we consider the localhost experience (kind, and maybe k3s soon) very important. That aim is reflected in the way we expose services locally. There is also a strong emphasis on fast feedback: we want things to be available quickly, and that includes issues surfacing quickly. Hence, we care deeply about observability.

Many elements should be useful in a CI context. Most things, however, should play nicely in production environments as well.

This repo is mostly based on flux2-kustomize-helm-example. The docs over there should still be pretty accurate.

At the moment, we cover deployments of:

  • Terraform resources (via tf-controller)
  • Cilium
  • MetalLB
  • Knative
  • Istio/Zipkin/Kiali
  • Contour
  • Kube-Prometheus
  • Loki/Promtail
  • Flagger
  • Flamingo/Flux Subsystem for Argo
  • Traefik
  • WeaveWorks GitOps
  • External Secrets
  • CSI Secrets
  • AWS Credentials Sync
  • SOPS Secrets
  • Alerting/Notifications via Slack/MS Teams
  • Image Reflector/Image Automation

Beyond that, we aim at exploring:

  • Crossplane (the cloud providers AWS/Azure/GCP appear to make the most sense, Terraform the least)

Bootstrapping

Encryption keys are required for Image Automation and for the default gpg (sops) based secrets.

To get started, generate encryption keys for ssh/gpg:

./script/gen-keys.sh

Add the public deployment key to GitHub. You may also want to disable GitHub Actions to start with.

gh repo deploy-key add ...

There is a Terraform + kind based bootstrap in tf:

cp sample.tfvars terraform.tfvars
# Set proper values in terraform.tfvars
# terraform init is needed once per fresh checkout, before the first apply
terraform init
terraform apply

Alternatively, you can bootstrap or even upgrade an existing cluster (be sure the current kube context is set properly). Also, make sure flux --version shows the desired version.

./scripts/flux-bootstrap.sh
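
Both checks can be done by hand before running the script. The following is a minimal sketch and not part of this repo: the function names and the example version are hypothetical, and it assumes flux --version prints a line of the form "flux version X.Y.Z".

```shell
# Hypothetical pre-flight helpers; not part of the repository's scripts.

# Extract the bare version from `flux --version` output,
# assumed to look like "flux version 2.1.0".
parse_flux_version() {
  printf '%s\n' "$1" | awk '{print $NF}'
}

# Succeed only if the reported version equals the desired one.
flux_version_matches() {
  [ "$(parse_flux_version "$1")" = "$2" ]
}

# Print the current kube context and flux version, then fail
# unless the flux version is the desired one.
preflight() {
  preflight_ctx="$(kubectl config current-context)" || return 1
  echo "kube context: $preflight_ctx"
  preflight_out="$(flux --version)" || return 1
  echo "$preflight_out"
  flux_version_matches "$preflight_out" "$1"
}
```

For example: preflight 2.1.0 && ./scripts/flux-bootstrap.sh (2.1.0 is only a placeholder for whatever version you expect).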

Known Issues

  • Knative is challenging: some bits need the kustomize.toolkit.fluxcd.io/substitute: disabled annotation in our context, and other things need tweaks to the upstream YAML to play well with GitOps ("... configured").
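
The annotation matters when a Flux Kustomization uses post-build variable substitution: a ${...} expression inside a manifest is then treated as a Flux variable unless the resource carries kustomize.toolkit.fluxcd.io/substitute: disabled. As a rough aid for spotting candidates, the sketch below lists YAML files that contain ${. The function name is hypothetical, and a hit only means the file deserves a look, not that it must be annotated.

```shell
# Hypothetical helper: list YAML files under a directory that contain "${",
# i.e. candidates for the substitute: disabled annotation.
find_substitution_candidates() {
  find "$1" -type f \( -name '*.yaml' -o -name '*.yml' \) \
    -exec grep -lF '${' {} + | sort
}
```

Usage: find_substitution_candidates ./infrastructure (the directory name is an assumption; point it at wherever the Knative manifests live).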

Speed / Registries

We want the lifecycle of things (create/destroy) to be as fast as possible. Pulling images can slow things down significantly. In contrast to a Docker host based solution (such as k3s), these challenges are harder with kind. Make sure you understand the details of your pain points before implementing your solution.
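
One simple way to find out where the time goes is to measure the individual steps. The helper below is a minimal sketch (the name timed is hypothetical); it reports the wall-clock duration of any command in whole seconds on stderr and passes the command's exit status through.

```shell
# Hypothetical helper: run a command and report how long it took.
timed() {
  timed_start="$(date +%s)"
  timed_status=0
  "$@" || timed_status=$?
  timed_end="$(date +%s)"
  echo "$* took $((timed_end - timed_start))s" >&2
  return "$timed_status"
}
```

Usage: timed terraform apply, or timed docker pull with an image you suspect of being slow.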

TODO

  • Naming?
  • json error during kustomizationResourceDiff / Fix make flux-destroy
  • Deduplicate/DRY things
  • Setup "envs" properly / remove literals
  • Flux Dashboard
  • Grafana/Prometheus?
  • Demo: Flagger/Rolling/Blue/Green/Canary
  • Improve Github Actions Quality Gates
  • Borrow bits from Tanzu? (Does not appear to make sense in flux focused context)
  • Manage github with terraform/crossplane
  • babashka scripting?
  • tfctl app/terraform plan approval via ChatOps (Slack?)
  • Basic sops/lastpass/github key management?
  • knative?
  • Replace Contour with Istio ?
  • Contour appears to play with knative, kind and flux! (use from bitnami)
  • Provide tool to wipe (shipping) encrypted secrets
  • Default to auto update everything?
  • Leverage metallb.universe.tf/allow-shared-ip: "flux-conductr" annotation to share/simplify IP address usage
  • External (M)DNS
  • Migrate zipkin to helm / Replace with tempo
  • Introduce Kyverno
  • Enable Flagger/Knative with Istio
  • Enable Alerting to Slack/Discord (needs alertmanager-discord)
  • Integrate Cilium Metrics/Monitoring
  • tf-controller : failed to verify artifact: computed checksum
  • Consider migrating make to just
  • Introduce resmoio/kubernetes-event-exporter
  • The infra / config Kustomization naming borrowed from flux2-kustomize-helm-example is not ideal. It is mostly about dependencies, so the wave terminology from Argo CD might be a bit better. It is also about concurrency.
  • Hubble UI displays "Trying to reconnect streams" and "Datastream has failed on UI backend: EOF" (#21582)
  • Provide easy (make based) access to docker port mappings to host services / secrets + auth

Misc/Random Bits