The primary goal of this project is to build a Chaos Engineering environment around the LitmusChaos platform. We try hard to provide a smooth development process including GitOps based deployment. Hence, we are leveraging flux
, terraform
, nix
(using devenv as a nix flake) and kind
(maybe k3s
soon). nix
is no requirement, but strongly recommended as it should automatically provide you with the other tools - you should not have to worry about how to install things with your package manager.
If you just want to kick the Chaos the tires quickly, or if you want to build a long lasting Chaos environment : This might be a place to start.
Experimentation is a natural element of Chaos Engineering. However, it should be just as natural in Software Development in general. That is why you might encounter bits (such as Knative) with no strong Chaos Engineering relationship in this repo. Those are meant to be optional.
The default localhost
cluster environment has very few requirements. It should work on many types of clusters. However, it is optimized to work with just enough resources to run the whole Chaos Stack and the resilient Sock Shop. It aims to make things easiliy accessible.
Various things I built upon had minor issues (mostly because there where outdated). At this time, the "fixes" are here because I wanted to move on quickly. Would be happy to contribute back.
This repo is derived from flux-conductr. Look at that, if you are after a similar experience, focused on flux
specifically.
- LitmusChaos platform
- This repo acts as a ChaosHub
- We serve the Sock Shop Microservices Demo Application as a scenario (defaulting to
containerd
experiments) - Tightly integrated Prometheus Stack including Grafana provisioned for the Sock Shop Appliation
- Loki
- Istio Eventing/Serving/Tracing (zipkin)
- Cilium
- Knative
- Locust load testing (supporting the UI)
- Portal API usage examples
- Support for deployment in proxy/custom CA environments
- Flux-/Terraform Deployment
- Nix Dev Experience
- Doom (Opt-In/Next Gen)
Even though, we am trying to cover most things declaratively, some random bits may be covered by make
targets. Simply calling the default target:
make
should output help hinting at what is covered.
You may also want to disable github actions to start.
Optional: Generate ssh deployent keys and add public key to your repo
make gen-keys
make gh-add-deploy-key
There is a terraform
+ kind
based bootstrap in tf
.
cp sample.tfvars terraform.tfvars
# Set proper values in terraform.tfvars
make apply
This should spin up the limus server. Once it is up
make open-app
should open it in your browser.
Alternatively, you can bootstrap or even upgrade an existing cluster (be sure to have current kubecontext set properly). Also, make sure flux --version
shows desired version.
./scripts/flux-bootstrap.sh
We aim at supporting environments requiring a proxy (including custom CA certificate chains) to access external services.
A proxy has to be introduced in various places. Many systems (including kind
) support configuration via environment variables, namely HTTPS_PROXY
, HTTP_PROXY
and NO_PROXY
.
For flux
, we ship a local-proxy
cluster adding that environment. Set this cluster in tf/terraform.tfvars
to try it.
For litmus
, we only ship a runtime patch at the moment.
Regarding custom certificates, we simply overlay the compiled file in the containers using a ConfigMap
. By default, we assume we can generate it on the host executing the initial deployment:
make -n recreate-ca-res
make -n patch-litmus-ca-certs patch-litmus-proxy-env
should give you an idea how we patch a system.
The terraform
module provides a mechanism to patch the coredns
ConfigMap
. This may come in handy when working with a proxy.
I use mitmproxy
locally to try things out.
The local
cluster uses metallb to provide a loadbalancer. It binds multiple services to a single IP using metallb.universe.tf/allow-shared-ip
.
The following ports are used:
9091
: Litmus Portal9002
: Litmus Server (for remote agents)3000
: Grafana9411
: Zipkin (Mesh/Tracing)20001
: Kiali (Mesh/Istio)
Acting as a ChaosHub, this repo serves the sock-shop
scenario/workflow
Grafana : admin
/ prom-operator
.
Litmus : admin
/ litmus
.
- There are TODO tags in the code
- Leverage
kustomize
with remote repos/resources in workflow (litmuschaos/k8s:latest
does not yet havegit
) - Leverage Istio for failure injection?
- This repo can act as a ChaosHub - add it during setup
- Add first class support for
mitmproxy
(ship deployment) Add first class support for remote agent?Try GitOps scenarios?- Manifests Naming
- Fix annoying terraform plan
yaml_incluster
- Add knative-serving/eventing/dns (using
nip.io
?) - Add mongodb/prometheus convenience (e.g. auth) targets to
Makefile
- Test drive 3.0-beta
disk-fill
does not yet play with containerd?- Catchup
cron
scheduled sock-shop workflow - Introduce PrometheusRule Sock-Shop alerts
- Recover chaos "enabled" in Sock Shop Dashboard
Introduce istio based tracing- Introduce deas/calendar_monkey? ;)
- Use NodePort instead of LoadBalancer locally (just like we do it in
flux-conductr
)
- Some experiments from
litmus-go
appear to rely on/var/run/docker.sock
which does not exist with containerd based environments (see) - Knative deployment straight from github deployment not possible
- knative challenging, should probably merge
kustomize.toolkit.fluxcd.io/substitute: disabled
viakustomize
. Other things need tweaks to upstream yaml to play with GitOps "... configured" / Managed fields) - Istio Ingress appears to have an image pulling issue, so it takes a while to come up
- litmus
helm
release removal should remove default agent?
- https://docs.cilium.io/en/stable/network/istio/
- https://knative.dev/docs/install/installing-istio/#installing-istio
- Deploy knative straight from github? like flux-monitoring.yaml?
- Running Knative with Istio in a Kind Cluster (old!)
- Install Knative using quickstart