This repository helps you get familiar with running Litmus chaos experiments in a realistic application environment that runs multiple services on different Kubernetes clusters.
It makes it easy to spin up a fully deployed GKE or EKS cluster, or a lightweight, easy-to-manage KinD (Kubernetes-in-Docker) cluster, together with a microservice application (Sock Shop) and the Litmus ChaosEngine to create chaos scenarios.
After cloning this repository, installing the requirements listed below, and using the start command to create the fully deployed cluster, you will be able to run Litmus Chaos experiments in the cluster using the test command. You can find all the experiment configuration under the /litmus
directory of this repository and the script to deploy and run them in
It currently works with KinD, GKE, and EKS. You can use a KinD cluster by following the steps below; alternatively, you need a Google Cloud account to run on GKE or an AWS account to run on EKS. Support for Azure is planned.
- Python 3.7 or above
- Python Dependencies:
pip install -r requirements.txt
- Google Cloud Login:
- GCloud CLI installed locally and logged in:
- AWS CLI installed locally and logged in:
- eksctl installed locally:
- Minimum IAM permissions for your AWS user:
- IAM permissions for your AWS user to create the AWS ALB Ingress Controller IAM policy
- Kubectl installed locally:
- Helm installed locally:
- Docker installed locally:
To see the full command-line options, use the -h flag:
./ -h
This will output the following:
usage: [-h] {start,test,list,stop} ...
Spin up Litmus Demo Environment on Kubernetes.
positional arguments:
start Start a Cluster with the demo environment deployed.
test Run Litmus ChaosEngine Experiments inside litmus demo
list List all available Litmus ChaosEngine Experiments
available to run.
stop Shutdown the Cluster with the demo environment
To start a cluster and deploy all the required components:
For a KinD cluster:
./ start --platform kind
For a GKE cluster:
./ start --platform GKE --project {GC_PROJECT} --key {ZE_KEY}
For an EKS cluster:
./ start --platform EKS --name {EKS_CLUSTER_NAME}
Flag values for start
Flag | Description | Default |
--platform or -pt | Set the platform on which to start the demo environment. Available platforms are kind, GKE, and EKS. Support for other platforms will be added. | kind |
--name or -n | Required when --platform is GKE. Sets the GKE cluster name. | litmus-k8s-demo |
--zone or -z | Required when --platform is GKE. Sets the GCloud zone to spin the GKE cluster up in. | us-central1-a |
--project or -p | Required when --platform is GKE. Sets the GCloud project to spin the GKE cluster up in. | No default |
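The start flags and defaults listed above could be declared with argparse roughly as follows. This is a minimal sketch, not the repository's actual code: the function name build_parser and the program name demo are hypothetical placeholders, and only the flags documented in the table are included.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a `start` subcommand mirroring the flag table."""
    # prog="demo" is a placeholder, not the repository's script name.
    parser = argparse.ArgumentParser(
        prog="demo",
        description="Spin up Litmus Demo Environment on Kubernetes.",
    )
    # required=True needs Python 3.7+, which matches the stated requirement.
    subparsers = parser.add_subparsers(dest="command", required=True)

    start = subparsers.add_parser(
        "start", help="Start a Cluster with the demo environment deployed."
    )
    # Defaults are taken from the flag table; --project has no default.
    start.add_argument("-pt", "--platform", default="kind")
    start.add_argument("-n", "--name", default="litmus-k8s-demo")
    start.add_argument("-z", "--zone", default="us-central1-a")
    start.add_argument("-p", "--project", default=None)
    return parser
```

Calling build_parser().parse_args(["start"]) yields the kind platform with the default cluster name and zone, which is the behavior the table describes when no flags are passed.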
To run all the Litmus ChaosEngine experiments:
./ test
You can optionally add the --wait= argument to change the wait time between experiments, in minutes. The default is 1 minute.
To run a specific experiment (found under the ./litmus directory):
./ test --test=pod-delete
Flag values for test
Flag | Description | Default |
--test or -t | Name of the test to run, based on the YAML file name under the /litmus folder. | * (all) |
--wait or -w | Number of minutes to wait between experiments. | 1 |
--type or -ty | Type of chaos to perform: pod for pod-level chaos, node for infra/node-level chaos, or all for both. | all |
--platform or -pt | Set the platform on which to perform chaos. Available platforms are kind and GKE. | kind |
--report or -r | Set to yes to generate a PDF report of the experiment result summary. | no |
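To illustrate how the --test and --wait flags interact, here is a hedged sketch of selecting experiments by YAML file name and pausing between runs. The function names select_experiments and run_experiments, the flat directory layout, and the run_one callback are assumptions made for illustration; the repository's script may organize and run experiments differently.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List


def select_experiments(litmus_dir: Path, test: str = "*") -> List[Path]:
    """Return manifests whose file name matches `test` (default "*": all)."""
    return sorted(litmus_dir.glob(f"{test}.yaml"))


def run_experiments(
    manifests: Iterable[Path],
    run_one: Callable[[Path], None],
    wait_minutes: float = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run each manifest in turn, pausing `wait_minutes` between experiments."""
    manifests = list(manifests)
    for index, manifest in enumerate(manifests):
        run_one(manifest)
        if index < len(manifests) - 1:  # this sketch skips the pause after the last one
            sleep(wait_minutes * 60)
```

Passing test="pod-delete" narrows the selection to a single file, while the default pattern picks up every manifest in the directory. The sleep parameter is injectable so the loop can be exercised without actually waiting.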
- To see which application deployment was picked and whether reconcile operations succeeded (i.e., whether the chaos-runner pod was created), check the chaos operator logs. Ex:
kubectl logs -f chaos-operator-ce-6899bbdb9-jz6jv -n litmus
- To view the parameters with which the experiment job is created, the status of experiment, the success of chaosengine patch operation, and cleanup of the experiment pod, check the logs of the chaos-runner pod. Ex:
kubectl logs sock-chaos-runner -n sock-shop
- To view the logs of the chaos experiment itself, use the value
of the chaosengine CR
kubectl logs container-kill-1oo8wv-85lsl -n sock-shop
(The detailed troubleshooting faq here:
- To re-run the chaos experiment, clean up and re-create the chaosengine CR:
kubectl delete chaosengine sock-chaos -n sock-shop
kubectl apply -f litmus/chaosengine.yaml
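If you find yourself repeating the troubleshooting commands above, they can be assembled programmatically. The sketch below only builds the argument lists for the kubectl invocations shown in this section; the helper names logs_command and rerun_commands are hypothetical and not part of this repository.

```python
from typing import List


def logs_command(pod: str, namespace: str, follow: bool = False) -> List[str]:
    """Build the argv for `kubectl logs [-f] <pod> -n <namespace>`."""
    argv = ["kubectl", "logs"]
    if follow:
        argv.append("-f")
    argv += [pod, "-n", namespace]
    return argv


def rerun_commands(engine: str, namespace: str, manifest: str) -> List[List[str]]:
    """Build the delete-then-apply pair used to re-run a chaosengine CR."""
    return [
        ["kubectl", "delete", "chaosengine", engine, "-n", namespace],
        ["kubectl", "apply", "-f", manifest],
    ]
```

Each list can be passed to subprocess.run(argv, check=True) on a machine where kubectl is installed and pointed at the demo cluster. Pod names such as the chaos operator pod differ per deployment, so they are taken as parameters rather than hard-coded.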
You can also generate a PDF report of the experiment result summary using the --report flag, as follows:
./ test --report=yes
This generates a PDF report named chaos-report.pdf in the current directory, containing the chaos result summary.
To list all the available Litmus Chaos experiments in this repo under the ./litmus directory for a particular platform:
./ list --platform <platform-name>
To shut down and destroy the cluster when you're finished:
For a KinD cluster:
./ stop --platform kind
For a GKE cluster:
./ stop --platform GKE --project {GC_PROJECT}
For an EKS cluster:
./ stop --platform EKS --name {EKS_CLUSTER_NAME} --awsregion {EKS_REGION_NAME}