A Chaos Monkey killing Kubernetes pods at random

Primary language: Go. License: Apache License 2.0 (Apache-2.0).

kube-monkey

kube-monkey is a tool for testing the resiliency of a system running on Kubernetes. It repeatedly deletes random pods at scheduled intervals.

Description

This tool is inspired by Chaos Monkey. It simply kills random pods in a Kubernetes cluster. There are a few ways to control which pods can be killed and at what intervals; they are described below.

Dependencies and Building

The tool is written in Go and uses the official client-go library. It uses the mux project to expose the health check API.

Installing dependencies

Once you have cloned the repo, run the command below at the root of the repo.

make dep

This installs all the dependencies of the repo. Note that this project does not use dependency management or vendoring yet, so the dependency versions you get, and therefore the behaviour, might differ. Dependency management will be added in the future.

Building

To build a static binary for Linux systems, run the command below at the root of the repository.

make build

This creates a binary called kube-monkey at the root of the repository.

Deploy

This tool is built on the assumption that it runs inside the Kubernetes cluster as a pod. The container must run inside a Kubernetes pod so that it can authenticate with the Kubernetes API server.

Because it discovers and deletes other pods, it must run under a service account with the required permissions. To create the service account and its permissions, run the command below.

Note that the command assumes that a Kubernetes cluster is running and that kubectl is configured to communicate with it.

kubectl create -f k8s-deploy/rbac.yaml

The above command creates the following resources in the Kubernetes cluster:

  1. A ClusterRole with the name kube-monkey
  2. A ServiceAccount with the name kube-monkey
  3. A ClusterRoleBinding that binds the two in the default namespace

Then, to deploy kube-monkey as a Kubernetes Deployment, run the command below. Note that the image is pulled from the Docker repo msvbhat/kube-monkey. If you have built your own Docker image, for example with a custom-built binary, update the image name in the file.

kubectl create -f k8s-deploy/kube-monkey.yaml

By default, 50% of the pods are killed every 2 minutes. Pods running in the kube-system namespace are whitelisted by default. To control this behaviour, use the environment variables below in the deployment manifest.

  1. NAMESPACE_WHITELIST - A space-separated list of namespaces that are whitelisted, meaning that pods running in these namespaces are never considered for deletion. The kube-system namespace is always whitelisted, even if it is not specified in the list.

  2. DELETE_PERCENTAGE - The percentage of pods that should be deleted. Specify 0 to delete no pods and 100 to delete all pods. Note that this percentage is applied only to the pods that are eligible for deletion, i.e. the pods that are not running in whitelisted namespaces.

  3. KM_SCHEDULE - The schedule on which kube-monkey deletes pods. It follows cron syntax. To understand more about the cron syntax that is allowed, please check the docs.
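The way NAMESPACE_WHITELIST and DELETE_PERCENTAGE combine can be sketched in Go as follows. This is a minimal illustration and not the project's actual code: the Pod type and the functions parseWhitelist, parsePercentage and selectVictims are hypothetical names, and both the rounding down of the pod count and the fallback to 50 on invalid input are assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
	"os"
	"strconv"
	"strings"
)

// Pod is a hypothetical, minimal stand-in for a Kubernetes pod.
type Pod struct {
	Namespace string
	Name      string
}

// parseWhitelist splits a space-separated namespace list.
// kube-system is always included, as the README describes.
func parseWhitelist(raw string) map[string]bool {
	wl := map[string]bool{"kube-system": true}
	for _, ns := range strings.Fields(raw) {
		wl[ns] = true
	}
	return wl
}

// parsePercentage reads an integer in the range 0..100.
// Falling back to def on empty or invalid input is an assumption.
func parsePercentage(raw string, def int) int {
	p, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil || p < 0 || p > 100 {
		return def
	}
	return p
}

// selectVictims drops pods in whitelisted namespaces, then picks
// the given percentage of the remaining pods at random.
func selectVictims(pods []Pod, wl map[string]bool, percentage int) []Pod {
	eligible := make([]Pod, 0, len(pods))
	for _, p := range pods {
		if !wl[p.Namespace] {
			eligible = append(eligible, p)
		}
	}
	// Integer division rounds down; the real tool may round differently.
	n := len(eligible) * percentage / 100
	rand.Shuffle(len(eligible), func(i, j int) {
		eligible[i], eligible[j] = eligible[j], eligible[i]
	})
	return eligible[:n]
}

func main() {
	wl := parseWhitelist(os.Getenv("NAMESPACE_WHITELIST"))
	pct := parsePercentage(os.Getenv("DELETE_PERCENTAGE"), 50)

	// Example pods; a real implementation would list them via client-go.
	pods := []Pod{
		{Namespace: "default", Name: "web-1"},
		{Namespace: "default", Name: "web-2"},
		{Namespace: "kube-system", Name: "dns"},
		{Namespace: "monitoring", Name: "collector"},
	}
	for _, p := range selectVictims(pods, wl, pct) {
		fmt.Printf("would delete %s/%s\n", p.Namespace, p.Name)
	}
}
```

Because the percentage is applied after filtering, adding a namespace to the whitelist shrinks the pool of eligible pods and therefore the number of pods deleted in each run.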

Considerations and Limitations

This has only been tested with minikube, but it is expected to run in any Kubernetes cluster.

The project doesn't have unit tests yet. Unit tests will be added soon.

Currently the /metrics endpoint is a dummy endpoint. It doesn't return any metrics but only returns 200 OK.
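A placeholder endpoint of this kind might look like the sketch below. It uses the standard library's net/http ServeMux rather than the mux project the tool actually uses, and the names metricsHandler, newMux and statusFor are hypothetical. To stay runnable without opening a port, it exercises the handler in-process with httptest.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// metricsHandler is a placeholder: it returns 200 OK with an empty body
// and exports no metrics.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// newMux registers the placeholder handler on /metrics.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", metricsHandler)
	return mux
}

// statusFor sends an in-process GET request to the mux and returns
// the HTTP status code of the response.
func statusFor(path string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	newMux().ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("GET /metrics ->", statusFor("/metrics"))
}
```

In a real deployment the same mux would instead be passed to http.ListenAndServe on whichever port the pod exposes.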

Planned Enhancements

  1. Define metrics and export them through the metrics endpoint
  2. Add a more sophisticated method of selecting pods, e.g. by specific labels
  3. Add namespace blacklisting
  4. Send events to pods for visibility
  5. Use CLI args instead of env variables
  6. Add more options, e.g. deleting pods only on specific nodes