- Create a Kubernetes Cluster
- Deploy the Application
- Chaos Toolkit
- Install Chaos Toolkit
- Experiment: Kill a Dependent Service (Greeting)
- Experiment: Kill a Dependent Service (Name)
- Experiment: Send 10 concurrent requests with different Name
- Experiment: Kill Nodes in one AZ
- Experiment: Kill Nodes in two AZ
- Experiment: Kill Masters in one AZ
- Experiment: Kill All Masters
- Cleanup
- Istio
- Gremlin
This repo shows how to do Chaos Testing with applications deployed in Kubernetes.
- Install kops:

  ```sh
  brew update && brew install kops
  ```
- Create an S3 bucket and set up `KOPS_STATE_STORE`:

  ```sh
  aws s3 mb s3://kubernetes-aws-io
  export KOPS_STATE_STORE=s3://kubernetes-aws-io
  ```
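  Optionally, enable versioning on the state bucket, as the kops docs recommend, so that earlier cluster state can be recovered (the bucket name matches the one created above):

  ```sh
  aws s3api put-bucket-versioning \
    --bucket kubernetes-aws-io \
    --versioning-configuration Status=Enabled
  ```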
- Define an environment variable with the Availability Zones for the cluster:

  ```sh
  export AWS_AVAILABILITY_ZONES="$(aws ec2 describe-availability-zones --query 'AvailabilityZones[].ZoneName' --output text | awk -v OFS="," '$1=$1')"
  ```
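  The value should be a comma-separated list; in `us-east-1` it would look like the example below (the actual zones depend on your account):

  ```sh
  echo $AWS_AVAILABILITY_ZONES
  # us-east-1a,us-east-1b,us-east-1c,us-east-1d,us-east-1e,us-east-1f
  ```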
- Create a cluster with 3 masters and 6 workers:

  ```sh
  kops create cluster \
    --name=chaos.k8s.local \
    --master-count=3 \
    --node-count=6 \
    --zones=$AWS_AVAILABILITY_ZONES \
    --yes
  ```
- Verify the cluster and check that the masters are spread across multiple AZs:

  ```
  $ kops validate cluster
  Using cluster from kubectl context: chaos.k8s.local

  Validating cluster chaos.k8s.local

  INSTANCE GROUPS
  NAME               ROLE    MACHINETYPE  MIN  MAX  SUBNETS
  master-us-east-1a  Master  m3.medium    1    1    us-east-1a
  master-us-east-1b  Master  m3.medium    1    1    us-east-1b
  master-us-east-1c  Master  c4.large     1    1    us-east-1c
  nodes              Node    t2.medium    6    6    us-east-1a,us-east-1b,us-east-1c,us-east-1d,us-east-1e,us-east-1f

  NODE STATUS
  NAME                            ROLE    READY
  ip-172-20-102-194.ec2.internal  master  True
  ip-172-20-110-113.ec2.internal  node    True
  ip-172-20-148-247.ec2.internal  node    True
  ip-172-20-182-151.ec2.internal  node    True
  ip-172-20-216-171.ec2.internal  node    True
  ip-172-20-42-110.ec2.internal   node    True
  ip-172-20-54-242.ec2.internal   master  True
  ip-172-20-78-200.ec2.internal   master  True
  ip-172-20-82-162.ec2.internal   node    True

  Your cluster chaos.k8s.local is ready
  ```
- Check that the nodes are spread across multiple AZs:

  ```sh
  aws ec2 describe-instances --query 'Reservations[].Instances[].Placement.AvailabilityZone'
  ```
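  To tally the instances per zone at a glance (a small shell convenience, not one of the original steps):

  ```sh
  aws ec2 describe-instances \
    --query 'Reservations[].Instances[].Placement.AvailabilityZone' \
    --output text | tr '\t' '\n' | sort | uniq -c
  ```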
The application is defined at https://github.com/aws-samples/aws-microservices-deploy-options.
Note: Make sure to install the specified version of Helm. Helm 2.9.0 is not able to install charts; helm/helm#4001 has more details.
- Install the Helm CLI:

  ```sh
  brew install kubernetes-helm
  ```
- Switch to version 2.8.1:

  ```sh
  brew switch kubernetes-helm 2.8.1
  ```
- Install Helm in the Kubernetes cluster:

  ```sh
  helm init
  ```
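  Once the Tiller pod is up, `helm version` should report both the client and the server at the pinned version (the output lines in the comments are what Helm 2.x typically prints, sketched from memory):

  ```sh
  helm version
  # Client: &version.Version{SemVer:"v2.8.1", ...}
  # Server: &version.Version{SemVer:"v2.8.1", ...}
  ```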
- Create a Service Account and grant it permissions to deploy the application:

  ```sh
  $ kubectl create serviceaccount --namespace kube-system tiller
  serviceaccount "tiller" created
  $ kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
  clusterrolebinding "tiller-cluster-rule" created
  $ kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
  deployment "tiller-deploy" patched
  ```
- Install the Helm chart:

  ```sh
  helm install --name myapp charts/myapp
  ```
- Access the application:

  ```sh
  curl http://$(kubectl get svc/myapp-webapp -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
  ```
- Later, delete the Helm chart:

  ```sh
  helm delete --purge myapp
  ```
- Populate the `WEBAPP_URL` environment variable with the URL of your cluster's `webapp-service` endpoint:

  ```sh
  export WEBAPP_URL="http://$(kubectl get svc/myapp-webapp -o jsonpath={.status.loadBalancer.ingress[0].hostname})/"
  ```
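  As a quick sanity check of the steady state (a convenience step, not part of the original instructions), curl's standard `-w` format variables can print the status code and total response time:

  ```sh
  curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" "$WEBAPP_URL"
  ```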
The output of the `chaos run` command shows that the experiment was run, but that there is a weakness in the system: when the `myapp-greeting` service is killed, the `myapp-webapp` endpoint takes longer than the allowed 3 seconds to respond. This is outside the tolerance within which the system is observed to still be in its steady state.

TODO: How do I start the service back up during rollback?
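For reference, here is a minimal sketch of what such an experiment definition could look like. This is an illustration, not the exact file used in this repo: the probe URL, label selector, and namespace are assumptions based on the service names above. The steady-state hypothesis encodes the 3-second tolerance, and the method kills the `myapp-greeting` pods with the `chaostoolkit-kubernetes` extension. Writing it as a heredoc keeps everything in the shell:

```sh
# Sketch of an experiment file; the selector and namespace are assumptions.
cat > kill-greeting.json <<'EOF'
{
    "version": "1.0.0",
    "title": "Kill the greeting service",
    "description": "The webapp endpoint must respond within 3 seconds",
    "configuration": {
        "webapp_url": {"type": "env", "key": "WEBAPP_URL"}
    },
    "steady-state-hypothesis": {
        "title": "Webapp responds within tolerance",
        "probes": [
            {
                "type": "probe",
                "name": "webapp-responds",
                "tolerance": 200,
                "provider": {
                    "type": "http",
                    "url": "${webapp_url}",
                    "timeout": 3
                }
            }
        ]
    },
    "method": [
        {
            "type": "action",
            "name": "kill-greeting-pods",
            "provider": {
                "type": "python",
                "module": "chaosk8s.pod.actions",
                "func": "terminate_pods",
                "arguments": {
                    "label_selector": "app=myapp-greeting",
                    "ns": "default"
                }
            }
        }
    ]
}
EOF
chaos run kill-greeting.json
```

Chaos Toolkit also accepts a top-level `rollbacks` list of actions that run after the method; with a Kubernetes Deployment the terminated pods are normally rescheduled automatically, which may be why no rollback is defined here.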
- Install Istio (complete details at the Istio quick start):

  ```sh
  curl -L https://git.io/getLatestIstio | sh -
  cd istio-*
  export PATH=$PWD/bin:$PATH
  kubectl apply -f install/kubernetes/istio.yaml
  ```
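  To confirm the control plane came up, list the pods in the `istio-system` namespace (the exact component names vary by Istio release):

  ```sh
  kubectl get pods -n istio-system
  ```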
- Inject the Envoy proxy as a sidecar in each pod:

  TODO: How to inject Istio for an application deployed using a Helm chart? One possible approach is sketched below.
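  One approach that is commonly used (a sketch, not verified against this chart): render the chart with `helm template`, pipe the manifests through `istioctl kube-inject`, and apply the result. The chart path and release name match the ones used earlier.

  ```sh
  helm template charts/myapp --name myapp \
    | istioctl kube-inject -f - \
    | kubectl apply -f -
  ```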