Chaos Workshop

Welcome to the Chaos Workshop! Follow the steps below to complete the workshop. Earn your certificate and win prizes by sharing your workshop completion details on the #chaos-carnival channel in the Harness Community Slack Workspace!

To follow the workshop steps live, join the session on Day 1 of Chaos Carnival (March 15, 13:45 CDT) or refer to this Recording.

Prerequisites for the Chaos Workshop

Sign-Up on Harness SaaS Platform

Sign up on the Harness SaaS platform via email.
Click the link in the verification email you receive.
Choose the Chaos Engineering module. This enables a 14-day enterprise trial license.

Note: Record the Account ID (underlined in red in the above screenshot). You will need it when submitting the request for the sandbox environment.
You will see a modal asking you to "Enable Chaos Infrastructure To Run Your First Chaos Experiment". At this point, pause on the Harness UI and proceed with the next step to obtain the sandbox environment.

Obtain a Sandbox Environment to Run Chaos Experiments

Fill out and submit this form to request a timed (6 hours) sandbox environment in which to carry out the chaos workshop steps.
The email should arrive in under 2 minutes. Verify that it contains your sandbox config information, which consists of:
- A KubeConfig file, which you can download and use as the context for navigating the environment provided
  - Note: You will need kubectl set up in your local workspace to view the resources in your sandbox environment (which is a Kubernetes Namespace bearing the name firstname-lastname-ns).
- URLs to a sample microservices application (which will be subjected to chaos during the workshop), the Grafana dashboard where it is monitored, and the corresponding Prometheus endpoint.
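
As a quick sanity check once the KubeConfig arrives, the sketch below builds a `kubectl get pods` command scoped to the sandbox namespace. The helper names, the example participant name, and the `sandbox-kubeconfig.yaml` file name are assumptions made for illustration. Lowercasing the name is also an assumption, made because Kubernetes namespace names must be lowercase.

```python
from typing import List


def sandbox_namespace(first_name: str, last_name: str) -> str:
    """Derive the sandbox namespace using the firstname-lastname-ns pattern."""
    # Assumption: names are lowercased and stripped of surrounding spaces.
    return f"{first_name.strip().lower()}-{last_name.strip().lower()}-ns"


def get_pods_command(kubeconfig_path: str, namespace: str) -> List[str]:
    """Build a kubectl command that lists pods in the sandbox namespace."""
    return ["kubectl", "--kubeconfig", kubeconfig_path, "get", "pods", "-n", namespace]


if __name__ == "__main__":
    ns = sandbox_namespace("Jane", "Doe")  # hypothetical participant
    # Print the command; pass the list to subprocess.run() to execute it.
    print(" ".join(get_pods_command("sandbox-kubeconfig.yaml", ns)))
```

If the command lists the Online Boutique pods, the KubeConfig and namespace are working as expected.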

Note: If you don't receive the email containing access info for the sandbox environment within 5 minutes, please send an email to adarsh.kumar@harness.io or karthik.s@harness.io, or reach out on the #chaos-carnival channel on the Harness Community Slack Workspace.

Setup The Chaos Infrastructure

Now, set up the chaos infrastructure in the "default project". You can create a dedicated new project if you wish.
You will need to create a new "Environment", configure your chaos infrastructure, download the installation manifest for the chaos infrastructure, and apply it in the provided sandbox environment. The detailed steps can be found here: https://developer.harness.io/docs/chaos-engineering/user-guides/connect-chaos-infrastructures

Notes:

Select the "Namespace Mode" option for the chaos infrastructure and provide the appropriate namespace name (use the namespace provided as part of your sandbox environment instead of the default hce).
Ignore the instructions to create the namespace and to apply the CRDs. These steps have already been performed for you as part of the sandbox environment creation.
Do not select the "Cluster Wide" option. It can result in failure and is strictly unsupported for this workshop.

Connect The Custom ChaosHub

The project contains the default "Enterprise ChaosHub", which consists of all the supported faults. However, to simplify things, we have a dedicated custom chaos artifact source for this workshop.
Add a new ChaosHub with the GitHub repo URL https://github.com/chaoscarnival/hub-workshop-2023 by following the steps outlined here: https://developer.harness.io/docs/chaos-engineering/user-guides/add-chaos-hub
Browse the newly added ChaosHub. You will see 4 chaos experiments ready to be launched.

Online-Boutique: A Summary of the Chaos Experimentation Activity

The workshop details chaos experiments against (an instrumented version of) the demo microservices application Online Boutique. The application is simulated to be constantly under "usage" via a load generator component.
The experiments inject different types of chaos faults into a given microservice (e.g., carts) or into multiple microservices, and validate specific constraints (hypotheses) around application behaviour and user impact.

Steps to launch chaos experiments from the ChaosHub & view its progress are outlined here: https://developer.harness.io/docs/chaos-engineering/user-guides/construct-and-run-custom-chaos-experiments#launch-an-experiment-from-chaos-hub
The progress of a chaos experiment, its logs, and eventually its results can be viewed on the respective overview page, while the real-time impact can be observed on the Grafana dashboard.

Launch Experiment #1 (State Chaos): `boutique-carts-pod-bounce`

UseCase

In this experiment, we randomly bounce/delete pods belonging to the carts microservice. The intent of state chaos such as this is to verify the impact of pod kills that occur as a result of evictions, upgrades, etc.
During this experiment, we validate the following hypotheses/constraints using "Resilience Probes":
- Healthy Kubernetes resource status prior to and after fault injection
- Continuous availability of the microservice under test
- Expected levels of latency on the website upon user actions (simulated via loadgenerator)

Activity

Upon "Launch Experiment", select the appropriate chaos infrastructure (connected in the previous steps) & provide the appropriate App Namespace (namespace corresponding to the sandbox env) and App Label (app=cartservice) in the Target Application section. Proceed to run the chaos experiment.

Expected Result

While the resource health is maintained before & after the experiment, the availability and performance constraints are not met, leading to probe failures and hence, a low resilience score.
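
To make the link between probe failures and a low resilience score concrete, here is a minimal sketch. It assumes the score is the weighted average of each fault's probe success percentage; treat that formula as an assumption and check it against the Harness Chaos Engineering documentation. The function names, the weights, and the probe outcomes are illustrative and are not taken from an actual run.

```python
from typing import Dict, List, Tuple


def probe_success_percentage(probe_results: Dict[str, bool]) -> float:
    """Percentage of probes that passed for a single fault."""
    if not probe_results:
        raise ValueError("at least one probe result is required")
    passed = sum(1 for ok in probe_results.values() if ok)
    return 100.0 * passed / len(probe_results)


def resilience_score(faults: List[Tuple[int, float]]) -> float:
    """Weighted average of (weight, probe success percentage) pairs.

    Assumption: this mirrors how the platform aggregates per-fault results.
    """
    total_weight = sum(weight for weight, _ in faults)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(weight * pct for weight, pct in faults) / total_weight


if __name__ == "__main__":
    # Hypothetical outcome: resource health passes, the other two probes fail.
    probes = {"resource-health": True, "availability": False, "latency": False}
    pct = probe_success_percentage(probes)
    print(f"probe success: {pct:.1f}%")
    print(f"resilience score: {resilience_score([(10, pct)]):.1f}")
```

With a single fault, the score equals that fault's probe success percentage, so two failed probes out of three pull the score well below 100.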

Launch Experiment #2 (Network Chaos): `boutique-carts-degraded-network`

UseCase

In this experiment, we inject network latency (with jitter, to randomize extent of latency) to the carts microservice to simulate a degraded cluster network. This is also one of the most popular ways to simulate latency between services across AZs/regions. The intent is to evaluate if the network delay is handled within the system OR is propagated upwards to cause degraded user experience on the website's transactions.
During this experiment, we validate the following hypotheses/constraints using "Resilience Probes":
- Healthy Kubernetes resource status prior to and after fault injection
- Continuous availability of the microservice under test
- Expected levels of latency on the website upon user actions (simulated via loadgenerator)
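
The latency hypothesis can be pictured with a small simulation. The sketch below models the injected delay as a fixed value plus uniform jitter and checks the resulting response times against a threshold. The baseline, delay, jitter, and threshold values are hypothetical, and the uniform jitter model is a simplification rather than a description of how the fault is implemented.

```python
import random
from typing import List


def simulate_latencies(
    baseline_ms: float,
    delay_ms: float,
    jitter_ms: float,
    samples: int,
    seed: int = 0,
) -> List[float]:
    """Return response times: baseline + injected delay +/- uniform jitter."""
    rng = random.Random(seed)  # seeded so repeated runs give the same samples
    return [
        baseline_ms + delay_ms + rng.uniform(-jitter_ms, jitter_ms)
        for _ in range(samples)
    ]


def latency_probe_passes(latencies_ms: List[float], threshold_ms: float) -> bool:
    """The hypothesis holds only if every sample stays within the threshold."""
    return all(latency <= threshold_ms for latency in latencies_ms)


if __name__ == "__main__":
    # Hypothetical numbers: 50 ms baseline, 2000 ms delay, 500 ms jitter.
    degraded = simulate_latencies(50, 2000, 500, samples=20)
    healthy = simulate_latencies(50, 0, 0, samples=20)
    print("healthy passes:", latency_probe_passes(healthy, threshold_ms=300))
    print("degraded passes:", latency_probe_passes(degraded, threshold_ms=300))
```

If the delay on the carts service propagates to the website, the latency samples exceed the threshold and the corresponding probe fails.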

Activity

Upon "Launch Experiment", select the appropriate chaos infrastructure (connected in the previous steps) & provide the appropriate App Namespace (namespace corresponding to the sandbox env) and App Label (app=cartservice) in the Target Application section. Proceed to run the chaos experiment.

Expected Result

While the resource health is maintained before & after the experiment and the website continues to be available, the performance constraints are not met, leading to probe failure and hence, a low resilience score.

Launch Experiment #3 (Resource Chaos): `boutique-carts-cpu-starvation`

UseCase

In this experiment, we hog the CPU resources in the pod belonging to the carts microservice, simulating a high-traffic situation in which the service is deprived of CPU cycles, leading to slower responses. The intent is to evaluate whether the slowness is handled within the system OR is propagated upwards to cause degraded user experience on the website's transactions.
During this experiment, we validate the following hypotheses/constraints using "Resilience Probes":
- Healthy Kubernetes resource status prior to and after fault injection
- Continuous availability of the microservice under test
- Expected levels of latency on the website upon user actions (simulated via loadgenerator)

Activity

Upon "Launch Experiment", select the appropriate chaos infrastructure (connected in the previous steps) & provide the appropriate App Namespace (namespace corresponding to the sandbox env) and App Label (app=cartservice) in the Target Application section. Proceed to run the chaos experiment.

Expected Result

While the resource health is maintained before & after the experiment and the website continues to be available, the performance constraints are not met, leading to probe failure and hence, a low resilience score.

Launch Experiment #4 (Multi-Fault Chaos): `boutique-multi-fault-scenario`

UseCase

In this experiment, we illustrate the ability to string faults together in a desired fashion to generate complex scenarios that reproduce past outage conditions OR are used as stressors/mechanisms to evaluate multi-component failure.

Activity

Upon "Launch Experiment", select the appropriate chaos infrastructure (connected in the previous steps) & provide the appropriate App Namespace (namespace corresponding to the sandbox env) and App Label (app=cartservice, app=paymentservice, app=adservice, respectively) in the Target Application section of each individual fault. Proceed to run the chaos experiment.
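
Before launching the multi-fault experiment, it can help to confirm that each target label selects pods in the sandbox namespace. The sketch below builds one `kubectl get pods -l` command per label. The helper name and the example namespace are assumptions made for illustration.

```python
from typing import List

# App labels named in the Activity step above.
TARGET_LABELS = ["app=cartservice", "app=paymentservice", "app=adservice"]


def label_check_commands(namespace: str, labels: List[str]) -> List[List[str]]:
    """Build one kubectl command per label to list the pods it selects."""
    commands = []
    for label in labels:
        key, sep, value = label.partition("=")
        if not (key and sep and value):
            raise ValueError(f"expected a key=value label, got {label!r}")
        commands.append(["kubectl", "get", "pods", "-n", namespace, "-l", label])
    return commands


if __name__ == "__main__":
    # "jane-doe-ns" is a hypothetical sandbox namespace.
    for command in label_check_commands("jane-doe-ns", TARGET_LABELS):
        print(" ".join(command))
```

A label that returns no pods usually points to a typo in the App Label or the wrong namespace.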

Expected Result

This experiment is oriented towards illustrating multi-fault capabilities. The probe successes/failures are aligned with the ones explained in the previous runs.

Share Your Workshop Results With The Harness Team

Share screenshots of the "Chaos Experiment Overview Page" for all 4 chaos experiments in the #chaos-carnival channel of the Harness Community Slack Workspace.

Note: Please ensure that the screenshots cover the browser address bar with account ID in the URL!
