Simmy

Simmy is a chaos-engineering and fault-injection tool, integrating with the Polly resilience project for .NET. It is releasing April 2019 and works with Polly v7.0.0 onwards.

Simmy allows you to introduce a chaos-injection policy or policies at any location where you execute code through Polly.

Motivation

There are a lot of questions when it comes to chaos-engineering and making sure that a system is actually ready to face the worst possible scenarios:

  • Is my system resilient enough?
  • Am I handling the right exceptions/scenarios?
  • How will my system behave if X happens?
  • How can I test without waiting for a handled (or even unhandled) exception to happen in my production environment?

Using Polly helps me introduce resilience to my project, but I don't want to have to wait for expected or unexpected failures to test it out. My resilience could be wrongly implemented; testing the scenarios is not straightforward; and mocking the failure of some dependencies (for example a cloud SaaS or PaaS service) is not always easy.

What do I need, to simulate chaotic scenarios in my production environment?

  • A way to mock failures of dependencies (any service dependency for example).
  • A way to define when to fail, based on external factors such as global configuration or some rule.
  • A way to revert easily, to control the blast radius.
  • Production grade, to run this in a production or near-production system with automation.

Chaos policies

Simmy offers the following chaos-injection policies:

| Policy | What does the policy do? |
| --- | --- |
| Exception | Injects exceptions in your system. |
| Result | Substitutes results to fake faults in your system. |
| Latency | Injects latency into executions before the calls are made. |
| Behavior | Allows you to inject any extra behaviour before a call is placed. |

Usage

Step 1: Set up the Monkey Policy

Inject exception

var chaosPolicy = MonkeyPolicy.InjectException(Action<InjectOutcomeOptions<Exception>>);

For example:

// Following example causes the policy to throw SocketException with a probability of 5% if enabled
var fault = new SocketException(errorCode: 10013);
var chaosPolicy = MonkeyPolicy.InjectException(with =>
	with.Fault(fault)
		.InjectionRate(0.05)
		.Enabled()
	);

Inject result

var chaosPolicy = MonkeyPolicy.InjectResult(Action<InjectOutcomeOptions<TResult>>);

For example:

// Following example causes the policy to return a bad request HttpResponseMessage with a probability of 5% if enabled
var result = new HttpResponseMessage(HttpStatusCode.BadRequest);
var chaosPolicy = MonkeyPolicy.InjectResult<HttpResponseMessage>(with =>
	with.Result(result)
		.InjectionRate(0.05)
		.Enabled()
);

Inject latency

var chaosPolicy = MonkeyPolicy.InjectLatency(Action<InjectLatencyOptions>);

For example:

// Following example causes policy to introduce an added latency of 5 seconds to a randomly-selected 10% of the calls.
var isEnabled = true;
var chaosPolicy = MonkeyPolicy.InjectLatency(with =>
	with.Latency(TimeSpan.FromSeconds(5))
		.InjectionRate(0.1)
		.Enabled(isEnabled)
	);

Inject behavior

var chaosPolicy = MonkeyPolicy.InjectBehaviour(Action<InjectBehaviourOptions>);

For example:

// Following example causes policy to execute a method to restart a virtual machine; the probability that method will be executed is 1% if enabled
var chaosPolicy = MonkeyPolicy.InjectBehaviour(with =>
	with.Behaviour(() => restartRedisVM())
		.InjectionRate(0.01)
		.EnabledWhen((ctx, ct) => isEnabled(ctx, ct))
	);

Parameters

All parameters are expressed using a fluent-builder syntax.

Enabled

Determines whether the policy is enabled or not.

  • Configure that the monkey policy is enabled.
PolicyOptions.Enabled();
  • Receives a boolean value indicating whether the monkey policy is enabled.
PolicyOptions.Enabled(bool);
  • Receives a delegate which can be executed to determine whether the monkey policy should be enabled.
PolicyOptions.EnabledWhen(Func<Context, CancellationToken, bool>);

InjectionRate

A number between 0 and 1 inclusive. The policy injects the fault, at random, in that proportion of calls. For example: if 0.2, twenty percent of calls will be randomly affected; if 0.01, one percent of calls; if 1, all calls.

  • Receives a double value between [0, 1] indicating the rate at which this monkey policy should inject chaos.
PolicyOptions.InjectionRate(Double);
  • Receives a delegate which can be executed to determine the rate at which this monkey policy should inject chaos.
PolicyOptions.InjectionRate(Func<Context, CancellationToken, Double>);

Fault

The fault to inject. The Fault API also has generic overloads for building the policy in a generic way: PolicyOptions.Fault<TResult>(...)

  • Receives an exception to configure the fault to inject with the monkey policy.
PolicyOptions.Fault(Exception);
  • Receives a delegate representing the fault to inject with the monkey policy.
PolicyOptions.Fault(Func<Context, CancellationToken, Exception>);

Result

The result to inject.

  • Receives a generic TResult value to configure the result to inject with the monkey policy.
PolicyOptions.Result<TResult>(TResult);
  • Receives a delegate representing the result to inject with the monkey policy.
PolicyOptions.Result<TResult>(Func<Context, CancellationToken, TResult>);

Latency

The latency to inject.

  • Receives a TimeSpan value to configure the latency to inject with the monkey policy.
PolicyOptions.Latency(TimeSpan);
  • Receives a delegate representing the latency to inject with the monkey policy.
PolicyOptions.Latency(Func<Context, CancellationToken, TimeSpan>);

Behaviour

The behaviour to inject.

  • Receives an Action to configure the behaviour to inject with the monkey policy.
PolicyOptions.Behaviour(Action);
  • Receives a delegate representing the Action to inject with the monkey policy.
PolicyOptions.Behaviour(Action<Context, CancellationToken>);

Context-driven behaviour

All parameters are available in a Func<Context, ...> form. This allows you to control the chaos injected:

  • in a dynamic manner: e.g. by driving the chaos from external configuration files
  • in a targeted manner: by tagging your policy executions with a Context.OperationKey and introducing chaos targeting particular tagged operations

The example app demonstrates both these approaches in practice.
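As a rough illustration of the two approaches above, the following sketch combines a delegate-based InjectionRate with an EnabledWhen check on Context.OperationKey. It assumes the Polly and Polly.Contrib.Simmy NuGet packages are referenced and that the option extension methods live in the Polly.Contrib.Simmy and Polly.Contrib.Simmy.Outcomes namespaces. The variables chaosEnabled, chaosRate and targetOperation, the operation keys, and the CallDependency helper are hypothetical stand-ins for values you would read from your own configuration and for your own dependency calls.

```csharp
using System;
using Polly;
using Polly.Contrib.Simmy;
using Polly.Contrib.Simmy.Outcomes;

// Hypothetical settings, standing in for values read from external configuration.
bool chaosEnabled = true;
double chaosRate = 1.0;                   // 1.0 = inject on every eligible call, to keep the demo deterministic
string targetOperation = "GetInventory";  // hypothetical operation key to target

var chaosPolicy = MonkeyPolicy.InjectException(with =>
    with.Fault(new TimeoutException("Injected by chaos policy"))
        // Dynamic: the rate delegate is evaluated on each execution.
        .InjectionRate((ctx, ct) => chaosRate)
        // Targeted: only executions tagged with the chosen OperationKey are affected.
        .EnabledWhen((ctx, ct) => chaosEnabled && ctx.OperationKey == targetOperation));

// Hypothetical stand-in for a real dependency call.
string CallDependency(string name) => name + "-ok";

bool inventoryFaulted = false;
try
{
    // Tagged with the targeted OperationKey, so the fault is injected.
    chaosPolicy.Execute(ctx => CallDependency("inventory"), new Context("GetInventory"));
}
catch (TimeoutException)
{
    inventoryFaulted = true;
}

// Tagged with a different OperationKey, so the call passes through untouched.
string pricingResult = chaosPolicy.Execute(ctx => CallDependency("pricing"), new Context("GetPricing"));

Console.WriteLine($"inventoryFaulted={inventoryFaulted}, pricingResult={pricingResult}");
```

Because the delegates capture the settings rather than copying them, changing chaosEnabled or chaosRate at runtime (for instance after a configuration reload) should take effect on the next execution without rebuilding the policy.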

Step 2: Execute code through the Monkey Policy

// Executes through the chaos policy directly
chaosPolicy.Execute(() => someMethod());

// Executes through the chaos policy using Context
chaosPolicy.Execute((ctx) => someMethod(), context);

// Wrap the chaos policy inside other Polly resilience policies, using PolicyWrap
var policyWrap = Policy
  .Wrap(fallbackPolicy, timeoutPolicy, chaosLatencyPolicy);
policyWrap.Execute(() => someMethod());

// All policies are also available in async forms.
var chaosLatencyPolicy = MonkeyPolicy.InjectLatencyAsync(with =>
	with.Latency(TimeSpan.FromSeconds(5))
		.InjectionRate(0.1)
		.Enabled()
	);
var policyWrap = Policy
  .WrapAsync(fallbackPolicy, timeoutPolicy, chaosLatencyPolicy);
var result = await policyWrap.ExecuteAsync(token => service.GetFoo(parametersBar, token), myCancellationToken);

// For general information on Polly policy syntax see: https://github.com/App-vNext/Polly

It is usual to place the Simmy policy innermost in a PolicyWrap. By placing the chaos policies innermost, they subvert the usual outbound call at the last minute, substituting their fault or adding extra latency. The existing Polly policies - further out in the PolicyWrap - still apply, so you can test how the Polly resilience you have configured handles the chaos/faults injected by Simmy.
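The ordering described above can be sketched with a retry policy wrapped around an exception-injecting chaos policy. This is a minimal illustration, not a recommended configuration: it assumes the Polly and Polly.Contrib.Simmy NuGet packages, and it uses an injection rate of 1 purely so that the outcome is deterministic. The counters retryCount and dependencyCalls are hypothetical names added for the demonstration.

```csharp
using System;
using System.Net.Sockets;
using Polly;
using Polly.Contrib.Simmy;
using Polly.Contrib.Simmy.Outcomes;

int retryCount = 0;       // how many times the outer retry policy fired
int dependencyCalls = 0;  // how many times the real delegate actually ran

// Existing resilience: retry up to 3 times on SocketException.
var retryPolicy = Policy
    .Handle<SocketException>()
    .Retry(3, (exception, attempt) => retryCount++);

// Chaos policy: a rate of 1.0 faults every attempt, keeping the demo deterministic.
var chaosPolicy = MonkeyPolicy.InjectException(with =>
    with.Fault(new SocketException(errorCode: 10013))
        .InjectionRate(1.0)
        .Enabled());

// Policies are listed outermost first, so the chaos policy sits innermost.
var policyWrap = Policy.Wrap(retryPolicy, chaosPolicy);

bool finalFailure = false;
try
{
    policyWrap.Execute(() => { dependencyCalls++; });
}
catch (SocketException)
{
    finalFailure = true; // all retries were used up by injected faults
}

Console.WriteLine($"retries={retryCount}, dependencyCalls={dependencyCalls}, finalFailure={finalFailure}");
```

With every attempt faulted, the retry policy sees the injected SocketException exactly as it would see a real one, and the wrapped delegate is never reached. Lowering the injection rate lets some attempts through, which is closer to how you would exercise the retry path in practice.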

Note: The above examples demonstrate how to execute through a Simmy policy directly, and how to include a Simmy policy in an individual PolicyWrap. If your policies are configured by .NET Core DI at StartUp, for example via HttpClientFactory, there are also patterns which can configure Simmy into your app as a whole, at StartUp. See the Simmy Sample App discussed below.

Example app: Controlling chaos via configuration and Polly.Context

This Simmy sample app shows different approaches/patterns for how you can configure Simmy to introduce chaos policies in a project. Patterns demonstrated are:

  • Configuring StartUp so that Simmy chaos policies are only introduced in builds for certain environments (for instance, Dev but not Prod).
  • Configuring Simmy chaos policies to be injected into the app without changing any existing Polly configuration code.
  • Injecting faults or chaos by modifying external configuration.

The patterns shown in the sample app are intended as starting points but are not mandatory. Simmy is very flexible, and we would love to hear how you use it!

Wrapping up

All chaos policies (Monkey policies) are designed to inject behavior randomly (faults, latency or custom behavior). A Monkey policy therefore lets you specify an injection rate between 0 and 1 (0-100%): the higher the injection rate, the higher the probability of injection. It also lets you specify whether random injection is enabled, so you can release or hold the monkeys (turn them on or off) regardless of the injection rate you specify. For example, if you specify an injection rate of 100% but tell the policy that random injection is disabled, it will do nothing.
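The interaction between the injection rate and the enabled flag can be illustrated with two otherwise identical policies. This is a small sketch assuming the Polly and Polly.Contrib.Simmy NuGet packages; the names heldMonkey and releasedMonkey are hypothetical.

```csharp
using System;
using Polly.Contrib.Simmy;
using Polly.Contrib.Simmy.Outcomes;

// Injection rate of 100%, but the monkey is held (disabled).
var heldMonkey = MonkeyPolicy.InjectException(with =>
    with.Fault(new InvalidOperationException("Injected by chaos policy"))
        .InjectionRate(1.0)
        .Enabled(false));

// Same configuration, but the monkey is released (enabled).
var releasedMonkey = MonkeyPolicy.InjectException(with =>
    with.Fault(new InvalidOperationException("Injected by chaos policy"))
        .InjectionRate(1.0)
        .Enabled(true));

// Disabled: the delegate runs normally despite the 100% rate.
string heldResult = heldMonkey.Execute(() => "ok");

// Enabled: the fault is thrown instead of running the delegate.
bool releasedThrew = false;
try
{
    releasedMonkey.Execute(() => "ok");
}
catch (InvalidOperationException)
{
    releasedThrew = true;
}

Console.WriteLine($"heldResult={heldResult}, releasedThrew={releasedThrew}");
```

In a real system the boolean passed to Enabled would typically come from configuration or a feature flag, so the same injection rate can stay in place while chaos is switched on and off.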

Further information

See Issues for latest discussions on taking Simmy forward!

Credits

Simmy was the brainchild of @mebjas and @reisenberger. The major part of the implementation was by @vany0114 and @mebjas, with contributions also from @reisenberger of the Polly team.

Blogs and architecture samples around Simmy

Blog posts

Samples