/cdk-stepfunctions-patterns

A set of Step Functions high-level resiliency patterns.

Primary LanguageTypeScriptMIT LicenseMIT

cdk-stepfunctions-patterns Mentioned in Awesome CDK

build jsii-publish downloads

npm version PyPI version NuGet version

cdk-stepfunctions-patterns library is a set of AWS CDK constructs that provide resiliency patterns implementation for AWS Step Functions.

All these patterns are composable, meaning that you can combine them together to create quite complex state machines that are much easier to maintain and support than low-level JSON definitions.

Try / Catch pattern

Step Functions support Try / Catch pattern natively with Task and Parallel states.

TryTask construct adds a high level abstraction that allows you to use Try / Catch pattern with any state or sequence of states.

Example

import * as sfn from '@aws-cdk/aws-stepfunctions';
import { TryTask } from 'cdk-stepfunctions-patterns';

// ...

new sfn.StateMachine(this, 'TryCatchStepMachine', {
  definition: new TryTask(this, "TryCatch", {
    tryProcess: new sfn.Pass(this, 'A1').next(new sfn.Pass(this, 'B1')),
    catchProcess: new sfn.Pass(this, 'catchHandler'),
    // optional configuration properties
    catchProps: {
      errors: ['Lambda.AWSLambdaException'],
      resultPath: "$.ErrorDetails"
    }
  })
})

Resulting StepFunction

Try / Finally pattern

It is often useful to design state machine using Try / Finally pattern. The idea is to have a Final state that has to be executed regardless of successful or failed execution of the Try state. There may be some temporal resource you want to delete or notification to send.

Step Functions do not provide a native way to implement that pattern but it can be done using Parallel state and catch all catch specification.

TryTask construct abstracts these implementation details and allows to express the pattern directly.

Example

import * as sfn from '@aws-cdk/aws-stepfunctions';
import { TryTask } from 'cdk-stepfunctions-patterns';

// ...

new sfn.StateMachine(this, 'TryFinallyStepMachine', {
    definition: new TryTask(this, "TryFinally", {
        tryProcess: new sfn.Pass(this, 'A2').next(new sfn.Pass(this, 'B2')),
        finallyProcess: new sfn.Pass(this, 'finallyHandler'),
        // optional configuration properties
        finallyErrorPath: "$.FinallyErrorDetails"
    })
})

Resulting StepFunction

Try / Catch / Finally pattern

This is a combination of two previous patterns. TryTask construct allows you to express rather complex error handling logic in a very compact form.

Example

import * as sfn from '@aws-cdk/aws-stepfunctions';
import { TryTask } from 'cdk-stepfunctions-patterns';

// ...

new sfn.StateMachine(this, 'TryCatchFinallyStepMachine', {
    definition: new TryTask(this, "TryCatchFinalli", {
        tryProcess: new sfn.Pass(this, 'A3').next(new sfn.Pass(this, 'B3')),
        catchProcess: new sfn.Pass(this, 'catchHandler3'),
        finallyProcess: new sfn.Pass(this, 'finallyHandler3')
    })
})

Resulting StepFunction

Retry with backoff and jitter

Out of the box Step Functions retry implementation provides a way to configure backoff factor, but there is no built in way to introduce jitter. As covered in Exponential Backoff And Jitter and Wait and Retry with Jittered Back-off this retry technique can be very helpful in high-load scenarios.

RetryWithJitterTask construct provides a custom implementation of retry with backoff and jitter that you can use directly in your state machines.

Example

import * as sfn from '@aws-cdk/aws-stepfunctions';
import { RetryWithJitterTask } from 'cdk-stepfunctions-patterns';

// ...

new sfn.StateMachine(this, 'RetryWithJitterStepMachine', {
    definition: new RetryWithJitterTask(this, "AWithJitter", {
        tryProcess: new sfn.Pass(this, 'A4').next(new sfn.Pass(this, 'B4')),
        retryProps: { errors: ["States.ALL"], maxAttempts: 3 }
    })
})

Resulting StepFunction

Resilience lambda errors handling

LambdaInvoke construct from aws-stepfunctions-tasks module is probably one of the most used ones. Still, handling of AWS Lambda service exceptions is often overlooked.

ResilientLambdaTask is a drop-in replacement construct for LambdaInvoke that adds retry for the most common transient errors:

  • Lambda.ServiceException
  • Lambda.AWSLambdaException
  • Lambda.SdkClientException
  • Lambda.TooManyRequestsException

Example

import * as lambda from '@aws-cdk/aws-lambda';
import { ResilientLambdaTask } from 'cdk-stepfunctions-patterns';

// ...

const lambdaFunction = new lambda.Function(this, 'LambdaFunction', {
    // ... removed for clarity
});

const calculateJitterTask = new ResilientLambdaTask(this, "InvokeLambda", {
    lambdaFunction: lambdaFunction  
})

That would result in the following state definition:

"InvokeLambda": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
        "FunctionName": "<ARN of lambda function>"
    },
    "Retry": [{
        "ErrorEquals": [
        "Lambda.ServiceException",
        "Lambda.AWSLambdaException",
        "Lambda.SdkClientException",
        "Lambda.TooManyRequestsException"
        ],
        "IntervalSeconds": 2,
        "MaxAttempts": 6,
        "BackoffRate": 2
    }]
}

Validation of proper resilience lambda errors handling

It is often a challenge to enforce consistent transient error handling across all state machines of a large application. To help with that, cdk-stepfuctions-patterns provides a CDK aspect to verify that all Lambda invocations correctly handle transient errors from AWS Lambda service.

Use ResilienceLambdaChecker aspect as shown below.

Example

import * as cdk from '@aws-cdk/core';
import { ResilienceLambdaChecker } from 'cdk-stepfunctions-patterns'

const app = new cdk.App();
// ...

// validate compliance rules
app.node.applyAspect(new ResilienceLambdaChecker());

If there are some states in your application that do not retry transient errors or miss some recommended error codes, there will be warning during CDK synthesize stage:

PS C:\Dev\GitHub\cdk-stepfunctions-patterns> cdk synth --strict
[Warning at /StepFunctionsPatterns/A] No retry for AWS Lambda transient errors defined - consider using ResilientLambdaTask construct.
[Warning at /StepFunctionsPatterns/B] Missing retry for transient errors: Lambda.AWSLambdaException,Lambda.SdkClientException.