Subscribes to the SLA runner events and persists them to CloudWatch custom metrics

SLA Monitor Store Results Lambda

This is the Lambda side of the SLA Monitor project and the second step after the SLA Monitor Runner. It launches a stack via Serverless that subscribes an SQS queue to the SNS topic created by the Runner's Terraform, then processes the test results into a persisted state. Primarily, it publishes custom CloudWatch metrics to measure uptime.

Two kinds of metrics are created automatically:

  • Pass and fail: a failed run is recorded as 1 on the failure metric, so any aggregate above 0 marks downtime and aggregated checks stay easy to manage.
  • Execution duration of integration testing over time.

A CloudWatch metric alarm is also created out of the box.
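Because every run records an attempt and every failed run records a 1 on the failure metric, an uptime figure for a time window can be derived from the two sums. The following is a minimal sketch of that arithmetic; `uptimePercent` is a hypothetical helper and not part of this project, and it assumes you have already retrieved the summed metric values for the window.

```javascript
// Sketch only: derives an uptime percentage from summed metric values.
// attemptsSum: sum of the attempts metric over the window
// failureSum:  sum of the failure metric over the same window
function uptimePercent(attemptsSum, failureSum) {
  if (attemptsSum <= 0) {
    return null; // no runs in the window, so uptime is undefined
  }
  return (100 * (attemptsSum - failureSum)) / attemptsSum;
}

console.log(uptimePercent(4, 1)); // 75
```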

Using

The SLA Runner publishes an SNS message containing the following. An SQS queue is subscribed to this SNS topic, and this Lambda consumes that queue to process the messages.

{
    "service": "example-service",
    "group": ["dev-team", "critical"],
    "succeeded": true,
    "timestamp": "1574533200",
    "testExecutionSecs": "34" 
}
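When SNS delivers to SQS without raw message delivery, each SQS record body is an SNS envelope whose `Message` field holds the runner payload as a JSON string. The sketch below shows one way a handler might unwrap it. `parseRunnerResults` is a hypothetical helper, not this project's actual code, and it assumes `timestamp` is Unix epoch seconds sent as a string.

```javascript
// Sketch only; parseRunnerResults is a hypothetical helper.
function parseRunnerResults(sqsEvent) {
  return (sqsEvent.Records || []).map((record) => {
    const body = JSON.parse(record.body);
    // Without raw message delivery, SNS wraps the payload in an envelope
    // whose "Message" field holds the original JSON as a string.
    const payload =
      typeof body.Message === 'string' ? JSON.parse(body.Message) : body;
    return {
      service: payload.service,
      group: payload.group || [],
      succeeded: payload.succeeded === true,
      // Assumption: "timestamp" is Unix epoch seconds sent as a string.
      timestamp: new Date(Number(payload.timestamp) * 1000),
      testExecutionSecs: Number(payload.testExecutionSecs),
    };
  });
}

// Hypothetical SQS event carrying the runner message shown above.
const runnerMessage = {
  service: 'example-service',
  group: ['dev-team', 'critical'],
  succeeded: true,
  timestamp: '1574533200',
  testExecutionSecs: '34',
};
const exampleEvent = {
  Records: [
    {
      body: JSON.stringify({
        Type: 'Notification',
        Message: JSON.stringify(runnerMessage),
      }),
    },
  ],
};

const [result] = parseRunnerResults(exampleEvent);
console.log(result.service, result.succeeded, result.testExecutionSecs);
console.log(result.timestamp.toISOString()); // 2019-11-23T18:20:00.000Z
```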

As an example, the message above would be published as custom CloudWatch metrics like these, once at the service level and once more for each group (the Timestamp values here are illustrative):

{ 
    "MetricData": [
        {
            "MetricName": "integration-sla-success",
            "Value": 1,
            "Timestamp": "2019-04-02T15:44:26.329Z",
            "StorageResolution": "60",
            "Dimensions": [
                {
                    "Name": "Region",
                    "Value": "us-east-1"
                },
                {
                    "Name": "Service",
                    "Value": "example-service"
                }
            ],
            "Unit": "Count"
        },
        {
            "MetricName": "integration-sla-failure",
            "Value": 0,
            "Timestamp": "2019-04-02T15:44:26.329Z",
            "StorageResolution": "60",
            "Dimensions": [
                {
                    "Name": "Region",
                    "Value": "us-east-1"
                },
                {
                    "Name": "Service",
                    "Value": "example-service"
                }
            ],
            "Unit": "Count"
        },
        {
            "MetricName": "integration-sla-attempts",
            "Value": 1,
            "Timestamp": "2019-04-02T15:44:26.329Z",
            "StorageResolution": "60",
            "Dimensions": [
                {
                    "Name": "Region",
                    "Value": "us-east-1"
                },
                {
                    "Name": "Service",
                    "Value": "example-service"
                }
            ],
            "Unit": "Count"
        },
        {
            "MetricName": "sla-test-duration-secs",
            "Value": 34,
            "Timestamp": "2019-04-02T15:44:26.329Z",
            "StorageResolution": "60",
            "Dimensions": [
                {
                    "Name": "Region",
                    "Value": "us-east-1"
                },
                {
                    "Name": "Service",
                    "Value": "example-service"
                }
            ],
            "Unit": "Count"
        },
        {
            "MetricName": "integration-sla-success",
            "Value": 1,
            "Timestamp": "2019-04-02T15:44:26.329Z",
            "StorageResolution": "60",
            "Dimensions": [
                {
                    "Name": "Group",
                    "Value": "dev-team"
                },
                {
                    "Name": "Region",
                    "Value": "us-east-1"
                },
                {
                    "Name": "Service",
                    "Value": "example-service"
                }
            ],
            "Unit": "Count"
        },
        {
            "MetricName": "integration-sla-failure",
            "Value": 0,
            "Timestamp": "2019-04-02T15:44:26.329Z",
            "StorageResolution": "60",
            "Dimensions": [
                {
                    "Name": "Group",
                    "Value": "dev-team"
                },
                {
                    "Name": "Region",
                    "Value": "us-east-1"
                },
                {
                    "Name": "Service",
                    "Value": "example-service"
                }
            ],
            "Unit": "Count"
        },
        {
            "MetricName": "integration-sla-attempts",
            "Value": 1,
            "Timestamp": "2019-04-02T15:44:26.329Z",
            "StorageResolution": "60",
            "Dimensions": [
                {
                    "Name": "Group",
                    "Value": "dev-team"
                },
                {
                    "Name": "Region",
                    "Value": "us-east-1"
                },
                {
                    "Name": "Service",
                    "Value": "example-service"
                }
            ],
            "Unit": "Count"
        },
        {
            "MetricName": "integration-sla-success",
            "Value": 1,
            "Timestamp": "2019-04-02T15:44:26.329Z",
            "StorageResolution": "60",
            "Dimensions": [
                {
                    "Name": "Group",
                    "Value": "critical"
                },
                {
                    "Name": "Region",
                    "Value": "us-east-1"
                },
                {
                    "Name": "Service",
                    "Value": "example-service"
                }
            ],
            "Unit": "Count"
        },
        {
            "MetricName": "integration-sla-failure",
            "Value": 0,
            "Timestamp": "2019-04-02T15:44:26.329Z",
            "StorageResolution": "60",
            "Dimensions": [
                {
                    "Name": "Group",
                    "Value": "critical"
                },
                {
                    "Name": "Region",
                    "Value": "us-east-1"
                },
                {
                    "Name": "Service",
                    "Value": "example-service"
                }
            ],
            "Unit": "Count"
        },
        {
            "MetricName": "integration-sla-attempts",
            "Value": 1,
            "Timestamp": "2019-04-02T15:44:26.329Z",
            "StorageResolution": "60",
            "Dimensions": [
                {
                    "Name": "Group",
                    "Value": "critical"
                },
                {
                    "Name": "Region",
                    "Value": "us-east-1"
                },
                {
                    "Name": "Service",
                    "Value": "example-service"
                }
            ],
            "Unit": "Count"
        }
    ],
    "Namespace": "SLA-Monitor"
}
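The mapping from one runner message to the metric list above can be sketched as follows. This is an illustration under assumptions, not the project's actual implementation: `buildMetricData` is a hypothetical name, the region and timestamp are passed in explicitly, and `StorageResolution` is emitted as a number. The returned object has the `Namespace` and `MetricData` shape used by CloudWatch's PutMetricData parameters.

```javascript
// Sketch only; buildMetricData is a hypothetical helper.
function buildMetricData(message, region, timestamp) {
  const serviceDimensions = [
    { Name: 'Region', Value: region },
    { Name: 'Service', Value: message.service },
  ];

  // Builds one metric datum in the shape shown above.
  const metric = (name, value, dimensions) => ({
    MetricName: name,
    Value: value,
    Timestamp: timestamp,
    StorageResolution: 60,
    Dimensions: dimensions,
    Unit: 'Count',
  });

  // Success, failure and attempts are emitted for every dimension set.
  const passFail = (dimensions) => [
    metric('integration-sla-success', message.succeeded ? 1 : 0, dimensions),
    metric('integration-sla-failure', message.succeeded ? 0 : 1, dimensions),
    metric('integration-sla-attempts', 1, dimensions),
  ];

  const metricData = [
    ...passFail(serviceDimensions),
    // Duration is only emitted at the service level in the example above.
    metric(
      'sla-test-duration-secs',
      Number(message.testExecutionSecs),
      serviceDimensions
    ),
  ];

  // Repeat the pass/fail metrics once per group, adding a Group dimension.
  for (const group of message.group || []) {
    metricData.push(
      ...passFail([{ Name: 'Group', Value: group }, ...serviceDimensions])
    );
  }

  return { Namespace: 'SLA-Monitor', MetricData: metricData };
}

const params = buildMetricData(
  {
    service: 'example-service',
    group: ['dev-team', 'critical'],
    succeeded: true,
    testExecutionSecs: '34',
  },
  'us-east-1',
  '2019-04-02T15:44:26.329Z'
);
console.log(params.MetricData.length); // 10
```

Passing the timestamp in rather than reading the clock inside the function keeps the mapping deterministic, which makes it easier to compare against a fixture like the one above.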

Testing

docker build -t sla-monitor-lambda .

export AWS_ENV="dev" && \
iam-docker-run \
    --image sla-monitor-lambda \
    --profile $AWS_ENV \
    --full-entrypoint '/bin/bash ./invokeLocal.sh'

Continuous data:

export AWS_ENV="dev" && \
watch -n 5 "iam-docker-run \
    --image sla-monitor-lambda \
    --profile $AWS_ENV \
    --full-entrypoint '/bin/bash ./invokeLocal.sh'"

If you want to mount the folder instead of rebuilding the image every time, you can add these lines:

    --host-source-path . \
    --container-source-path /app

However, be aware that unless you run "npm install" before invoking, dependencies will be missing.

Deploying

This Lambda relies on an SNS topic created by the SLA Monitor Runner project. First apply that project's Terraform by following these instructions:

https://github.com/billtrust/sla-monitor-runner#terraform

The SLA Monitor Store Results Lambda can then be deployed as follows:

docker build -t sla-monitor-lambda .

export AWS_ENV="dev" && \
export DEPLOY_BUCKET='company-deploy' && \
iam-docker-run \
    --image sla-monitor-lambda \
    --profile $AWS_ENV \
    --full-entrypoint "sls deploy --deployBucket $DEPLOY_BUCKET"