AWS Lambda Reference Architecture: Real-time File Processing

The Real-time File Processing reference architecture is a general-purpose, event-driven, parallel data processing architecture that uses AWS Lambda. It is well suited to workloads that need more than one data derivative of an object. The architecture is described in this diagram and blog post. This sample application demonstrates Markdown conversion, where Lambda converts Markdown files to HTML and plain text.

Running the Example

The provided AWS CloudFormation template can be used to launch a stack that demonstrates the Lambda file processing reference architecture. Detailed information about this template can be found in the CloudFormation Template Resources section below.

Important: Because the AWS CloudFormation stack name is used in the names of the S3 buckets, the stack name must not contain uppercase letters. Please use only lowercase letters (digits and hyphens are also acceptable, as in lambda-file-processing) when typing the stack name. The provided CloudFormation template retrieves its Lambda code from a bucket in the us-east-1 region. To launch this sample in another region, modify the template and upload the Lambda code to a bucket in that region.
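Because the stack name ends up inside S3 bucket names, it can help to check a candidate name before launching. The following is a minimal sketch, not part of the template or the AWS CLI: isBucketSafeStackName is a hypothetical helper that accepts names starting with a lowercase letter and containing only lowercase letters, digits, and hyphens. It does not check length limits, which also apply to the generated bucket names.

```javascript
// Hypothetical helper (an illustration, not part of this repository).
// Accepts names that start with a lowercase letter and otherwise contain
// only lowercase letters, digits, and hyphens, so the name is valid both
// as a CloudFormation stack name and as part of an S3 bucket name.
function isBucketSafeStackName(name) {
  return /^[a-z][a-z0-9-]*$/.test(name);
}

console.log(isBucketSafeStackName('lambda-file-processing')); // true
console.log(isBucketSafeStackName('Lambda-File-Processing')); // false
```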

You can launch the stack into the us-east-1 (N. Virginia) region from the AWS Console by using the CloudFormation launch button on the repository page.

Alternatively, you can use the following command to launch the stack using the AWS CLI. This assumes you have already installed the AWS CLI.

aws cloudformation create-stack \
    --stack-name lambda-file-processing \
    --template-url https://s3.amazonaws.com/awslambda-reference-architectures/file-processing/lambda_file_processing.template \
    --capabilities CAPABILITY_IAM

Testing

Once you have created the stack using the provided template, you can test the system by uploading a Markdown file to the InputBucket that was created in the stack. The README.md file in this repository can be used as an example file. Once the file has been uploaded, you can see the resulting HTML and plain text files in the output bucket of your stack. You can also view the CloudWatch logs for each of the functions in order to see the details of their execution.

You can use the following commands to copy a sample file from the provided S3 bucket into the input bucket of your stack.

BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-processing --logical-resource-id InputBucket --query "StackResourceDetail.PhysicalResourceId" --output text)
aws s3 cp s3://awslambda-reference-architectures/file-processing/example.md s3://$BUCKET/example.md

After the file has been uploaded to the input bucket, inspect the output bucket to see the rendered HTML and plain text files created by the Lambda functions. The CloudWatch logs generated by the functions show the details of each execution.

Cleaning Up

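To tear down the deployed resources, complete the following steps in order. The buckets must be emptied first because CloudFormation cannot delete a bucket that still contains objects.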

  1. Delete all objects in the input and output buckets.
  2. Delete the CloudFormation stack.
  3. Delete the CloudWatch Log groups that contain the execution logs for the two processor functions.
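The steps above map onto AWS CLI commands roughly as follows. This is a sketch under stated assumptions: cleanupCommands is a hypothetical helper that only builds the command strings so they can be reviewed before being run, and the bucket and function names passed in are placeholders for the physical names generated by your stack. It relies on Lambda's convention of writing logs to a log group named /aws/lambda/ followed by the function name.

```javascript
// Hypothetical helper (an illustration, not part of this repository).
// Returns the AWS CLI commands for the three cleanup steps, in order.
function cleanupCommands({ stackName, inputBucket, outputBucket, functionNames }) {
  return [
    // Step 1: delete all objects in the input and output buckets.
    `aws s3 rm s3://${inputBucket} --recursive`,
    `aws s3 rm s3://${outputBucket} --recursive`,
    // Step 2: delete the CloudFormation stack.
    `aws cloudformation delete-stack --stack-name ${stackName}`,
    // Step 3: delete the log group of each processor function.
    ...functionNames.map(
      (fn) => `aws logs delete-log-group --log-group-name /aws/lambda/${fn}`
    ),
  ];
}

// Placeholder names; substitute the physical resource names from your stack.
const commands = cleanupCommands({
  stackName: 'lambda-file-processing',
  inputBucket: 'example-input-bucket',
  outputBucket: 'example-output-bucket',
  functionNames: ['example-processor-one', 'example-processor-two'],
});
commands.forEach((command) => console.log(command));
```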

CloudFormation Template Resources

Parameters

  • CodeBucket: Name of the S3 bucket in the stack's region that contains the code for the two Lambda functions, ProcessorFunctionOne and ProcessorFunctionTwo. Defaults to the managed bucket 'awslambda-reference-architectures'.

  • CodeKeyPrefix: The key prefix for the Lambda function code relative to CodeBucket. Defaults to 'file-processing'.
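To illustrate how the two parameters combine, the sketch below builds the S3 locations where the template expects the deployment packages, using the package file names listed for the two processor functions under Resources. deploymentPackageUris is a hypothetical helper for illustration only; the template itself resolves these locations.

```javascript
// Hypothetical helper (an illustration, not part of this repository).
// Defaults mirror the CodeBucket and CodeKeyPrefix parameter defaults.
function deploymentPackageUris(
  codeBucket = 'awslambda-reference-architectures',
  codeKeyPrefix = 'file-processing'
) {
  // Package file names expected for ProcessorFunctionOne and ProcessorFunctionTwo.
  const packages = ['data-processor-1.zip', 'data-processor-2.zip'];
  return packages.map((file) => `s3://${codeBucket}/${codeKeyPrefix}/${file}`);
}

deploymentPackageUris().forEach((uri) => console.log(uri));
```

If you host the code in your own bucket, both zip files need to exist under the same prefix in that bucket.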

Resources

The provided template creates the following resources:

  • InputBucket: An Amazon Simple Storage Service (Amazon S3) bucket that holds the raw Markdown files. Uploading a file to this bucket will trigger both processing functions.

  • OutputBucket: An Amazon S3 bucket that is populated by the processor functions with the transformed files.

  • InputNotificationTopic: An Amazon Simple Notification Service (Amazon SNS) topic used to invoke multiple Lambda functions in response to each object creation notification.

  • NotificationPolicy: An Amazon SNS topic policy which permits InputBucket to call the Publish action on the topic.

  • ProcessorFunctionOne: An AWS Lambda function that converts Markdown files to HTML. The deployment package for this function must be located at s3://[CodeBucket]/[CodeKeyPrefix]/data-processor-1.zip.

  • ProcessorFunctionTwo: An AWS Lambda function that converts Markdown files to plain text. The deployment package for this function must be located at s3://[CodeBucket]/[CodeKeyPrefix]/data-processor-2.zip.

  • LambdaExecutionRole: An AWS Identity and Access Management (IAM) role used by the two Lambda functions.

  • RolePolicy: An IAM policy associated with LambdaExecutionRole that allows the functions to get objects from InputBucket, put objects in OutputBucket, and write logs to Amazon CloudWatch.

  • LambdaInvokePermissionOne: A policy that enables Amazon SNS to invoke ProcessorFunctionOne based on notifications from InputNotificationTopic.

  • LambdaInvokePermissionTwo: A policy that enables Amazon SNS to invoke ProcessorFunctionTwo based on notifications from InputNotificationTopic.
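Because the S3 notification reaches each function through InputNotificationTopic, a function receives an SNS event whose message body is the S3 notification serialized as JSON. The sketch below shows one way a processor could unwrap it. It is not the code shipped in the deployment packages: extractS3Objects and the sample bucket and key names are hypothetical.

```javascript
// Hypothetical helper (an illustration, not the repository's handler code).
// Returns { bucket, key } for every S3 object referenced by an SNS-delivered event.
function extractS3Objects(snsEvent) {
  const objects = [];
  for (const snsRecord of snsEvent.Records) {
    // The SNS message body is the S3 notification serialized as JSON.
    const s3Event = JSON.parse(snsRecord.Sns.Message);
    // Messages without a Records array (such as S3 test events) are skipped.
    for (const s3Record of s3Event.Records || []) {
      objects.push({
        bucket: s3Record.s3.bucket.name,
        // S3 URL-encodes object keys in notifications and uses "+" for spaces.
        key: decodeURIComponent(s3Record.s3.object.key.replace(/\+/g, ' ')),
      });
    }
  }
  return objects;
}

// Sample event shaped like an SNS invocation carrying one S3 notification.
const sampleEvent = {
  Records: [
    {
      Sns: {
        Message: JSON.stringify({
          Records: [
            {
              s3: {
                bucket: { name: 'example-input-bucket' },
                object: { key: 'docs/my+notes.md' },
              },
            },
          ],
        }),
      },
    },
  ],
};
console.log(extractS3Objects(sampleEvent));
```

Each function would then read the object with the returned bucket and key and write its own derivative to OutputBucket.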

License

This reference architecture sample is licensed under Apache 2.0.