AWS Lambda Duplex Stream to S3

An example of how to stream large files from an external source to S3 on AWS Lambda using the AWS SDK for JavaScript v3.

Motivation

When a Lambda function has to download and re-upload files that vary widely in size, the standard flow of downloading the whole file up front and then uploading it requires a memory configuration large enough to hold the entire file for the duration of the transfer.

With streams, the file is never held in memory in full, so the Lambda memory configuration is no longer a constraint. Files of any size can be processed, as long as the transfer completes within the maximum execution time (15 minutes).
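The pattern can be sketched with Node.js core streams alone. This is a minimal illustration, not the code in this repository: `relay` and `countBytes` are hypothetical names, and `countBytes` stands in for the S3 upload. In the actual Lambda the source would be the HTTP response body and the consumer would be an upload that accepts a stream as its body (for example, the `Upload` helper from `@aws-sdk/lib-storage`), which is an assumption about the implementation.

```javascript
// Sketch only: Node.js core streams, no AWS dependencies.
const { PassThrough, Readable } = require('stream');
const { pipeline } = require('stream/promises');

// Pipes `source` into a PassThrough (a simple duplex stream) and hands the
// readable side to `consume`, so data is consumed while it is still arriving.
async function relay(source, consume) {
  const passThrough = new PassThrough();
  // Start the consumer first so it drains the stream as chunks come in.
  const consumed = consume(passThrough);
  // pipeline() applies backpressure and propagates errors from the source.
  const [result] = await Promise.all([consumed, pipeline(source, passThrough)]);
  return result;
}

// Hypothetical stand-in for the upload: counts bytes instead of sending them.
async function countBytes(stream) {
  let bytes = 0;
  for await (const chunk of stream) {
    bytes += chunk.length;
  }
  return bytes;
}

// Usage: relay four 1 KiB chunks without ever buffering the full payload.
const chunks = Array.from({ length: 4 }, () => Buffer.alloc(1024));
relay(Readable.from(chunks), countBytes).then((bytes) => {
  console.log(`relayed ${bytes} bytes`);
});
```

Because the consumer reads from the pass-through while the source is still writing to it, memory use is bounded by the stream buffers rather than by the file size.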

Prerequisites

To deploy this CDK application, you will need the following:

Node.js v10 or later (LTS only)
Docker (for building the Lambda function)
An AWS profile with valid IAM credentials

Usage

Clone this repository, then deploy the CDK stack by running:

npm run build && npm run deploy

Run the invoke.sh script with the URL of a large file (this will take some time while the Lambda function pulls in the file and uploads it):

./invoke.sh "https://ai2-public-datasets.s3.us-west-2.amazonaws.com/arc/ARC-V1-Feb2018.zip" # 649 MB file

The Lambda function reads the incoming data from your URL and streams it to S3 as it arrives.

When the Lambda function has completed, open the S3 Console to confirm that the file has been uploaded to your bucket.

Note: To use an AWS profile other than default, set the AWS_PROFILE environment variable in your shell before running any commands.

Cleaning Up

Run the following command to remove all of the resources created by this CDK stack:

npm run destroy

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.