This repo contains several scripts that facilitate execution of R functions on AWS Lambda.
Currently (March 2018) it is not possible to run R code directly on AWS Lambda, thus we need to invoke it through Python.
The scripts:
- use your settings to create an AWS EC2 instance,
- install and compile R packages
- create the zip file to load in AWS Lambda and save it to S3
- create Lambda function and deploy the zip file
- configure AWS API Gateway to allow accessing the code over the web
At the end of the setup, you will have an AWS Lambda function that can be invoked as many times as you wish through AWS API Gateway, without worrying about EC2 instances or scalability issues.
This setup is best suited to use cases with the following characteristics:
- almost unlimited scalability (1000 concurrent executions)
- no idle server time
- very low cost
- R functions are small and execute fast
- input and output through JSON strings
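As a rough illustration of the JSON-in, JSON-out usage described above, a deployed function could be called from the command line along these lines. The endpoint URL, the `x` and `y` fields, and both function names are hypothetical placeholders, not values defined by this repo. The 30-second timeout mirrors the API Gateway execution limit.

```shell
# Hypothetical endpoint: replace with the URL that API Gateway reports
# for your own deployment.
API_URL="${API_URL:-https://example.execute-api.us-east-1.amazonaws.com/prod/my-r-function}"

# Build a JSON request body from two numeric arguments.
# The field names x and y are illustrative only.
build_payload() {
    printf '{"x": %s, "y": %s}' "$1" "$2"
}

# POST the JSON body to the endpoint and print the JSON response.
invoke_r_function() {
    curl --silent --show-error --max-time 30 \
        --header "Content-Type: application/json" \
        --data "$(build_payload "$1" "$2")" \
        "$API_URL"
}
```

With `API_URL` set to a real endpoint, `invoke_r_function 1 2` would send the payload and print whatever JSON string the R function returns.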
AWS Lambda and API Gateway impose several limitations:
- maximum memory 3008MB
  - this should be sufficient to run most functions
- maximum zip file size 250MB
  - this is the most important limitation as it prevents using large R packages
- maximum execution time 30 seconds for API Gateway, 5 minutes for AWS Lambda
  - be sure to allow 1-2 seconds for start-up time
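Since the package size is the limit most likely to be hit, it can help to check the zip file before uploading it. The sketch below is one possible check, not part of the repo's scripts: the function name `zip_fits_lambda` is hypothetical, and the default limit simply takes the 250MB figure above at face value, so adjust it if AWS documents a different figure for your deployment path.

```shell
# Limit in bytes, taken from the 250MB figure listed above (assumption:
# 1MB is counted as 1024 * 1024 bytes).
MAX_ZIP_BYTES=$((250 * 1024 * 1024))

# Succeed if the file given as $1 is no larger than the limit given as $2
# (defaults to MAX_ZIP_BYTES). Prints the measured size either way.
zip_fits_lambda() {
    zip_path="$1"
    limit="${2:-$MAX_ZIP_BYTES}"
    if [ ! -f "$zip_path" ]; then
        echo "no such file: $zip_path" >&2
        return 2
    fi
    # wc -c counts bytes; the arithmetic expansion strips the padding
    # that some wc implementations add around the number.
    size=$(($(wc -c < "$zip_path")))
    echo "$zip_path: $size bytes (limit $limit)"
    [ "$size" -le "$limit" ]
}
```

For example, `zip_fits_lambda lambda/package.zip || echo "package too large"` (the path is a placeholder) would flag an oversized package before any upload is attempted.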
The current setup assumes that the following directories and their content will be added to your R project directory.
lambda/
: a temporary directory with files to be uploaded
python/
: contains Python files, one for each AWS Lambda entry point, that will be used to invoke the R code
scripts/
: the scripts compiling R packages and deploying to AWS Lambda
settings/
: settings files used for deployment (e.g. where to find AWS settings)
The doc/ directory contains additional documentation on how to set up your
AWS account (although familiarity with AWS helps a lot) and how to delete the
setup created by these scripts.
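Adding the four directories to an existing R project can be done by hand, or with a small helper such as the sketch below. The function name `copy_setup_dirs` and its two arguments are hypothetical; the repo itself does not ship this helper.

```shell
# Copy the four setup directories from a checkout of this repo ($1)
# into an existing R project directory ($2).
# Stops at the first directory that is missing from the checkout.
copy_setup_dirs() {
    repo_dir="$1"
    project_dir="$2"
    for dir in lambda python scripts settings; do
        if [ ! -d "$repo_dir/$dir" ]; then
            echo "missing directory: $repo_dir/$dir" >&2
            return 1
        fi
        # -R copies the directory and everything below it
        cp -R "$repo_dir/$dir" "$project_dir/"
    done
}
```

A call such as `copy_setup_dirs ~/src/aws-lambda-r ~/projects/my-r-project` (both paths are placeholders) leaves `lambda/`, `python/`, `scripts/`, and `settings/` inside the project directory.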
- Install AWS CLI on your local machine
  - Be sure that you stored your credentials in the ~/.aws/ directory
  - Optionally, create a profile for AWS CLI with aws configure --profile
  - Check that you can connect to your AWS account using the desired profile:
    aws sts get-caller-identity --profile aws-lambda-r
- Prepare your project
  - Ideally, the project directory name should contain only letters, dashes, and digits, e.g. aws-lambda-r
  - Be sure that git is initialized in the project directory (without git it
    will be almost impossible to keep track of changes, especially in production):
    git status
- Copy directories lambda/, python/, scripts/, settings/ to your project directory
- Copy and rename setup_auto_example.sh and setup_user_example.sh to setup_auto.sh and setup_user.sh
- Override variables from secrets_default.sh and setup_default with personal secrets in setup_user.sh. Variables such as PRJ_NAME, PRJ_BRANCH, AWS_REGION and EC2_DEFAULT_AMI_ID from settings_default.sh should be overridden accordingly in setup_user.sh.
- For automated AWS infrastructure setup, first run 21_setup_vpc.sh, 22_setup_custom_ami.sh, 23_setup_s3.sh and 24_setup_lambda.sh; otherwise create the infrastructure manually, following the documentation.
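The manual checks in the steps above (project name, git, AWS CLI profile) can be gathered into a single preflight function. This is a minimal sketch under stated assumptions: the function names `valid_project_name` and `preflight` are hypothetical, and the default profile name `aws-lambda-r` is simply the one used in the example command above.

```shell
# Succeed if the name contains only letters, digits, and dashes.
valid_project_name() {
    case "$1" in
        ""|*[!A-Za-z0-9-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Check a project directory ($1) and an AWS CLI profile ($2) before
# running any setup script. Returns non-zero on the first hard failure.
preflight() {
    project_dir="$1"
    profile="${2:-aws-lambda-r}"

    # The naming rule is a recommendation, so only warn about it.
    if ! valid_project_name "$(basename "$project_dir")"; then
        echo "warning: project name should use only letters, dashes, digits" >&2
    fi

    # git must be initialized in the project directory.
    if ! git -C "$project_dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "git is not initialized in $project_dir" >&2
        return 1
    fi

    # Same connectivity check as in the steps above.
    if ! aws sts get-caller-identity --profile "$profile" >/dev/null; then
        echo "cannot connect to AWS with profile $profile" >&2
        return 1
    fi
}
```

Running `preflight . aws-lambda-r` from the project directory would then report the first problem it finds, or return silently if everything is in place.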
Install the following packages, if not already installed:
- Homebrew:
  $ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
- Bash 4:
  $ brew update && brew install bash
- Add Bash 4 as the default shell:
  $ sudo nano /etc/shells
  # add to last line
  /usr/local/bin/bash
  # save and quit via ctrl + x
- md5sum:
  $ brew install md5sha1sum
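After installing these packages, a quick check can confirm that the expected tools are the ones actually found on the PATH. The sketch below is illustrative only; the function names `bash_major_version` and `check_local_tools` are hypothetical and not part of the repo.

```shell
# Print the major version of the bash that is found first on PATH.
bash_major_version() {
    bash -c 'echo "${BASH_VERSINFO[0]}"'
}

# Report whether Bash 4 or newer and md5sum are available.
# Returns non-zero if either check fails.
check_local_tools() {
    status=0
    major=$(bash_major_version)
    if [ "${major:-0}" -lt 4 ]; then
        echo "bash major version is ${major:-unknown}, expected 4 or newer" >&2
        status=1
    fi
    if ! command -v md5sum >/dev/null 2>&1; then
        echo "md5sum not found on PATH" >&2
        status=1
    fi
    return "$status"
}
```

Calling `check_local_tools && echo "tools ok"` prints a confirmation only when both checks pass.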
Run all the scripts via sudo bash ./scripts/<script_name>.sh instead of ./scripts/<script_name>.sh.
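Combining this invocation style with the script order given in the setup steps, the automated infrastructure setup could be driven by a loop like the one below. This is a sketch, not a script shipped with the repo: the function name `run_infra_setup` and the `RUNNER` variable are hypothetical, and it assumes the four scripts live in the scripts/ directory.

```shell
# Run the four infrastructure scripts in order, stopping at the first failure.
# $1 is the directory holding the scripts (assumed to be ./scripts).
# RUNNER is the command used to launch each script; it defaults to
# "sudo bash" as recommended above and is split into words on purpose.
run_infra_setup() {
    scripts_dir="${1:-./scripts}"
    for script in 21_setup_vpc.sh 22_setup_custom_ami.sh \
                  23_setup_s3.sh 24_setup_lambda.sh; do
        echo "running $script"
        if ! ${RUNNER:-sudo bash} "$scripts_dir/$script"; then
            echo "$script failed, stopping" >&2
            return 1
        fi
    done
}
```

Stopping at the first failure matters here because each step presumably builds on the resources created by the previous one.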
- Analyzing Genomics Data at Scale using R, AWS Lambda, and Amazon API Gateway
- Running R on AWS
- Lambda Execution Environment and Available Libraries
- AWS Lambda limits
- use AWS CloudFormation to create a template for all AWS config
  - see "Running R on AWS"
- script to check AWS CLI is properly installed
- convert to an R package and execute the scripts from R
- use /tmp folder on AWS Lambda to load large libraries (e.g., BH)