
Machine learning inference at scale using AWS serverless

This sample solution shows you how to run and scale ML inference using AWS serverless services: AWS Lambda and AWS Fargate. This is demonstrated using an image classification use case.

Architecture

The following diagram illustrates the solution's architecture for both the batch and real-time inference options.

[Architecture diagram: batch and real-time inference paths]
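For orientation, the following is a minimal CDK sketch of the real-time path only (an S3 bucket, a container-image Lambda function, and an API endpoint in front of it). The construct names, the ./inference image directory, and the memory and timeout values are illustrative assumptions, and a REST API is used here for brevity; the actual stack in this repo also provisions the AWS Batch on Fargate resources for the batch path.

    import * as cdk from 'aws-cdk-lib';
    import * as s3 from 'aws-cdk-lib/aws-s3';
    import * as lambda from 'aws-cdk-lib/aws-lambda';
    import * as apigateway from 'aws-cdk-lib/aws-apigateway';

    class MlServerlessStack extends cdk.Stack {
      constructor(scope: cdk.App, id: string) {
        super(scope, id);

        // Bucket that receives input images and stores inference results.
        const bucket = new s3.Bucket(this, 'MlServerlessBucket');

        // Inference code packaged as a container image; './inference' is an
        // assumed path to a directory containing a Dockerfile.
        const fn = new lambda.DockerImageFunction(this, 'InferenceFn', {
          code: lambda.DockerImageCode.fromImageAsset('./inference'),
          memorySize: 4096,
          timeout: cdk.Duration.seconds(30),
        });
        bucket.grantReadWrite(fn);

        // Endpoint that proxies requests (e.g. POST /predict) to the function.
        const api = new apigateway.LambdaRestApi(this, 'PredictApi', {
          handler: fn,
          binaryMediaTypes: ['image/jpeg'],
        });

        new cdk.CfnOutput(this, 'httpAPIUrl', { value: api.url });
      }
    }

    const app = new cdk.App();
    new MlServerlessStack(app, 'MlServerlessStack');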

Deploying the solution

To deploy and run the solution, you need access to:

  - An AWS account
  - A terminal with the AWS CLI, AWS CDK, Node.js, Docker, and Git installed (or an AWS Cloud9 environment, which includes most of these tools)

To deploy the solution, open your terminal window and complete the following steps.

  1. Clone the GitHub repo.
    git clone https://github.com/aws-samples/aws-serverless-for-machine-learning-inference.git

  2. Navigate to the project directory and deploy the CDK application.
    ./install.sh
    or
    ./cloud9_install.sh # If you are using AWS Cloud9
    Enter Y to proceed with the deployment.

Running inference

The solution lets you get predictions either for a set of images using batch inference or for a single image at a time using a real-time API endpoint.

Batch inference

Get batch predictions by uploading image files to Amazon S3.

  1. Upload one or more image files to the S3 bucket path ml-serverless-bucket-<acct-id>-<aws-region>/input from the Amazon S3 console or using the AWS CLI (a programmatic alternative is sketched after this list).
    aws s3 cp <path to jpeg files> s3://ml-serverless-bucket-<acct-id>-<aws-region>/input/ --recursive
  2. This triggers the batch job, which spins up Fargate tasks to run the inference. You can monitor the job status in the AWS Batch console.
  3. Once the job is complete (this may take a few minutes), the inference results can be accessed from the ml-serverless-bucket-<acct-id>-<aws-region>/output path.
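You can also upload programmatically. The following is a minimal sketch using the AWS SDK for JavaScript v3 (the @aws-sdk/client-s3 package); the bucket name placeholders and the cat.jpg file name are illustrative and must be replaced with your own values.

    import { readFile } from 'node:fs/promises';
    import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

    // Same bucket the stack creates; fill in your account ID and Region.
    const BUCKET = 'ml-serverless-bucket-<acct-id>-<aws-region>';

    async function uploadImage(localPath: string, fileName: string): Promise<void> {
      const client = new S3Client({});
      // Objects written under input/ trigger the batch inference job.
      await client.send(new PutObjectCommand({
        Bucket: BUCKET,
        Key: `input/${fileName}`,
        Body: await readFile(localPath),
        ContentType: 'image/jpeg',
      }));
    }

    uploadImage('./cat.jpg', 'cat.jpg').catch(console.error);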

Real-time inference

Get real-time predictions by invoking the API endpoint with an image payload.

  1. Navigate to the CloudFormation console and find the API endpoint URL (httpAPIUrl) from the stack output.
  2. Use a REST client, like Postman or the curl command, to send a POST request to the /predict API endpoint with an image file as the payload (a Node.js equivalent is sketched after this list).
    curl --request POST -H "Content-Type: image/jpeg" --data-binary @<your jpg file name> <your-api-endpoint-url>/predict
  3. Inference results are returned in the API response.
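The same request can be sent from Node.js (version 18 or later, where fetch is built in). This is a minimal sketch: the API_URL placeholder stands for the httpAPIUrl stack output, and cat.jpg is an illustrative file name.

    import { readFile } from 'node:fs/promises';

    // Replace with the httpAPIUrl value from the CloudFormation stack output.
    const API_URL = '<your-api-endpoint-url>';

    async function predict(imagePath: string): Promise<unknown> {
      const body = await readFile(imagePath);
      const res = await fetch(`${API_URL}/predict`, {
        method: 'POST',
        headers: { 'Content-Type': 'image/jpeg' },
        body,
      });
      if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
      // Inference results are returned in the API response.
      return res.json();
    }

    predict('./cat.jpg').then(console.log).catch(console.error);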

Cleaning up

Navigate to the project directory from the terminal window and run the following command to destroy all resources and avoid incurring future charges.
cdk destroy

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.