Hugging Face Transformers is a popular open-source project that provides pre-trained natural language processing (NLP) models for a wide variety of use cases. Customers with minimal machine learning experience can use these pre-trained models to quickly add NLP capabilities to their applications, for tasks such as text classification, language translation, summarization, and question answering.
Our solution consists of an AWS Cloud Development Kit (AWS CDK) script that automatically provisions container image-based AWS Lambda functions that perform ML inference using pre-trained Hugging Face models. The solution also attaches Amazon Elastic File System (Amazon EFS) storage to the Lambda functions to cache the pre-trained models and reduce inference latency.
In this architectural diagram:
- Serverless inference is achieved by using Lambda functions that are based on a container image
- The container image is stored in an Amazon Elastic Container Registry (Amazon ECR) repository within your account
- Pre-trained models are automatically downloaded from Hugging Face the first time the function is invoked
- Pre-trained models are cached in Amazon EFS storage, so new function instances load them from EFS instead of downloading them again, which reduces inference latency
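The last two points describe a cache-then-download pattern. The short sketch below uses only the standard library to illustrate it. It is a simplified model and does not reflect how the Transformers library actually lays out or manages its cache. `load_model` and `fake_download` are hypothetical names, and a temporary directory stands in for the EFS mount.

```python
import os
import tempfile

def load_model(model_id, cache_dir, download):
    """Return (model_dir, cache_hit), downloading only when the cache is cold.

    `download` is a hypothetical callable standing in for fetching model files
    from Hugging Face; the directory layout here is illustrative only.
    """
    # Hub IDs such as "org/name" contain "/", so flatten them into one directory name.
    model_dir = os.path.join(cache_dir, model_id.replace("/", "--"))
    marker = os.path.join(model_dir, ".complete")
    if os.path.exists(marker):
        return model_dir, True  # warm: reuse files already on the shared file system
    os.makedirs(model_dir, exist_ok=True)
    download(model_id, model_dir)  # cold: only the first caller pays the download cost
    # Write the marker last so an interrupted download is retried on the next call.
    with open(marker, "w"):
        pass
    return model_dir, False

downloads = []

def fake_download(model_id, target_dir):
    # Record the call and write a placeholder file instead of real model weights.
    downloads.append(model_id)
    with open(os.path.join(target_dir, "weights.bin"), "wb") as f:
        f.write(b"\0" * 16)

cache = tempfile.mkdtemp()  # stands in for the EFS mount, e.g. /mnt/hf_models_cache
first = load_model("example-org/example-model", cache, fake_download)
second = load_model("example-org/example-model", cache, fake_download)
print(first[1], second[1], len(downloads))  # → False True 1
```

The cache sits on a shared file system, so every new execution environment sees the files that the first one wrote. The sketch does not coordinate concurrent first invocations, and each of those could start its own download.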
The solution includes Python scripts for two common NLP use cases:
- Sentiment analysis: Identifying whether a sentence expresses positive or negative sentiment. It uses a model fine-tuned on SST-2, a GLUE benchmark task.
- Summarization: Summarizing a body of text into a shorter, representative text. It uses a BART model that was fine-tuned on the CNN/Daily Mail dataset.
For simplicity, both use cases are implemented using Hugging Face pipelines.
The following is required to run this example:
- git
- AWS CDK v2
- Python 3.6+
- Docker (the CDK builds the Lambda container images locally during deployment)
- A Python virtual environment (optional)
To deploy the solution:
- Clone the project to your development environment and change into its directory:
```bash
git clone https://github.com/aws-samples/zero-administration-inference-with-aws-lambda-for-hugging-face.git
cd zero-administration-inference-with-aws-lambda-for-hugging-face
```
- Install the required dependencies:
```bash
pip install -r requirements.txt
```
- Bootstrap the CDK. This command provisions the initial resources that the CDK needs to perform deployments:
```bash
cdk bootstrap
```
- Deploy the CDK application to its environment. During deployment, the toolkit outputs progress indications:
```bash
cdk deploy
```
The code is organized using the following structure:
```
├── inference
│   ├── Dockerfile
│   ├── sentiment.py
│   └── summarization.py
├── app.py
└── ...
```
The inference directory contains:
- The Dockerfile used to build a custom image that can run PyTorch Hugging Face inference in Lambda functions
- The Python scripts that perform the actual ML inference
The sentiment.py script shows how to use a Hugging Face Transformers model:
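```python
from transformers import pipeline

# Load the pipeline once per execution environment (outside the handler)
# so that warm invocations reuse the model already in memory.
nlp = pipeline("sentiment-analysis")

def handler(event, context):
    # The event is expected to carry the input sentence under the "text" key.
    response = {
        "statusCode": 200,
        # The pipeline returns a list with one result per input; return the first.
        "body": nlp(event['text'])[0]
    }
    return response
```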
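You can check the event and response shape without downloading a model by running the same handler logic against a stand-in for the pipeline. In this sketch, `make_handler` and `fake_sentiment` are hypothetical helpers. The stub only mimics the `[{"label": ..., "score": ...}]` list that a sentiment-analysis pipeline returns. Its keyword rule is a toy, not a classifier.

```python
import json

def make_handler(nlp):
    """Build a Lambda-style handler around any pipeline-like callable."""
    def handler(event, context):
        # Same contract as sentiment.py: read "text", return the first result.
        return {"statusCode": 200, "body": nlp(event["text"])[0]}
    return handler

def fake_sentiment(text):
    # Toy stand-in for pipeline("sentiment-analysis"); the score is a fixed placeholder.
    label = "NEGATIVE" if "not" in text.lower() else "POSITIVE"
    return [{"label": label, "score": 0.99}]

handler = make_handler(fake_sentiment)
response = handler({"text": "I love this!"}, None)
print(json.dumps(response))
# → {"statusCode": 200, "body": {"label": "POSITIVE", "score": 0.99}}
```

The returned `body` is a Python dictionary, which suits direct Lambda invocation. If the function were placed behind an Amazon API Gateway proxy integration, which is not part of this solution, `body` would need to be a JSON string.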
For each Python script in the inference directory, the CDK generates a Lambda function backed by a container image and a Python inference script.
The CDK script is named app.py in the solution's repository. The beginning of the script creates a virtual private cloud (VPC):
```python
vpc = ec2.Vpc(self, 'Vpc', max_azs=2)
```
Next, it creates the EFS file system and an access point in EFS for the cached model:
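```python
# DESTROY deletes the file system (and the cached models) when the stack is deleted.
fs = efs.FileSystem(self, 'FileSystem',
                    vpc=vpc,
                    removal_policy=RemovalPolicy.DESTROY)
# Requests through this access point act as UID/GID 1001 and are rooted at /export/models.
access_point = fs.add_access_point('MLAccessPoint',
                                   create_acl=efs.Acl(
                                       owner_gid='1001', owner_uid='1001', permissions='750'),
                                   path="/export/models",
                                   posix_user=efs.PosixUser(gid="1001", uid="1001"))
```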
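The `create_acl` settings take effect when EFS creates the access point's root directory (/export/models), if that directory does not exist yet. `posix_user` makes every request through the access point run as UID/GID 1001. Lambda reaches the file system through this access point, so it acts as the directory's owner and can write the model cache. The short sketch below uses Python's `stat` module to decode what the octal string `'750'` grants. `describe_acl` is a hypothetical helper for illustration.

```python
import stat

def describe_acl(permissions, owner_uid, owner_gid):
    """Render an EFS-style ACL (octal permission string plus owner) like `ls -l`."""
    mode = int(permissions, 8)  # "750" is an octal string, as in the CDK call
    # S_IFDIR marks the entry as a directory, so filemode prints a leading "d".
    return f"{stat.filemode(stat.S_IFDIR | mode)} uid={owner_uid} gid={owner_gid}"

print(describe_acl("750", "1001", "1001"))  # → drwxr-x--- uid=1001 gid=1001
```

The owner (1001) can read, write, and traverse the directory. The group can read and traverse it, and other users have no access.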
It iterates through the Python files in the inference directory:
```python
docker_folder = os.path.dirname(os.path.realpath(__file__)) + "/inference"
pathlist = Path(docker_folder).rglob('*.py')
for path in pathlist:
```
It then creates the Lambda function that serves the inference requests:
```python
    base = os.path.basename(path)
    filename = os.path.splitext(base)[0]
    # Lambda function from a Docker image; the image CMD selects the script's handler
    function = lambda_.DockerImageFunction(
        self, filename,
        code=lambda_.DockerImageCode.from_image_asset(docker_folder,
                                                      cmd=[filename + ".handler"]),
        memory_size=8096,  # in MB
        timeout=Duration.seconds(600),
        vpc=vpc,
        # Mount the EFS access point so cached models persist across execution environments
        filesystem=lambda_.FileSystem.from_efs_access_point(
            access_point, '/mnt/hf_models_cache'),
        # Point the Transformers model cache at the EFS mount
        environment={
            "TRANSFORMERS_CACHE": "/mnt/hf_models_cache"},
    )
```
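The loop above derives everything from file names, so you can preview the mapping from scripts to functions locally. This sketch mirrors the loop using only the standard library. `plan_functions` is a hypothetical helper, and the sorting exists only to make the output deterministic.

```python
import os
import tempfile
from pathlib import Path

def plan_functions(docker_folder):
    """Mirror the app.py loop: one (construct id, handler cmd) pair per .py file."""
    plan = []
    for path in sorted(Path(docker_folder).rglob("*.py")):
        filename = os.path.splitext(os.path.basename(path))[0]
        plan.append((filename, filename + ".handler"))
    return plan

# Build a throwaway folder shaped like the inference directory.
folder = tempfile.mkdtemp()
for name in ("sentiment.py", "summarization.py"):
    Path(folder, name).write_text("def handler(event, context):\n    return {}\n")
Path(folder, "Dockerfile").write_text("# not a .py file, so it is skipped\n")

print(plan_functions(folder))
# → [('sentiment', 'sentiment.handler'), ('summarization', 'summarization.handler')]
```

`rglob` is recursive. Any .py file in a subdirectory of inference, such as a shared helper module, would therefore also become its own Lambda function. Two files with the same name in different subdirectories would also produce the same construct ID.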
Optionally, you can add more models by adding Python scripts in the inference directory. For example, add the following code in a file called translate-en2fr.py:
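```python
from transformers import pipeline

# English-to-French translation pipeline, loaded once per execution environment
en_fr_translator = pipeline('translation_en_to_fr')

def handler(event, context):
    response = {
        "statusCode": 200,
        # The pipeline returns a list with one result per input; return the first.
        "body": en_fr_translator(event['text'])[0]
    }
    return response
```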
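With the CDK loop in app.py, this file produces the handler string `translate-en2fr.handler`. A hyphen is not valid in a Python `import` statement, but `importlib.import_module` accepts such a name as long as a matching file exists on the path. The sketch below assumes that the Lambda Python runtime resolves handler strings with `importlib.import_module`, which, to our understanding, is what the AWS Lambda Python runtime interface client does. `resolve_handler` is hypothetical, and a stub replaces the real pipeline so that no model is downloaded.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def resolve_handler(handler_string):
    """Resolve "module.function" via importlib (assumed to match the runtime's approach)."""
    module_name, function_name = handler_string.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, function_name)

# Write a stand-in translate-en2fr.py whose pipeline is replaced by a stub.
src_dir = tempfile.mkdtemp()
Path(src_dir, "translate-en2fr.py").write_text(
    "def en_fr_translator(text):\n"
    "    return [{'translation_text': '[fr] ' + text}]\n"
    "\n"
    "def handler(event, context):\n"
    "    return {'statusCode': 200, 'body': en_fr_translator(event['text'])[0]}\n"
)
sys.path.insert(0, src_dir)
importlib.invalidate_caches()  # make sure the newly added directory is scanned

handler = resolve_handler("translate-en2fr.handler")
print(handler({"text": "Hello"}, None))
# → {'statusCode': 200, 'body': {'translation_text': '[fr] Hello'}}
```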
Then run:
```bash
cdk synth
cdk deploy
```
This creates a new Lambda function that performs English-to-French translation.
After you are finished experimenting with this project, run cdk destroy to remove all of the associated infrastructure.
This library is licensed under the MIT No Attribution License. See the LICENSE file.
Disclaimer: Deploying the demo applications contained in this repository may cause your AWS account to be billed for services.