Hugging Face Inference SageMaker Module

Terraform module for easy deployment of Hugging Face Transformer models to Amazon SageMaker real-time endpoints. This module creates all the resources needed to deploy a model to Amazon SageMaker: an IAM role (if none is provided), a SageMaker Model, a SageMaker Endpoint Configuration, and a SageMaker Endpoint.

With this module you can deploy Hugging Face Transformer models directly from the Model Hub or from Amazon S3 to Amazon SageMaker, for both PyTorch-based and TensorFlow-based models.

Usage

basic example

module "sagemaker-huggingface" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  instance_count       = 1 # default is 1
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"
}

advanced example with autoscaling

module "sagemaker-huggingface" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"
  autoscaling = {
    max_capacity               = 4   # The max capacity of the scalable target
    scaling_target_invocations = 200 # The scaling target invocations (requests/minute)
  }
}
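
Deploying from Amazon S3 follows the same pattern as the examples above. The following is a minimal sketch, assuming a model archive has already been uploaded to S3; the bucket path, the module block name, and the IAM role name are hypothetical placeholders and not values taken from this module.

```hcl
# Sketch: deploy a model archive stored in Amazon S3 instead of a Hub model.
# The S3 path and the role name below are hypothetical placeholders.
module "sagemaker-huggingface-s3" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert-s3"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  model_data           = "s3://my-example-bucket/models/distilbert/model.tar.gz" # hypothetical path
  hf_task              = "text-classification"

  # Optional: reuse an existing IAM role (by name) instead of letting the module create one.
  sagemaker_execution_role = "my-existing-sagemaker-role" # hypothetical role name
}
```

Here model_data replaces hf_model_id, while hf_task is still set because it is a required input.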


Requirements

| Name | Version |
| --- | --- |
| terraform | >= 1.0.0 |
| aws | ~> 4.0 |

Providers

| Name | Version |
| --- | --- |
| aws | 3.74.0 |
| random | n/a |

Modules

No modules.

Resources

| Name | Type |
| --- | --- |
| aws_appautoscaling_policy.sagemaker_policy | resource |
| aws_appautoscaling_target.sagemaker_target | resource |
| aws_iam_role.new_role | resource |
| aws_sagemaker_endpoint.huggingface | resource |
| aws_sagemaker_endpoint_configuration.huggingface | resource |
| aws_sagemaker_endpoint_configuration.huggingface_async | resource |
| aws_sagemaker_endpoint_configuration.huggingface_serverless | resource |
| aws_sagemaker_model.model_with_hub_model | resource |
| aws_sagemaker_model.model_with_model_artifact | resource |
| random_string.resource_id | resource |
| aws_iam_role.get_role | data source |
| aws_sagemaker_prebuilt_ecr_image.deploy_image | data source |

Inputs

| Name | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| async_config | (Optional) Specifies configuration for how an endpoint performs asynchronous inference. Required key is s3_output_path, which is the S3 bucket used for async inference. | object({ s3_output_path = string, s3_failure_path = optional(string), kms_key_id = optional(string), sns_error_topic = optional(string), sns_success_topic = optional(string) }) | { "kms_key_id": null, "s3_output_path": null, "s3_failure_path": null, "sns_error_topic": null, "sns_success_topic": null } | no |
| autoscaling | An object which defines the autoscaling target and policy for the SageMaker Endpoint. Required keys are max_capacity and scaling_target_invocations. | object({ min_capacity = optional(number), max_capacity = number, scaling_target_invocations = optional(number), scale_in_cooldown = optional(number), scale_out_cooldown = optional(number) }) | { "max_capacity": null, "min_capacity": 1, "scale_in_cooldown": 300, "scale_out_cooldown": 66, "scaling_target_invocations": null } | no |
| hf_api_token | The HF_API_TOKEN environment variable defines your Hugging Face authorization token. It is used as an HTTP bearer authorization for remote files, such as private models. You can find your token on your settings page. | string | null | no |
| hf_model_id | The HF_MODEL_ID environment variable defines the model id, which will be automatically loaded from hf.co/models when creating the SageMaker Endpoint. | string | null | no |
| hf_model_revision | The HF_MODEL_REVISION is an extension to HF_MODEL_ID and allows you to define/pin a revision of the model to make sure you always load the same model on your SageMaker Endpoint. | string | null | no |
| hf_task | The HF_TASK environment variable defines the task for the 🤗 Transformers pipeline in use. A full list of tasks can be found in the Transformers pipelines documentation. | string | n/a | yes |
| image_tag | The image tag to use for the container. Defaults to None. The module tries to derive the image_tag from the pytorch_version, tensorflow_version and instance_type. If you want to override this, you can provide the image_tag as a variable. | string | null | no |
| instance_count | The initial number of instances to run in the Endpoint created from this Model. Defaults to 1. | number | 1 | no |
| instance_type | The EC2 instance type to deploy this Model to. For example, ml.p2.xlarge. | string | null | no |
| model_data | The S3 location of a SageMaker model data .tar.gz file (default: None). Not needed when using hf_model_id. | string | null | no |
| name_prefix | A prefix used for naming resources. | string | n/a | yes |
| pytorch_version | PyTorch version you want to use for executing your inference code. Defaults to None. Required unless tensorflow_version is provided. List of supported versions | string | null | no |
| sagemaker_execution_role | An AWS IAM role name to access training data and model artifacts. After the endpoint is created, the inference code might use the IAM role if it needs to access some AWS resources. If not specified, the role will be created with the CreateModel permissions from the documentation. | string | null | no |
| serverless_config | (Optional) Specifies configuration for how an endpoint performs serverless inference. Required keys are max_concurrency and memory_size_in_mb. | object({ max_concurrency = number, memory_size_in_mb = number }) | { "max_concurrency": null, "memory_size_in_mb": null } | no |
| tags | A map of tags (key-value pairs) passed to resources. | map(string) | {} | no |
| tensorflow_version | TensorFlow version you want to use for executing your inference code. Defaults to None. Required unless pytorch_version is provided. List of supported versions | string | null | no |
| transformers_version | Transformers version you want to use for executing your model training code. Defaults to None. List of supported versions | string | n/a | yes |
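
The serverless_config and async_config inputs can be passed in the same way as autoscaling in the usage examples. The sketch below is illustrative only: the module block names and the S3 output path are hypothetical, the concurrency and memory values are arbitrary, and it assumes that instance_type can be left unset for a serverless endpoint, which this document does not confirm.

```hcl
# Sketch: serverless inference. Both keys are required by serverless_config.
# Assumption: instance_type can be omitted for serverless endpoints.
module "sagemaker-huggingface-serverless" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert-serverless"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"
  serverless_config = {
    max_concurrency   = 1    # illustrative value
    memory_size_in_mb = 1024 # illustrative value
  }
}

# Sketch: asynchronous inference. Only s3_output_path is required by async_config.
module "sagemaker-huggingface-async" {
  source               = "philschmid/sagemaker-huggingface/aws"
  version              = "0.5.0"
  name_prefix          = "distilbert-async"
  pytorch_version      = "1.9.1"
  transformers_version = "4.12.3"
  instance_type        = "ml.g4dn.xlarge"
  hf_model_id          = "distilbert-base-uncased-finetuned-sst-2-english"
  hf_task              = "text-classification"
  async_config = {
    s3_output_path = "s3://my-example-bucket/async-output/" # hypothetical bucket
  }
}
```

Each inference mode gets its own module block here so that the two configurations stay separate.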

Outputs

| Name | Description |
| --- | --- |
| iam_role | IAM role used in the endpoint |
| sagemaker_endpoint | Created Amazon SageMaker endpoint resource |
| sagemaker_endpoint_configuration | Created Amazon SageMaker endpoint configuration resource |
| sagemaker_endpoint_name | Name of the created Amazon SageMaker endpoint, used to invoke the endpoint with SDKs |
| sagemaker_model | Created Amazon SageMaker model resource |
| tags | n/a |
| used_container | Container used for creating the endpoint |
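
Module outputs can be re-exported from the root module so that client code can look up the endpoint name. This is a minimal sketch, assuming the module block is named "sagemaker-huggingface" as in the usage examples; the output name huggingface_endpoint_name is a hypothetical choice.

```hcl
# Expose the endpoint name from the root module so SDK clients can look it up.
# Assumes the module block is named "sagemaker-huggingface", as in the usage examples.
output "huggingface_endpoint_name" {
  description = "Name of the SageMaker endpoint created by the module"
  value       = module.sagemaker-huggingface.sagemaker_endpoint_name
}
```

After terraform apply, the value can be read with terraform output -raw huggingface_endpoint_name and passed to an SDK call that invokes the endpoint.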

License

MIT License. See LICENSE for full details.