/amazon-sagemaker-experiments-dvc-demo

SageMaker Experiments and DVC

Primary LanguageJupyter NotebookMIT No AttributionMIT-0

SageMaker Experiments and DVC

This sample shows how to use DVC within the SageMaker environment. In particular, we will look at how to build a custom image with DVC libraries installed by default to provide a consistent environment to your data scientists. Furthermore, we show how you can integrate SageMaker Processing, SageMaker Trainings and SageMaker Experiments with a DVC workflow.

For full details on how this works:

Prerequisite

  • An AWS Account
  • An IAM user with Admin-like permissions

If you do not have Admin-like permissions, we recommend to have at least the following permissions:

  • Administer Amazon ECR
  • Administer a SageMaker Studio Domain
  • Administer S3 (or at least any buckets with sagemaker in the bucket name)
  • Create IAM Roles
  • Create a Cloud9 environment

Setup

We suggest for the initial setup, to use Cloud9 on a t3.large instance type.

Build a custom SageMaker Studio image with DVC already installed

We aim to explains how to create a custom image for Amazon SageMaker Studio that has DVC already installed. The advantage of creating an image and make it available to all SageMaker Studio users is that it creates a consistent environment for the SageMake Studio users, which they could also run locally.

This tutorial is heavily inspired by this example. Further information about custom images for SageMaker Studio can be found here

Overview

This custom image sample demonstrates how to create a custom Conda environment in a Docker image and use it as a custom kernel in SageMaker Studio.

The Conda environment must have the appropriate kernel package installed, for e.g., ipykernel for a Python kernel. This example creates a Conda environment called dvc with a few Python packages (see environment.yml) and the ipykernel. SageMaker Studio will automatically recognize this Conda environment as a kernel named conda-env-dvc-py.

Clone the GitHub repository

git clone https://github.com/aws-samples/amazon-sagemaker-experiments-dvc-demo

Resize Cloud9

cd ~/environment/amazon-sagemaker-experiments-dvc-demo/sagemaker-studio-dvc-image/
./resize-cloud9.sh 20

Build the Docker images for SageMaker Studio

Set some basic environment variables

sudo yum install jq -y
export REGION=$(curl -s 169.254.169.254/latest/dynamic/instance-identity/document | jq -r '.region')
echo "export REGION=${REGION}" | tee -a ~/.bash_profile

export ACCOUNT_ID=$(aws sts get-caller-identity | jq -r '.Account')
echo "export ACCOUNT_ID=${ACCOUNT_ID}" | tee -a ~/.bash_profile

export IMAGE_NAME=conda-env-dvc-kernel
echo "export IMAGE_NAME=${IMAGE_NAME}" | tee -a ~/.bash_profile

Build the Docker image and push to Amazon ECR.

# Login to ECR
aws --region ${REGION} ecr get-login-password | docker login --username AWS --password-stdin ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/smstudio-custom

# Create the ECR repository
aws --region ${REGION} ecr create-repository --repository-name smstudio-custom

# Build the image - it might take a few minutes to complete this step
docker build . -t ${IMAGE_NAME} -t ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/smstudio-custom:${IMAGE_NAME}
# Push the image to ECR
docker push ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/smstudio-custom:${IMAGE_NAME}

Associate a custom image to SageMaker Studio

Prepare the environment to deploy with CDK

Step 1: Navigate to the cdk directory:

cd ~/environment/amazon-sagemaker-experiments-dvc-demo/sagemaker-studio-dvc-image/cdk

Step 2: Create a virtual environment:

python3 -m venv .cdk-venv

Step 3: Activate the virtual environment after the init process completes, and the virtual environment is created:

source .cdk-venv/bin/activate

Step 4: Install the required dependencies:

pip3 install --upgrade pip
pip3 install -r requirements.txt

Step 5: Install and bootstrap CDK v2 (The latest CDK version tested was 2.27.0)

npm install -g aws-cdk@2.27.0 --force
cdk bootstrap

Create a new SageMaker Studio

( Skip to Update an existing SageMaker Studio if you have already an existing SageMaker Studio domain)

Step 5: deploy CDK (CDK will deploy a stack named: sagemakerStudioCDK which you can verify in CloudFormation)

cdk deploy --require-approval never

CDK will create the following resources via CloudFormation:

  • provisions a new SageMaker Studio domain
  • creates and attaches a SageMaker execution role, i.e., RoleForSagemakerStudioUsers with the right permissions to the SageMaker Studio domain
  • creates a SageMaker Image and a SageMaker Image Version from the docker image conda-env-dvc-kernel we have created earlier
  • creates an AppImageConfig which specify how the kernel gateway should be configured
  • provision a SageMaker Studio user, i.e., data-scientist-dvc, with the correct SageMaker execution role and makes available the custom SageMaker Studio image available to it

Update an existing SageMaker Studio

If you have an existing SageMaker Studio environment, we need to first retrieve the exising SageMaker Studio domain ID, deploy a "reduced" version of the CDK stack, and update the SageMaker Studio domain configuration.

Step 5: Set the DOMAIN_ID environment variable with your domain ID and save to your bash_profile.

export DOMAIN_ID=$(aws sagemaker list-domains | jq -r '.Domains[0].DomainId')
echo "export DOMAIN_ID=${DOMAIN_ID}" | tee -a ~/.bash_profile

Step 6: deploy CDK (by setting the DOMAIN_ID environment variable, CDK will deploy a stack named: sagemakerUserCDK which you can verify on CloudFormation)

cdk deploy --require-approval never

CDK will create the following resources via CloudFormation:

  • creates and attaches a SageMaker execution role, i.e., RoleForSagemakerStudioUsers with the right permissions to your existing SageMaker Studio domain
  • creates a SageMaker Image and a SageMaker Image Version from the docker image conda-env-dvc-kernel we have created earlier
  • creates an AppImageConfig which specify how the kernel gateway should be configured
  • provision a SageMaker Studio user, i.e., data-scientist-dvc, with the correct SageMaker execution role and makes available the custom SageMaker Studio image available to it

Step 7: Update the SageMaker Studio domain configuration

# inject your DOMAIN_ID into the configuration file
sed -i 's/<your-sagemaker-studio-domain-id>/'"$DOMAIN_ID"'/' ../update-domain-input.json

# update the sagemaker studio domain
aws --region ${REGION} sagemaker update-domain --cli-input-json file://../update-domain-input.json

Open the newly created SageMaker Studio user, i.e., data-scientist-dvc.

image info

Execute the sample notebook

In the SageMaker Studio domain, launch Studio for the data-scientist-dvc. Open a terminal and clone the repository

git clone https://github.com/aws-samples/amazon-sagemaker-experiments-dvc-demo

and open the dvc_sagemaker_script_mode.ipynb notebook.

When prompted, ensure that you select the Custom Image conda-env-dvc-kernel as shown below

We provide two sample notebooks to see how to use DVC in combination with SageMaker:

  • one that installs DVC in script mode by passing a requirements.txt file to both the processing job and the training job dvc_sagemaker_script_mode.ipynb;
  • one that shows how to create the container for the processing jobs, the training jobs, and the inference dvc_sagemaker_byoc.ipynb.

Both notebooks are meant to be used within SageMaker Studio with the custom image created before.

Cleanup

Before removing all resources created, you need to make sure that all Apps are deleted from the data-scientist-dvc user, i.e. all KernelGateway apps, as well as the default JupiterServer.

Once done, you can destroy the CDK stack by running

cd ~/environment/amazon-sagemaker-experiments-dvc-demo/sagemaker-studio-dvc-image/cdk
cdk destroy

In case you started off from an existing domain, please also execute the following command:

# inject your DOMAIN_ID into the configuration file
sed -i 's/<your-sagemaker-studio-domain-id>/'"$DOMAIN_ID"'/' ../update-domain-no-custom-images.json

# update the sagemaker studio domain
aws --region ${REGION} sagemaker update-domain --cli-input-json file://../update-domain-no-custom-images.json

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.