SageMaker Studio deployment using CDK

This repo contains an example of how to deploy SageMaker Studio using CDK. The project deploys code through a CICD pipeline built on CodeCommit, CodeBuild and CodePipeline, and gives you the ability to deploy to a production environment and to a sandbox environment for testing. Features of the deployment:

  • Deploys SageMaker Studio with IAM or SSO support
  • Deploys a lifecycle policy that will terminate compute after 60 minutes of being idle
  • Enables AWS Glue support through role permissions

Test examples are provided under the tests folder and are executed by the deployment pipelines. The project also runs checks with black, bandit, radon, xenon and coverage.

Isolated Deployment (No Internet)

It's possible to deploy the solution and configure SageMaker Studio with no internet access. For this deployment type, set the "USE_S3_FOR_ASSETS" variable to True and the "SUBNET_DEPLOYMENT_TYPE" variable to private or private-isolated, depending on your subnet type. The deployment then uses a copy of the auto-shutdown script deployed to S3. Some points to consider with this deployment method:

Dependencies

Production Dependencies

  • Python 3.7 or above
  • CDK 2.6

Development Dependencies

Setup and Install

Deploying the solution creates a CodeCommit repository in your AWS account and starts a pipeline deployment. Once you have deployed, you can switch from the GitHub repository to your CodeCommit repository.

  1. Clone the repo: git clone {GITREPO HERE ONCE CREATED}
  2. Update constants.py with the details of your environment. You must update the items marked as "must update"; all other items can be left at their defaults (see the Variables table at the end).
  3. Create a virtual environment for Python and install the dependencies:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

To run the tests you will also need to run:

pip install -r requirements-dev.txt
  4. Run cdk deploy sagemaker-studio-deployment-toolchain to deploy the CICD components and create the CodeCommit repository.
  5. Confirm the deployment when prompted. This operation only needs to be performed when you want to deploy or update the CICD pipelines; SageMaker Studio itself is deployed by the CICD pipeline.
  6. Record the remote CodeCommit endpoint shown in the output.
  7. You now need to commit the code to your new repository.
  8. Disconnect from the GitHub repo and reconnect to CodeCommit:
git remote remove origin
git init --initial-branch=main
git remote add origin codecommit::ap-southeast-2://{YOUR_REPO_NAME_HERE}

To confirm the origin has updated, run git remote get-url --all origin

  9. Before the pipeline can run successfully, you need to run cdk synth to generate the cdk.context.json file and check that file into source control:
cdk synth
  10. Confirm you now have a file in the repository called 'cdk.context.json'. This file contains the details of your VPCs for deployment.
  11. Now you can commit your code to the repository:
git add .
git commit -m "initial commit"
git push --set-upstream origin main
  12. Go to the AWS console, navigate to

Reference

The project uses the following guidelines to structure the repository: https://aws.amazon.com/blogs/developer/recommended-aws-cdk-project-structure-for-python-applications/

The project uses the excellent auto-shutdown script from https://github.com/aws-samples/sagemaker-studio-auto-shutdown-extension

Deployment

  • Clone the repo
  • Run the initialise command, which deploys the repo into CodeCommit in your account: cdk deploy sagemaker-studio-deployment-toolchain
  • git init --initial-branch=main
  • git remote add origin codecommit::ap-southeast-2://sagemaker-studio-jbash-config-repo
  • git add .
  • git commit
  • git push --set-upstream origin main
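
The remote used in the commands above follows the URL form understood by the git-remote-codecommit helper: codecommit::<region>://<repository>, optionally with a named profile placed before the repository name. As a minimal sketch, assuming a hypothetical helper called codecommit_remote_url (it is not part of this project), the URL could be assembled from the region and repository name instead of being typed by hand. The repository name in the example assumes the default APP_NAME with the "-config-repo" suffix listed under Variables.

```python
import re
from typing import Optional

# CodeCommit repository names may contain letters, digits, '.', '_' and '-'.
_REPO_NAME = re.compile(r"^[A-Za-z0-9._-]{1,100}$")


def codecommit_remote_url(
    repo_name: str,
    region: Optional[str] = None,
    profile: Optional[str] = None,
) -> str:
    """Build a git-remote-codecommit style URL (hypothetical helper).

    Produces codecommit://repo, codecommit://profile@repo,
    or the region-qualified form codecommit::region://repo.
    """
    if not _REPO_NAME.match(repo_name):
        raise ValueError(f"invalid CodeCommit repository name: {repo_name!r}")
    prefix = f"codecommit::{region}://" if region else "codecommit://"
    target = f"{profile}@{repo_name}" if profile else repo_name
    return prefix + target


if __name__ == "__main__":
    # Region and repository name are example values; substitute your own.
    print(codecommit_remote_url("sagemaker-studio-config-repo", region="ap-southeast-2"))
```

The printed value can be passed to git remote add origin. Omitting the region falls back to whatever region your AWS configuration resolves to.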

Variables

| Name | Must Update | Description | Default |
| --- | --- | --- | --- |
| APP_NAME | | Name of your application | sagemaker-studio |
| SAGEMAKER_DOMAIN_NAME_PREFIX | | Prefix for the SageMaker domain to be created | sms |
| SANDBOX_ENV_NAME | | The name of the sandbox environment (used to prefix some elements) | sandbox |
| MAIN_ENV_NAME | | The name of the production environment (used to prefix some elements) | dev |
| MAIN_ENV_ACCOUNT | Yes | Your production AWS account ID | |
| MAIN_ENV_REGION | Yes | The AWS region to deploy the production SageMaker Studio to | |
| VPC_NAME | Yes | The name of the VPC to deploy into. The name is used to look up the VPC in CDK | |
| AUTH_TYPE | | Authentication type: SSO or IAM | SSO |
| ADD_GLUE_PERMISSION | | Whether to enable Glue permissions in SageMaker so that users can use Glue Interactive Sessions | True |
| JUPYTERLAB_DEFAULT | | Which version of JupyterLab to use. JupyterLab 3 is the default | JL3 |
| TOOLCHAIN_ACCOUNT | Yes | The AWS account to deploy the CICD components to | |
| TOOLCHAIN_REGION | Yes | The AWS region to deploy the CICD components to | |
| CODECOMMIT_REPO | | The name of the CodeCommit repo | APP_NAME + "-" + "config-repo" |
| CODECOMMIT_TRUNK_BRANCH | | The trunk branch for your CodeCommit repo | main |
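
As an illustration of how these variables might be checked before running cdk deploy, the sketch below derives the default CODECOMMIT_REPO value and reports which "must update" entries are still empty. The helper names (default_repo_name, find_problems) and the dictionary layout are hypothetical; the actual structure of constants.py may differ, and the account ID shown is a placeholder.

```python
from typing import Dict, List

# Names flagged "Yes" in the Must Update column.
MUST_UPDATE = (
    "MAIN_ENV_ACCOUNT",
    "MAIN_ENV_REGION",
    "VPC_NAME",
    "TOOLCHAIN_ACCOUNT",
    "TOOLCHAIN_REGION",
)

# Values listed for AUTH_TYPE.
VALID_AUTH_TYPES = ("SSO", "IAM")


def default_repo_name(app_name: str) -> str:
    """Build the default CODECOMMIT_REPO value: APP_NAME + "-" + "config-repo"."""
    return app_name + "-" + "config-repo"


def find_problems(settings: Dict[str, str]) -> List[str]:
    """Return readable problems; an empty list means nothing required is missing."""
    problems = [
        f"{name} must be set"
        for name in MUST_UPDATE
        if not str(settings.get(name, "")).strip()
    ]
    auth_type = settings.get("AUTH_TYPE", "SSO")
    if auth_type not in VALID_AUTH_TYPES:
        problems.append(
            f"AUTH_TYPE must be one of {VALID_AUTH_TYPES}, got {auth_type!r}"
        )
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "APP_NAME": "sagemaker-studio",
        "MAIN_ENV_ACCOUNT": "111111111111",
        "MAIN_ENV_REGION": "ap-southeast-2",
        "VPC_NAME": "",  # left empty on purpose to show a reported problem
        "TOOLCHAIN_ACCOUNT": "111111111111",
        "TOOLCHAIN_REGION": "ap-southeast-2",
        "AUTH_TYPE": "SSO",
    }
    print(default_repo_name(example["APP_NAME"]))
    for problem in find_problems(example):
        print(problem)
```

Running a check like this locally catches a missing account ID or VPC name early, rather than partway through a pipeline run.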