This repo contains an example of how to deploy SageMaker Studio using CDK. The project deploys code through a CICD pipeline using CodeCommit, CodeBuild and CodePipeline, and gives you the ability to deploy to prod and to a sandbox environment for testing. Features of the deployment:
- Deploys SageMaker Studio with IAM or SSO support
- Deploys a lifecycle policy that will terminate compute after 60 minutes of being idle
- Enables AWS Glue support through role permissions
Test examples have been provided under the tests folder and will be executed by the deployment pipelines. The project also runs checks with black, bandit, radon, xenon and coverage.
Isolated Deployment (No Internet)
It's possible to deploy the solution and configure SageMaker Studio with no internet access. For this deployment type, set the "USE_S3_FOR_ASSETS" variable to True and "SUBNET_DEPLOYMENT_TYPE" to private or private-isolated, depending on your subnet type. This will use a copy of the auto-shutdown script deployed to S3. Some points to consider with this deployment method:
- To access other AWS services, ensure you have the correct VPC endpoints in place. The endpoints SageMaker requires are listed at https://docs.aws.amazon.com/sagemaker/latest/dg/studio-notebooks-and-internet-access.html#studio-notebooks-and-internet-access-vpc
- To access pip, you can use CodeArtifact to set up a pass-through repository.
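
As a minimal sketch of the settings described above, the following shows how the two variables might look in constants.py for an isolated deployment, together with a small sanity check. The variable names and the `private` / `private-isolated` values come from this README; the `validate_isolated_config` helper and `ISOLATED_SUBNET_TYPES` are hypothetical and not part of the project.

```python
from typing import List

# Hypothetical excerpt of constants.py for a no-internet deployment.
USE_S3_FOR_ASSETS = True
SUBNET_DEPLOYMENT_TYPE = "private-isolated"  # or "private", depending on your subnet type

# Subnet types this README names for an isolated deployment.
ISOLATED_SUBNET_TYPES = {"private", "private-isolated"}


def validate_isolated_config(use_s3_for_assets: bool, subnet_type: str) -> List[str]:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if not use_s3_for_assets:
        problems.append("USE_S3_FOR_ASSETS must be True so the auto-shutdown script is read from S3")
    if subnet_type not in ISOLATED_SUBNET_TYPES:
        problems.append(
            "SUBNET_DEPLOYMENT_TYPE should be one of " + ", ".join(sorted(ISOLATED_SUBNET_TYPES))
        )
    return problems
```

Calling `validate_isolated_config(USE_S3_FOR_ASSETS, SUBNET_DEPLOYMENT_TYPE)` before a deploy would flag a mismatch early, instead of waiting for the pipeline to fail.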
Production Dependencies
- Python 3.7 or above
- CDK 2.6
Development Dependencies
- bandit - Checks for common security issues in Python (https://bandit.readthedocs.io/en/latest/)
- black - Code must conform to the black style. A test checks this and fails the deployment if the code does not conform. You can install black globally using pipx and then run
black .
on the repository before you commit. (https://github.com/psf/black)
- radon - Provides code metrics (https://radon.readthedocs.io/en/latest/)
- xenon - Monitors code complexity (https://xenon.readthedocs.io/en/latest/)
- coverage - Provides code coverage for unit tests (https://github.com/nedbat/coveragepy)
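
To run the same kinds of checks locally before committing, something like the following sketch could work. It is not the pipeline's actual configuration: the xenon thresholds, the use of pytest as the test runner, and the `tests` target are assumptions, and the `build_check_commands` and `run_checks` helpers are hypothetical. Only the tool names come from this README.

```python
import subprocess
from typing import List


def build_check_commands(path: str = ".") -> List[List[str]]:
    """Return one argv list per tool. Thresholds and targets are assumptions."""
    return [
        ["black", "--check", path],   # fail if any file would be reformatted
        ["bandit", "-r", path],       # recursive security scan
        ["radon", "cc", path, "-a"],  # cyclomatic complexity, with the average
        # Assumed complexity thresholds; adjust to whatever the project enforces.
        ["xenon", "--max-absolute", "B", "--max-modules", "A", "--max-average", "A", path],
        # Assumes pytest is the test runner and tests live under tests/.
        ["coverage", "run", "-m", "pytest", "tests"],
    ]


def run_checks(path: str = ".") -> bool:
    """Run every check in order and return True only if all exit with status 0."""
    results = [subprocess.run(cmd).returncode == 0 for cmd in build_check_commands(path)]
    return all(results)
```

Running `run_checks()` from the repository root requires the tools from requirements-dev.txt to be installed in the active virtual environment. Note that scanning `.` also scans the `.venv` folder unless you point the tools at the source folders instead.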
When you deploy the solution, it creates a CodeCommit repository in your AWS account and starts a pipeline deployment. Once you have deployed, you can switch from the GitHub repository to your CodeCommit repository.
- Clone the repo
git clone {GITREPO HERE ONCE CREATED}
- Update constants.py with the details of your environment. You must update the items marked as "must update"; you can leave any other elements at their defaults. (See the table at the end.)
- Create a virtual environment for Python and install dependencies
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
To run tests you will also need to run
pip install -r requirements-dev.txt
- Run
cdk deploy sagemaker-studio-deployment-toolchain
to deploy the CICD components and create the CodeCommit repository. Answer yes when prompted to deploy. This operation only needs to be performed when you want to deploy the CICD pipelines, or if you want to update them. SageMaker Studio itself is deployed by the CICD pipeline.
- Record the remote CodeCommit endpoint from the output.
- You now need to commit the code to your new repository
- Disconnect from the GitHub repo and reconnect to CodeCommit
git remote remove origin
git init --initial-branch=main
git remote add origin codecommit::ap-southeast-2://{YOUR_REPO_NAME_HERE}
To confirm the origin has been updated, run git remote get-url --all origin
- Before the pipeline will run successfully you need to run a cdk synth to generate the cdk.context.json file and check that into source control.
cdk synth
- Confirm you now have a file in the repository called 'cdk.context.json'. This file contains the details of your VPCs for deployment.
- Now you can commit your code to the repository.
git add .
git commit -m "initial commit"
git push --set-upstream origin main
- Go to the AWS console, navigate to
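
For the `cdk synth` step in the list above, a quick pre-commit check that cdk.context.json exists and is usable might look like this sketch. The `context_file_ready` helper is hypothetical. It only checks that the file parses as a non-empty JSON object and makes no assumption about which keys CDK writes into it.

```python
import json
from pathlib import Path


def context_file_ready(repo_root: str = ".") -> bool:
    """Return True if cdk.context.json exists and parses as a non-empty JSON object."""
    context_path = Path(repo_root) / "cdk.context.json"
    if not context_path.is_file():
        return False
    try:
        data = json.loads(context_path.read_text())
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and len(data) > 0
```

If `context_file_ready()` returns False from the repository root, run `cdk synth` again before committing.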
The project uses the following guidelines to structure the repository https://aws.amazon.com/blogs/developer/recommended-aws-cdk-project-structure-for-python-applications/
The project uses the excellent auto-shutdown script from https://github.com/aws-samples/sagemaker-studio-auto-shutdown-extension
- Clone the repo
- Run the initialise command, which will deploy the repo into your CodeCommit account
cdk deploy sagemaker-studio-deployment-toolchain
git init --initial-branch=main
git remote add origin codecommit::ap-southeast-2://sagemaker-studio-jbash-config-repo
git add .
git commit
git push --set-upstream origin main
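
The remote URLs in the commands above follow the `codecommit::<region>://<repo>` form handled by the git-remote-codecommit helper. As a small sketch, the URL for your own region and repository could be built like this. The `codecommit_remote_url` helper is hypothetical, and the repository name in the example is an assumption based on the default CODECOMMIT_REPO value.

```python
def codecommit_remote_url(region: str, repo_name: str) -> str:
    """Build a remote URL in the codecommit::<region>://<repo> form used above."""
    if not region or not repo_name:
        raise ValueError("region and repo_name must both be non-empty")
    return f"codecommit::{region}://{repo_name}"


# Example values only; substitute your own region and repository name.
print(codecommit_remote_url("ap-southeast-2", "sagemaker-studio-config-repo"))
```

The printed value is what you would pass to `git remote add origin`.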
Name | Must Update | Description | Default |
---|---|---|---|
APP_NAME | | Name of your application | sagemaker-studio |
SAGEMAKER_DOMAIN_NAME_PREFIX | | Prefix for the SageMaker domain to be created | sms |
SANDBOX_ENV_NAME | | The name of the sandbox environment (used to prefix some elements) | sandbox |
MAIN_ENV_NAME | | The name of the production environment (used to prefix some elements) | dev |
MAIN_ENV_ACCOUNT | Yes | Your production AWS account ID | |
MAIN_ENV_REGION | Yes | The AWS Region you would like to deploy the production SageMaker Studio to | |
VPC_NAME | Yes | The name of the VPC you would like to deploy into. The name is used to look up the VPC in CDK | |
AUTH_TYPE | | Authentication type, SSO or IAM | SSO |
ADD_GLUE_PERMISSION | | Whether to enable Glue permissions in SageMaker so that users can use Glue Interactive Sessions | True |
JUPYTERLAB_DEFAULT | | The version of JupyterLab to use. JupyterLab 3 is the default | JL3 |
TOOLCHAIN_ACCOUNT | Yes | The AWS account you would like to deploy the CICD components to | |
TOOLCHAIN_REGION | Yes | The AWS Region you would like to deploy the CICD components to | |
CODECOMMIT_REPO | | The name of the CodeCommit repo | APP_NAME + "-" + "config-repo" |
CODECOMMIT_TRUNK_BRANCH | | The trunk branch for your CodeCommit repo | main |
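
As an illustration of the table above, a constants.py filled in with the defaults might look roughly like this. It is a sketch, not the project's actual file: the account IDs, regions and VPC name are placeholders you must replace, and the value types (for example, account IDs as strings) are assumptions.

```python
# Hypothetical constants.py. Defaults mirror the table above.
APP_NAME = "sagemaker-studio"
SAGEMAKER_DOMAIN_NAME_PREFIX = "sms"
SANDBOX_ENV_NAME = "sandbox"
MAIN_ENV_NAME = "dev"

# Must update: the values below are placeholders, not real accounts or resources.
MAIN_ENV_ACCOUNT = "111111111111"
MAIN_ENV_REGION = "ap-southeast-2"
VPC_NAME = "my-vpc"
TOOLCHAIN_ACCOUNT = "222222222222"
TOOLCHAIN_REGION = "ap-southeast-2"

AUTH_TYPE = "SSO"  # or "IAM"
ADD_GLUE_PERMISSION = True
JUPYTERLAB_DEFAULT = "JL3"

# The repo name default is derived from APP_NAME, as given in the table.
CODECOMMIT_REPO = APP_NAME + "-" + "config-repo"
CODECOMMIT_TRUNK_BRANCH = "main"
```

With the default APP_NAME, CODECOMMIT_REPO evaluates to `sagemaker-studio-config-repo`.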