This repository provides an example of how to use the Snowflake Data Cloud as a source of training data for a machine learning (ML) model in Amazon SageMaker. We download the training data from a Snowflake table directly into an Amazon SageMaker training instance rather than into an Amazon S3 bucket.
We use the California Housing Dataset in this example to train a regression model that predicts the median house value for each district. We create a custom container for running the training job. This container uses the SageMaker XGBoost container image as its base image and includes the Snowflake Connector for Python (snowflake-connector-python) for interfacing with Snowflake.
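As a rough illustration of the data-loading step described above, the following sketch reads feature and target columns from a table through a DB-API 2.0 connection, which is the interface the Snowflake Connector for Python exposes. It is not the training script shipped in this repository: the function names, the table name `california_housing`, and the column names are assumptions made for illustration, and the credentials passed to `connect_to_snowflake` are placeholders you would supply yourself.

```python
import sqlite3  # used only in the demo below as a stand-in DB-API connection


def connect_to_snowflake(user, password, account, warehouse, database, schema):
    """Open a Snowflake connection (needs `pip install snowflake-connector-python`)."""
    # Imported lazily so the rest of this sketch runs without the connector installed.
    import snowflake.connector

    return snowflake.connector.connect(
        user=user,
        password=password,
        account=account,
        warehouse=warehouse,
        database=database,
        schema=schema,
    )


def fetch_training_data(conn, table, feature_columns, target_column):
    """Return (features, targets) read from `table` via any DB-API 2.0 connection."""
    columns = list(feature_columns) + [target_column]
    # Identifiers cannot be bound as query parameters, so this sketch only
    # accepts simple names (letters, digits, underscores) and rejects the rest.
    for name in columns + [table]:
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected identifier: {name!r}")
    query = f"SELECT {', '.join(columns)} FROM {table}"
    cur = conn.cursor()
    try:
        cur.execute(query)
        rows = cur.fetchall()
    finally:
        cur.close()
    # The target is the last selected column; everything before it is a feature.
    features = [list(row[:-1]) for row in rows]
    targets = [row[-1] for row in rows]
    return features, targets


if __name__ == "__main__":
    # Demo with an in-memory SQLite table standing in for the Snowflake table.
    demo = sqlite3.connect(":memory:")
    demo.execute(
        "CREATE TABLE california_housing (MedInc REAL, HouseAge REAL, MedHouseVal REAL)"
    )
    demo.execute("INSERT INTO california_housing VALUES (8.3, 41.0, 4.5)")
    X, y = fetch_training_data(
        demo, "california_housing", ["MedInc", "HouseAge"], "MedHouseVal"
    )
    print(len(X), "rows loaded")
```

Inside a training container, the connection returned by `connect_to_snowflake` would replace the SQLite stand-in, and the resulting features and targets could then be handed to XGBoost. Keeping the query logic independent of the connection object makes it possible to test it locally without a Snowflake account.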
The following figure represents the high-level architecture of the proposed solution to use Snowflake as a data source to train ML models with Amazon SageMaker.
New: For users who prefer a low-code or out-of-the-box solution, Amazon SageMaker JumpStart now offers XGBoost and SKLearn models with direct data integration to Snowflake. The notebook sagemaker-snowflake-example-jumpstart.ipynb shows how to use JumpStart's XGBoost model to train a regression model directly on data in Snowflake, without copying the data to S3 or writing a custom training script.
Follow the steps listed below prior to running the notebooks included in this repository.
- Create a free account with Snowflake. Detailed instructions are available in the snowflake-instructions file.
- Launch the CloudFormation template included in this repository using the launch button for one of the following AWS Regions: us-east-1 (N. Virginia), us-east-2 (Ohio), us-west-1 (N. California), eu-west-1 (Dublin), ap-northeast-1 (Tokyo). The CloudFormation template will create an IAM role called SageMakerSnowFlakeExample and a SageMaker Notebook called aws-aiml-blogpost-sagemaker-snowflake-example that we will use for running the code in this repository.
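If you would rather launch the stack from a script than from the console, a minimal sketch using boto3 might look like the following. This is an alternative to the launch buttons, not something this repository provides: the function names, the stack name, and the template path are hypothetical, and restricting the call to the five Regions listed above is an assumption based only on that list. Because the template creates a named IAM role, CloudFormation requires the `CAPABILITY_NAMED_IAM` acknowledgement.

```python
# Regions listed in the setup steps; limiting to them is an assumption.
SUPPORTED_REGIONS = {
    "us-east-1",
    "us-east-2",
    "us-west-1",
    "eu-west-1",
    "ap-northeast-1",
}


def build_create_stack_args(stack_name, template_body, region):
    """Build keyword arguments for CloudFormation's create_stack call."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"region {region!r} is not in the listed Regions")
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # The template creates an IAM role with a fixed name, so this
        # acknowledgement is required.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def launch_stack(stack_name, template_path, region):
    """Create the stack and block until CloudFormation reports completion."""
    # Imported lazily so the argument-building helper works without boto3.
    import boto3

    with open(template_path) as f:
        template_body = f.read()
    args = build_create_stack_args(stack_name, template_body, region)
    client = boto3.client("cloudformation", region_name=region)
    response = client.create_stack(**args)
    client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]
```

A call such as `launch_stack("sagemaker-snowflake-example", "path/to/template.yaml", "us-east-1")` would use whatever AWS credentials are configured in your environment; both arguments here are placeholders, not names taken from this repository.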
Follow the step-by-step instructions provided in the blog post.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. See CONTRIBUTING.
See the open issues for a full list of proposed features (and known issues).
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.