This project will provide the machine learning component of the Solicitation Review Tool using AWS SageMaker.
For now, the project includes only model training and deployment functionality, using SageMaker within a private subnet.
Eventually, a combination of AWS Lambda, API Gateway, and Cognito will be added to make these SageMaker actions callable through a REST API secured with OAuth 2.0.
First, you need an AWS account with a VPC configured as described here.
We use pipenv for dependency management and to ensure that your local environment matches that of AWS SageMaker (particularly the sklearn framework).
While in the root of this repo, install your dependencies with:
pipenv install
Now start a shell with that environment:
pipenv shell
This spawns a new shell subprocess, which you can leave by typing exit.
One of the required packages you just installed is ipykernel. We use this to create a kernel that uses our virtual environment for the Jupyter Notebook:
ipython kernel install --user --name=srt-ml
The SRT uses supervised machine learning. You can find 993 pre-labeled documents here. Download them and move them into a new directory named labeled_fbo_docs/.
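Before moving on, it can help to confirm that the download landed where the later steps expect it. The following is a minimal sketch, not part of this repo: the helper name count_labeled_docs is hypothetical, and it assumes the documents sit directly inside labeled_fbo_docs/ rather than in subfolders.

```python
from pathlib import Path


def count_labeled_docs(directory="labeled_fbo_docs"):
    """Return the number of regular files found directly in `directory`."""
    path = Path(directory)
    if not path.is_dir():
        raise FileNotFoundError(
            f"{directory!r} does not exist; create it and move the documents in"
        )
    # Count only regular files and skip hidden ones such as .DS_Store.
    return sum(
        1 for p in path.iterdir() if p.is_file() and not p.name.startswith(".")
    )
```

If the archive was unpacked flat, count_labeled_docs() should match the number of documents you downloaded.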
The awscli Python package was included as a dependency, but you still need to configure it using aws configure. See this doc on how to do that.
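To check that aws configure actually wrote credentials, you can inspect the shared credentials file. This is a rough sketch under stated assumptions: has_aws_profile is a hypothetical helper, and it only looks at the default file location (~/.aws/credentials), so it will report False if your credentials come from environment variables or another mechanism.

```python
import configparser
from pathlib import Path


def has_aws_profile(credentials_path=None, profile="default"):
    """Return True if `profile` in the credentials file has both access keys."""
    if credentials_path:
        path = Path(credentials_path)
    else:
        path = Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    # ConfigParser.read silently skips files that do not exist.
    parser.read(path)
    if not parser.has_section(profile):
        return False
    required = ("aws_access_key_id", "aws_secret_access_key")
    return all(parser.has_option(profile, key) for key in required)
```

Calling has_aws_profile() with no arguments checks the default profile. It confirms only that the keys are present, not that AWS accepts them.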
NOTE: You should have already created an IAM user for this project, as well as the infrastructure, by following the instructions linked above.
At this point, you can start Jupyter with jupyter notebook. Open Upload Training Data to S3.ipynb and select the srt-ml kernel that you created a moment ago.
From here, follow the steps in the notebook to push the labeled data up to your S3 bucket. Make sure you adjust the bucket name in the notebook to match your own bucket.
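For orientation, an upload of this kind can be sketched roughly as follows. This is not the notebook's actual code: upload_directory is a hypothetical helper, the bucket name my-srt-bucket and the key prefix are placeholders, and the only boto3 call relied on is the S3 client's upload_file(Filename, Bucket, Key).

```python
from pathlib import Path


def upload_directory(s3_client, local_dir, bucket, prefix=""):
    """Upload each regular file directly in `local_dir`; return the S3 keys."""
    uploaded = []
    for path in sorted(Path(local_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        key = f"{prefix}{path.name}"
        # boto3 S3 client signature: upload_file(Filename, Bucket, Key)
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded


# Usage (requires boto3 and configured credentials; names are placeholders):
# import boto3
# upload_directory(boto3.client("s3"), "labeled_fbo_docs",
#                  "my-srt-bucket", prefix="labeled_fbo_docs/")
```

Passing the client in as an argument keeps the function easy to try against a stub before touching a real bucket.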
You're now ready to use SageMaker. When creating the SageMaker notebook instance, you should have linked this repository. If so, srt.ipynb will already be present once you launch the notebook instance.
NOTE: You can run shell commands, such as git pull, within a Jupyter Notebook cell by prepending the command with an exclamation point, e.g. !git pull. Doing this will help you keep your SageMaker notebook instance current with the remote repo.
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
This project is licensed under the Creative Commons Zero v1.0 Universal License - see the LICENSE.md file for details.