ML pipeline tutorials for OSM labels to AWS Sagemaker

Sagely

Purpose: Use OSM vector data to train a convolutional neural network (CNN) in AWS Sagemaker to detect buildings, roads, and other objects.

Inputs: Location of a HOT-OSM task OR a city/state/country of interest, plus a web URL to a DG Cloud Optimized GeoTIFF (COG).

Outputs: TMS (slippy map) training data built from the OSM vectors, plus an AWS Sagemaker model endpoint.
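
For readers new to slippy map tiles, here is a minimal sketch of the standard tile arithmetic that maps a latitude/longitude to a tile index at a given zoom level. It is not taken from this repo's scripts; the function names are made up for illustration. It assumes the common XYZ convention (y counted from the north); strict TMS counts y from the south, so a flip helper is included.

```python
import math


def deg2tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) index of the XYZ slippy-map tile containing a point."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or latitudes beyond the Web Mercator limit stay in range
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile2deg(x, y, zoom):
    """Return (lat, lon) of the north-west corner of XYZ tile (x, y)."""
    n = 2 ** zoom
    lon_deg = x / n * 360.0 - 180.0
    lat_rad = math.atan(math.sinh(math.pi * (1 - 2 * y / n)))
    return math.degrees(lat_rad), lon_deg


def xyz_to_tms_y(y, zoom):
    """Flip a tile row between the XYZ (north origin) and TMS (south origin) schemes."""
    return 2 ** zoom - 1 - y
```

Check which of the two row conventions your tile source uses before pairing imagery tiles with label tiles; a mismatch shows up as labels mirrored north to south.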

Test

This repo is still a work in progress! Not all of the test/train scripts are 100% functional.

There are TWO parts to this workflow. The first is best illustrated by the ipynb tutorial, which walks you through turning OSM vector data into ML training data. Once the training data is generated, you can use the following scripts to create a virtual environment for AWS Sagemaker training.

Setup Your Machine

  1. Set up a virtual environment:
SStrong-CRYL17$ virtualenv -p python3 sagemaker_trans
SStrong-CRYL17$ source sagemaker_trans/bin/activate
SStrong-CRYL17$ cd sagemaker_trans/
  2. Clone this repo onto your local machine:
SStrong-CRYL17$ git clone https://github.com/shaystrong/sagely.git
SStrong-CRYL17$ cd sagely/
  3. Run the setup script. It will install the necessary libraries:
SStrong-CRYL17$ sh setup.sh

Download Script

SStrong-CRYL17$ sh get_data.sh

Test Script

SStrong-CRYL17$ sh test.sh

Results should look like:

Clean Up

deactivate
rm -rf /path/to/venv/sagemaker_trans/

Train

Watch!

Watch your model train on Sagemaker! You can log in to the AWS console to follow the progress of the training and review all your parameters.
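
If you prefer a script to the console, the same information is available through the SageMaker `describe_training_job` API. The sketch below is not part of this repo; the helper name and the job name in the usage comment are hypothetical. It takes the client as an argument so it can be tried without AWS credentials.

```python
def summarize_training_job(client, job_name):
    """Collect the status, hyperparameters, and final metrics of one training job.

    `client` is assumed to behave like boto3.client("sagemaker").
    """
    desc = client.describe_training_job(TrainingJobName=job_name)
    return {
        "status": desc["TrainingJobStatus"],
        "secondary_status": desc.get("SecondaryStatus"),
        "hyperparameters": desc.get("HyperParameters", {}),
        # FinalMetricDataList may be absent until the job has emitted metrics
        "metrics": {
            m["MetricName"]: m["Value"]
            for m in desc.get("FinalMetricDataList", [])
        },
    }


# Usage with real credentials (job name is a placeholder):
#   import boto3
#   print(summarize_training_job(boto3.client("sagemaker"), "my-training-job"))
```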

Metrics

None Yet!

Notes

Your OSM vector data may be messy and/or may not align with the imagery. It is up to you to manually inspect, modify, and cull the generated training data for optimal model performance. There is no step presented here to do this for you. In fact, owning that element is a critical part of your job as a Data Scientist.
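
A small helper can at least shortlist tiles for that manual pass. The sketch below is purely illustrative and not part of this repo. It assumes each label tile is available as a 2-D binary mask (nested lists, nonzero meaning "labeled"), and the function names and thresholds are made up. It flags tiles whose label coverage is suspiciously low or high; it does not replace looking at the tiles yourself.

```python
def label_fraction(mask):
    """Fraction of nonzero pixels in a 2-D mask given as nested sequences."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    labeled = sum(1 for row in mask for px in row if px)
    return labeled / total


def flag_for_review(masks, min_fraction=0.01, max_fraction=0.95):
    """Return the sorted names of tiles whose label coverage falls outside the bounds.

    `masks` maps a tile name to its mask. The default thresholds are arbitrary
    starting points (an assumption), not tuned values.
    """
    flagged = []
    for name, mask in masks.items():
        frac = label_fraction(mask)
        if frac < min_fraction or frac > max_fraction:
            flagged.append(name)
    return sorted(flagged)
```

Nearly empty masks often mean the area is unmapped in OSM, while nearly full ones can point to a misaligned or oversized polygon; both are worth opening next to the imagery.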