DS Build Week scaffold

Big picture

Here's a template with starter code to deploy an API for your machine learning model and data visualizations. You're encouraged (but not required) to use this template for your Build Week.

You can deploy on Heroku in 10 minutes. Here's the template deployed as-is: https://ds-bw-test.herokuapp.com/

Instead of Flask, we'll use FastAPI. It's similar, but faster, with automatic interactive docs. For more comparison, see FastAPI for Flask Users.

Tech stack

  • FastAPI: Web framework. Like Flask, but faster, with automatic interactive docs.
  • Flake8: Linter, enforces PEP8 style guide.
  • Heroku: Platform as a service, hosts your API.
  • Pipenv: Reproducible virtual environment, manages dependencies.
  • Plotly: Visualization library, for Python & JavaScript.
  • Pytest: Testing framework, runs your unit tests.

Getting started

Create a new repository from this template.

Clone the repo

git clone https://github.com/YOUR-GITHUB-USERNAME/YOUR-REPO-NAME.git

cd YOUR-REPO-NAME

Install dependencies

pipenv install --dev

Activate the virtual environment

pipenv shell

Launch the app

uvicorn app.main:app --reload

Go to localhost:8000 in your browser.


You'll see your API documentation:

  • Your app's title, "DS API"
  • Your description, "Lorem ipsum"
  • An endpoint for POST requests, /predict
  • An endpoint for GET requests, /viz/{statecode}

Click the /predict endpoint's green button.


You'll see the endpoint's documentation, including:

  • Your function's docstring, """Make random baseline predictions for classification problem."""
  • Request body example, as JSON (like a Python dictionary)
  • A button, "Try it out"

Click the "Try it out" button.


The request body becomes editable.

Click the "Execute" button. Then scroll down.


You'll see the server response, including:

  • Code 200, which means the request was successful.
  • The response body, as JSON, with random baseline predictions for a classification problem.

Your job is to replace these random predictions with real predictions from your model. Use this starter code and documentation to deploy your model as an API!
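
The same POST request can also be made from Python instead of the docs page. Here is a minimal sketch using only the standard library. The helper names build_predict_request and send are made up for illustration, and the feature names x1, x2, x3 are placeholders: use whatever fields appear in your own request body example.

```python
import json
import urllib.request


def build_predict_request(base_url, features):
    """Build (but do not send) a POST request for the /predict endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder feature names and values (assumptions, not the template's real fields).
request = build_predict_request(
    "http://localhost:8000",
    {"x1": 3.14, "x2": -42, "x3": "banjo"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# With the app running locally, uncomment the next line to send the request:
# print(send(request))
```

Once deployed, swap the base URL for your Heroku app's URL and your web teammates can call the endpoint the same way.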

File structure

.
└── app
    ├── __init__.py
    ├── db.py
    ├── main.py
    ├── ml.py
    ├── viz.py
    └── tests
        ├── __init__.py
        ├── test_main.py
        └── test_predict.py

app/main.py is where you edit your app's title and description, which are displayed at the top of your automatically generated documentation. This file also configures "Cross-Origin Resource Sharing", which you shouldn't need to edit.

app/ml.py defines the Machine Learning endpoint. /predict accepts POST requests and responds with random predictions. In a notebook, train your model and pickle it. Then in this source code file, unpickle your model and edit the predict function to return real predictions.

When your API receives a POST request, FastAPI automatically parses and validates the request body JSON, using the Item class attributes and functions. Edit this class so it's consistent with the column names and types from your training dataframe.
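
To get a feel for that validation step, here is a small sketch that uses pydantic directly, outside of FastAPI. The three fields are hypothetical placeholders, not necessarily the ones in the template's Item class.

```python
from pydantic import BaseModel, ValidationError


class Item(BaseModel):
    """Hypothetical request body with three made-up features."""
    x1: float
    x2: int
    x3: str


# A well-formed body is parsed into typed attributes.
item = Item(**{"x1": 3.14, "x2": -42, "x3": "banjo"})
print(item.x1, item.x2, item.x3)

# A body with a wrong type and a missing field is rejected.
try:
    Item(**{"x1": "not a number", "x2": -42})
    rejected = False
except ValidationError as error:
    rejected = True
    print(error)
```

Inside FastAPI you don't write the try/except yourself: when the request body fails validation, the framework responds with a 422 status code and a description of the offending fields.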

app/tests/test_*.py is where you edit your pytest unit tests.

More instructions

Activate the virtual environment

pipenv shell

Install additional packages

pipenv install PYPI-PACKAGE-NAME

Launch a Jupyter notebook

jupyter notebook

Run tests

pytest

Run linter

flake8

See also: calmcode.io videos on flake8.

Deploying to Heroku

Prepare Heroku

heroku login

heroku create YOUR-APP-NAME-GOES-HERE

heroku git:remote -a YOUR-APP-NAME-GOES-HERE

Deploy to Heroku

git add --all

git add --force Pipfile.lock

git commit -m "Deploy to Heroku"

git push heroku main:master

heroku open

(If you get a Locking failed! error when deploying to Heroku or running pipenv install, delete Pipfile.lock and try again, skipping the git add --force Pipfile.lock step.)

Deactivate the virtual environment

exit

Example: Machine learning

Follow the getting started instructions.

Edit app/main.py to add your API title and description.

app = FastAPI(
    title='House Price DS API',
    description='Predict house prices in California',
    docs_url='/',
)

Edit app/ml.py to add a docstring for your predict function and return a naive baseline.

@router.post('/predict')
async def predict(item: Item):
    """Predict house prices in California."""
    y_pred = 200000
    return {'predicted_price': y_pred}

In a notebook, explore your data. Make an educated guess of what features you'll use.

import pandas as pd
from sklearn.datasets import fetch_california_housing

# Load data
california = fetch_california_housing()
print(california.DESCR)
X = pd.DataFrame(california.data, columns=california.feature_names)
y = california.target

# Rename columns
X.columns = X.columns.str.lower()
X = X.rename(columns={'avebedrms': 'bedrooms', 'averooms': 'total_rooms', 'houseage': 'house_age'})

# Explore descriptive stats
X.describe()

# Use these 3 features
features = ['bedrooms', 'total_rooms', 'house_age']

Edit the class in app/ml.py to use your features.

class House(BaseModel):
    """Use this data model to parse the request body JSON."""
    bedrooms: float  # average bedrooms per household, so a float like the dataframe column
    total_rooms: float
    house_age: float

    def to_df(self):
        """Convert pydantic object to pandas dataframe with 1 row."""
        return pd.DataFrame([dict(self)])

@router.post('/predict')
async def predict(house: House):
    """Predict house prices in California."""
    X_new = house.to_df()
    y_pred = 200000
    return {'predicted_price': y_pred}

Test locally, then deploy to Heroku with your work-in-progress. Get to this point by the middle of Build Week. (By Wednesday lunch for full-time cohorts. By end of week one for part-time cohorts.) Now your web teammates can make POST requests to your API endpoint.

In a notebook, train your pipeline and pickle it. See these docs:

Get version numbers for every package you used in your pipeline. Install the exact versions of these packages in your virtual environment.
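
One possible way to collect those version numbers from inside the notebook is the standard library's importlib.metadata (Python 3.8+). This is only a sketch: the helper names installed_versions and as_pins are invented here, and the package list is an example you would replace with whatever your pipeline imports.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def as_pins(versions):
    """Format installed packages as 'name==version' strings."""
    return [
        f"{name}=={version}"
        for name, version in versions.items()
        if version is not None
    ]


# Example package list (an assumption; list the packages your pipeline uses).
packages = ["scikit-learn", "pandas"]
print(as_pins(installed_versions(packages)))
```

Each printed pin can then be passed to pipenv install in your virtual environment, in the form pipenv install PYPI-PACKAGE-NAME==VERSION.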

Edit app/ml.py to unpickle your model and use it in your predict function.
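
The pickle round trip looks roughly like the sketch below. To keep it runnable without any extra packages, a plain dict of made-up coefficients stands in for your trained pipeline, and the file lives in a temporary directory. The names model.pkl and predict_price are illustrative assumptions, not part of the template.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained pipeline: a dict of made-up linear coefficients.
model = {
    "intercept": 50000.0,
    "coef": {"bedrooms": 10000.0, "total_rooms": 20000.0, "house_age": -500.0},
}

# Notebook side: serialize the fitted object to a file.
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(model_path, "wb") as file:
    pickle.dump(model, file)

# API side: load the file once, at import time, not on every request.
with open(model_path, "rb") as file:
    loaded_model = pickle.load(file)


def predict_price(features):
    """Apply the stand-in linear model to one row of features."""
    total = loaded_model["intercept"]
    for name, weight in loaded_model["coef"].items():
        total += weight * features[name]
    return total


y_pred = predict_price({"bedrooms": 1.0, "total_rooms": 5.0, "house_age": 20.0})
print({"predicted_price": y_pred})
```

With a real pipeline, the predict function would call the unpickled object's own predict method on the one-row dataframe from to_df() instead of the hand-written arithmetic here. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.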

Now you are ready to re-deploy! 🚀