HousePricePredAPI

INTRO

A web service (ML API) that predicts house prices, trained on the Ames housing dataset. The service exposes one prediction endpoint that accepts numerical input (79 variables as JSON) and returns the prediction as output.
Try the service!

# The service may not be reachable at the moment because the AWS instances have been shut down. Stay tuned for updates!
$ curl 35.165.231.104:8000
$ curl http://35.165.231.104:8000/REST/api/v1.0/train
$ curl http://35.165.231.104:8000/REST/api/v1.0/model_list
$ curl -i -H "Content-Type: application/json" -X POST -d "$(python script/get_test_json.py)" http://35.165.231.104:8000/REST/api/v1.0/predict_with_input

Architecture

Flask as the ML API server
Docker Hub as the Docker image repository for the service
AWS ECS as the container service that runs the ML API in Docker
AWS Elastic Load Balancer to distribute incoming application traffic automatically across multiple targets
AWS S3 as storage for models, ML output, and logs
Architecture idea: build a Dockerized ML API with Flask and run many identical copies of it in the cloud. AWS Elastic Load Balancer (ELB) handles that scale by distributing heavy request traffic across workers running on ECS, so that predictions are returned on time. AWS S3 stores versioned models and outputs. Logs can be sent to AWS CloudWatch for a service dashboard. The API can also run on AWS Fargate for its serverless advantages (quick development, no EC2 management costs).

Process

Local dev -> Local train -> Unit-test -> Docker build -> Travis (CI/CD) -> Deploy to Docker Hub -> Deploy to AWS ECS -> Online train -> API ready

File structure

├── Dockerfile        : Dockerfile that builds the web service (ML API)
├── Predict           : Main class for ML prediction
├── api               : API runner (Flask web server)
├── data              : Train and test data
├── log               : Service log files
├── model             : Storage for trained models
├── output            : Storage for ML prediction output
├── requirements.txt  : Python dependencies
├── script            : Helper scripts (parse JSON, upload files, ...)
├── tests             : Unit-test scripts
└── utils             : Utility classes for file/S3 IO

Tech

Python3.6, Flask, scikit-learn, Pandas, Numpy, pytest
AWS ECS, ECR, Load Balancer
Docker
Travis
Git

Quick start

Quick start Docker

# Docker
$ docker build . -t house_pred_env
$ docker run -p 8000:8000 -it house_pred_env 
$ curl http://localhost:8000/REST/api/v1.0/train
$ curl -i -H "Content-Type: application/json" -X POST -d "$(python script/get_test_json.py)" http://localhost:8000/REST/api/v1.0/predict_with_input

Quick start manually

# Manual method I
$ python api/app.py
$ python script/init_model.py
$ curl -i -H "Content-Type: application/json" -X POST -d "$(python script/get_test_json.py)" http://localhost:8000/REST/api/v1.0/predict_with_input

# Manual method II
$ python api/app.py
$ python script/init_model.py
$ curl -i -H "Content-Type: application/json" -X POST -d '{"MSSubClass":20.0,"LotFrontage":100.0,"LotArea":17500.0,"OverallQual":7.0,"OverallCond":8.0,"YearBuilt":1959.0,"YearRemodAdd":2002.0,"MasVnrArea":0.0,"BsmtFinSF1":1406.0,"BsmtFinSF2":0.0,"BsmtUnfSF":496.0,"TotalBsmtSF":1902.0,"1stFlrSF":1902.0,"2ndFlrSF":0.0,"LowQualFinSF":0.0,"GrLivArea":1902.0,"BsmtFullBath":1.0,"BsmtHalfBath":0.0,"FullBath":2.0,"HalfBath":0.0,"BedroomAbvGr":3.0,"KitchenAbvGr":1.0,"TotRmsAbvGrd":7.0,"Fireplaces":2.0,"GarageYrBlt":1959.0,"GarageCars":2.0,"GarageArea":567.0,"WoodDeckSF":0.0,"OpenPorchSF":207.0,"EnclosedPorch":162.0,"3SsnPorch":0.0,"ScreenPorch":0.0,"PoolArea":0.0,"MiscVal":0.0,"MoSold":5.0,"YrSold":2010.0}' http://localhost:8000/REST/api/v1.0/predict_with_input
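
The same POST can be issued from Python. The following is a minimal sketch using only the standard library, not code from this repo: `build_predict_request` and `predict` are hypothetical helper names, the base URL assumes the service from the quick start is running locally, and the response is assumed to be JSON because its schema is not documented here.

```python
import json
import urllib.request

# Assumption: the service from the quick start is listening locally.
BASE_URL = "http://localhost:8000"
PREDICT_PATH = "/REST/api/v1.0/predict_with_input"


def build_predict_request(features, base_url=BASE_URL):
    """Build a POST request that carries the feature dict as a JSON body."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        base_url + PREDICT_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, base_url=BASE_URL, timeout=10):
    """Send the request and return the decoded response.

    Assumption: the endpoint answers with JSON; it is returned as-is.
    """
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `predict({...})` takes the same key/value pairs as the curl payload above.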

Usage of the API

API Hello World

Endpoint: /

$ curl http://localhost:8000/
# API Hello World!

Check API status

Endpoint: /REST/api/v1.0/health

$ curl http://localhost:8000/REST/api/v1.0/health
# {
#   "api_status": "OK", 
#   "http_status": 200
# }
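
A monitoring script could interpret this payload as follows. This is a small sketch, assuming the health response always has the two fields shown above; `is_healthy` is a hypothetical helper, not part of the repo.

```python
import json


def is_healthy(response_text):
    """Return True when the payload reports api_status "OK" and http_status 200."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        # A body that is not valid JSON is treated as unhealthy.
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("api_status") == "OK" and payload.get("http_status") == 200


# Sample body copied from the health endpoint output shown above.
sample = '{"api_status": "OK", "http_status": 200}'
print(is_healthy(sample))
```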

API documentation

Endpoint: /REST/api/v1.0/doc

$ curl http://localhost:8000/REST/api/v1.0/doc
#

Train a model

Endpoint: /REST/api/v1.0/train

$ curl http://localhost:8000/REST/api/v1.0/train

Predict with test data

Endpoint: /REST/api/v1.0/predict

$ curl http://localhost:8000/REST/api/v1.0/predict

Predict with input json

Endpoint: /REST/api/v1.0/predict_with_input

$ curl -i -H "Content-Type: application/json" -X POST -d '{"MSSubClass":20.0,"LotFrontage":100.0,"LotArea":17500.0,"OverallQual":7.0,"OverallCond":8.0,"YearBuilt":1959.0,"YearRemodAdd":2002.0,"MasVnrArea":0.0,"BsmtFinSF1":1406.0,"BsmtFinSF2":0.0,"BsmtUnfSF":496.0,"TotalBsmtSF":1902.0,"1stFlrSF":1902.0,"2ndFlrSF":0.0,"LowQualFinSF":0.0,"GrLivArea":1902.0,"BsmtFullBath":1.0,"BsmtHalfBath":0.0,"FullBath":2.0,"HalfBath":0.0,"BedroomAbvGr":3.0,"KitchenAbvGr":1.0,"TotRmsAbvGrd":7.0,"Fireplaces":2.0,"GarageYrBlt":1959.0,"GarageCars":2.0,"GarageArea":567.0,"WoodDeckSF":0.0,"OpenPorchSF":207.0,"EnclosedPorch":162.0,"3SsnPorch":0.0,"ScreenPorch":0.0,"PoolArea":0.0,"MiscVal":0.0,"MoSold":5.0,"YrSold":2010.0}' http://localhost:8000/REST/api/v1.0/predict_with_input
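
Since the endpoint expects numerical values only, a client may want to check a payload before sending it. The sketch below shows one way to do that; `find_invalid_fields` is a hypothetical helper and does not describe how the API itself validates input.

```python
import math
from numbers import Real


def find_invalid_fields(payload):
    """Return a sorted list of keys whose values are not finite numbers."""
    invalid = []
    for key, value in payload.items():
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, Real) or not math.isfinite(value):
            invalid.append(key)
    return sorted(invalid)


# An empty list means every value is a finite number.
print(find_invalid_fields({"LotArea": 17500.0, "YrSold": 2010.0}))
print(find_invalid_fields({"LotArea": "big", "YrSold": None, "MoSold": 5.0}))
```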

List trained models

Endpoint: /REST/api/v1.0/model_list

$ curl http://localhost:8000/REST/api/v1.0/model_list

List ML predictions

Endpoint: /REST/api/v1.0/predict_list

$ curl http://localhost:8000/REST/api/v1.0/predict_list

Development

# unit test 
$ pytest -v tests/
# ============================ test session starts =============================
# platform darwin -- Python 3.6.10, pytest-5.3.3, py-1.8.1, pluggy-0.13.1 -- /Users/yennanliu/anaconda3/envs/yen_dev/bin/python
# cachedir: .pytest_cache
# rootdir: /Users/yennanliu/HousePricePredAPI
# collected 18 items                                                           

# tests/test_api.py::test_404_page_not_found PASSED                      [  5%]
# tests/test_api.py::test_api_helloworld PASSED                          [ 11%]
# tests/test_api.py::test_get_model_list PASSED                          [ 16%]
# tests/test_api.py::test_get_predict_list PASSED                        [ 22%]
# tests/test_api.py::test_train_house_price_model PASSED                 [ 27%]
# tests/test_api.py::test_predict_house_price PASSED                     [ 33%]
# tests/test_api.py::test_predict_house_price_with_input PASSED          [ 38%]
# tests/test_predict.py::test_list_model PASSED                          [ 44%]
# tests/test_predict.py::test_list_prediction PASSED                     [ 50%]
# tests/test_predict.py::test_save_model PASSED                          [ 55%]
# tests/test_predict.py::test_load_model PASSED                          [ 61%]
# tests/test_predict.py::test_process_data PASSED                        [ 66%]
# tests/test_predict.py::test_process_input_data PASSED                  [ 72%]
# tests/test_predict.py::test_prepare_train_data PASSED                  [ 77%]
# tests/test_predict.py::test_train PASSED                               [ 83%]
# tests/test_predict.py::test_predict PASSED                             [ 88%]
# tests/test_predict.py::test_predict_with_input PASSED                  [ 94%]
# tests/test_predict.py::test_predict_with_nonvalidated_input PASSED     [100%]

# ============================== warnings summary ==============================

Deployment

Travis is used as the CI/CD tool.
CI/CD steps:
1. Run unit tests
2. Build the Docker image
3. Deploy to Docker Hub/AWS ECR
4. Deploy to AWS ECS
5. Update the AWS ECS task and services
6. API updated

RESTful

RESTful status code

code	comment	example	ref
`1xx`	`1xx` -> `msg`. The client request has been accepted or is being processed by the server
`2xx`	`2xx` -> `SUCCESS`. The client request has been accepted and completed by the server
`3xx`	`3xx` -> `RE-DIRECT`. The client request has been accepted by the server, but further action is needed
`4xx`	`4xx` -> `ERROR`. The client request has a syntax error or cannot be processed for some other reason
`5xx`	`5xx` -> `SERVER-ERROR`. The server hit an error while processing a valid client request
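
The five classes above depend only on the first digit of the status code. A minimal sketch of that mapping follows; `status_class` is a hypothetical helper, and the class names follow common HTTP terminology rather than the labels in the table.

```python
def status_class(code):
    """Map an HTTP status code (100-599) to its class, based on the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not isinstance(code, int) or not 100 <= code <= 599:
        raise ValueError("not an HTTP status code: %r" % (code,))
    return classes[code // 100]


print(status_class(200))
print(status_class(404))
```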

TODO

Fix the API return message for invalid input
Fix error handling
Fix data processing logic
Fix model training and test evaluation logic
Fix duplicated class instantiation
Fix high-level inconsistency when only some of the models are updated
Faster model IO
Automate the whole process: dev -> test -> deploy to AWS
Offline training
Online training (when new input data arrives, save the retrained model as a new version)
Train (via API) with hyperparameters / parameters
Output models in a standard format
Track logs

Ref

RESTful API design - stackoverflow blog : best-practices-for-rest-api-design
Which API? : RPC vs REST vs GraphQL