revised repo for Manning book Deep Learning with Structured Data https://www.manning.com/books/deep-learning-with-structured-data
This repo is a rework of the original repo for the book at https://github.com/ryanmark1867/manning (which is being kept in place for the convenience of MEAP users who have already started to use it). Improvements in this new repo include:
- rationalized file names
- simplified directory structure
- notebooks tested on Python 3.6 and Python 3.7 in free and for-fee environments
- config files used to remove hard-coded parameters
- code largely refactored to make it easier to follow and simpler to run
- incorporate TensorFlow 2.0 for model training and deployment
- data - processed datasets and pickle files for intermediate datasets
- deploy - code for deploying the trained model using Rasa and Facebook Messenger, as described in chapter 8 of the book
- deploy_web - code for deploying the trained model via a simple web page served by Flask. Description coming in chapter 8.
- models - saved trained models
- notebooks - Jupyter notebooks streetcar_data_preparation.ipynb (data preparation), streetcar_model_training.ipynb (deep learning model training) and streetcar_model_training_xgb.ipynb (XGBoost model training) along with associated class and config files
- pipelines - pickled pipeline files generated by streetcar_model_training.ipynb. Together with the trained models these pipeline files are used in the deployment.
- sql - SQL used to generate a table that can be used for the simple SQL examples in chapter 2
prepare data - steps to clean up input data to prepare it to train a deep learning model. Output is a pickled dataframe containing the cleaned up dataset.
- update notebooks/streetcar_data_preparation_config.yml to specify name of input data (pickled_input_dataframe if load_from_scratch = False; if load_from_scratch = True then copy xls files from https://open.toronto.ca/dataset/ttc-streetcar-delay-data/ to data directory) and output dataframe
- run notebooks/streetcar_data_preparation.ipynb
train deep learning model - steps to train a deep learning model on the cleaned up dataset from the previous step. Output is a trained model (h5 file) and two pickle files for the pipeline
- update notebooks/streetcar_model_training_config.yml to specify input dataframe (pickled_dataframe). Set this to the filename of the dataframe output in the prepare data step
- run notebooks/streetcar_model_training.ipynb in an env. with TensorFlow 2.0 installed
train XGBoost model - steps to train a deep learning model on the cleaned up dataset from the previous step. Output is a trained model (h5 file) and two pickle files for the pipeline
- update notebooks/streetcar_model_training_config.yml to specify input dataframe (pickled_dataframe). Set this to the filename of the dataframe output in the prepare data step
- run streetcar_model_training_xgb.ipynb - note that the saved XGBoost models are saved in the models directory along with the saved deep learning models
web deployment of model - steps to set up a simple web page for exercising the trained model. Uses the trained model and pipeline files from the previous step.
- update deploy_web/deploy_web_config.yml: set pipeline1_filename, pipeline2_filename and model_filename to the names of the pipeline files and trained model file generated in the training step. Alternately, you can run the training notebook using the config file as-is to use prepared pipeline and model files already included in the repo.
- start the Flask server deploy_web/flask_server.py by running: python flask_server.py
- open localhost:5000 in a browser, select the details for your trip, and click on Get prediction
- https://open.toronto.ca/dataset/ttc-streetcar-delay-data/ original dataset
- https://www.kaggle.com/knowledgegrappler/a-simple-nn-solution-with-keras-0-48611-pl Kaggle submission that was used as input to creation of the Keras model used in this example