This repository contains notes, in the form of Jupyter Notebooks, for the chapters of the book "Building Machine Learning Pipelines" by Hannes Hapke & Catherine Nelson, published by O'Reilly.
While working through the book, I found it helpful to follow along with the code and reproduce most of what the book shows. Because some steps were harder than the book made them seem, I decided to share this repository with other people facing similar difficulties.
Although I wrote all of the code myself, most of the raw Python code comes from the book; I changed only a few lines to get it working in my setup.
The repository contains code and additional information for nearly all chapters of the book. Chapters 2 to 8 in particular run seamlessly in a Jupyter Notebook. I developed all the notebooks in Google Colab, but they should also run locally. For Chapters 9, 10, and 13 I provide a lot of comprehensive information but less code, while Chapters 11, 12, and 14 and the appendices are covered only briefly or not at all. To be valuable, notes on those chapters would need to be just as detailed as the book itself, so I did not rephrase these few chapters on orchestration.
Although I coded more or less every line from the book, I have left out almost all of the output because some of the generated files are quite large. You can easily regenerate the outputs yourself by running the code.
At the end of each notebook I added some further resources: some I stumbled across and found useful, others come from the book. The repository also contains not just code but a lot of explanatory text that helps in understanding the tools used. I did not want to quote the book for every sentence I rephrased, and I hope that is acceptable, since the repository is only meant as a supplementary explanation of the book's content and should lead to a better understanding of it.
- Chapter 2: Introduction to TensorFlow Extended
- Chapter 3: Data Ingestion
- Chapter 4: Data Validation
- Chapter 5: Data Preprocessing
- Chapter 6: Model Training
- Chapter 7: Model Analysis and Validation
- Chapter 8: Model Deployment with TensorFlow Serving
- Chapter 9: Advanced Model Deployments with TensorFlow Serving
- Chapter 10: Advanced TensorFlow Extended
- Chapter 11: Pipelines Part 1: Apache Beam and Apache Airflow
- Chapter 12: Pipelines Part 2: Kubeflow Pipelines
- Chapter 13: Feedback Loops
- Chapter 14
Clone the repo
$ git clone https://github.com/JanMarcelKezmann/Building-ML-Pipelines-Notes
$ cd Building-ML-Pipelines-Notes
Install requirements
$ pip install --upgrade -r requirements.txt
Make sure you have the following installed:
- Python 3.6 or 3.7
- tensorflow >= 2.3.0
- tfx >= 0.24.0
- tensorboard_plugin_fairness_indicators >= 0.24.0
- tensorflow_hub >= 0.9.0
- tensorflow_privacy >= 0.5.1
- pandas >= 1.1.2
- witwidget >= 1.17.0
- apache-beam >= 2.24.0
- google-cloud-core >= 1.4.2
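To check an existing environment against the minimum versions listed above, a small script can help. The following is a minimal Python sketch, not part of the repository: it assumes the PyPI distribution names shown in `MINIMUMS` (for example `tensorflow-hub` for `tensorflow_hub`), and it compares versions with a deliberately naive parser that ignores pre-release tags such as `rc1`.

```python
"""Sketch: report whether installed packages meet the README's minimum versions.

Not part of the repository; the PyPI distribution names below are assumptions.
"""
try:  # Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.6/3.7: fall back to setuptools' pkg_resources
    import pkg_resources

    PackageNotFoundError = pkg_resources.DistributionNotFound

    def version(name):
        return pkg_resources.get_distribution(name).version

# Minimum versions copied from the list above (keys are assumed PyPI names).
MINIMUMS = {
    "tensorflow": "2.3.0",
    "tfx": "0.24.0",
    "tensorboard-plugin-fairness-indicators": "0.24.0",
    "tensorflow-hub": "0.9.0",
    "tensorflow-privacy": "0.5.1",
    "pandas": "1.1.2",
    "witwidget": "1.17.0",
    "apache-beam": "2.24.0",
    "google-cloud-core": "1.4.2",
}


def parse(v):
    """Naively turn '2.3.0rc1' into (2, 3, 0); pre-release tags are ignored."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):  # stop at the first non-numeric suffix
            break
    return tuple(parts)


def at_least(installed, minimum):
    """True if `installed` >= `minimum`, padding the shorter version with zeros."""
    a, b = parse(installed), parse(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check(minimums=MINIMUMS, get_version=version):
    """Map each package to (installed version or None, meets minimum?)."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, at_least(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check().items():
        status = "OK" if ok else "CHECK"
        print("{:5} {}: installed={}, required>={}".format(
            status, name, installed, MINIMUMS[name]))
```

Saved to a file (the name `check_versions.py` is only a placeholder) and run with the same interpreter as the notebooks, it prints one line per package; any line marked `CHECK` points to a package that is missing or older than the listed minimum, which `pip install --upgrade -r requirements.txt` should resolve.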
@misc{Kezmann:2020,
Author = {Jan-Marcel Kezmann},
Title = {Building-ML-Pipelines-Notes},
Year = {2020},
Publisher = {GitHub},
Journal = {GitHub repository},
Howpublished = {\url{https://github.com/JanMarcelKezmann/Building-ML-Pipelines-Notes}}
}
This project is distributed under the MIT License.