Machine Learning Bookcamp
The code from the Machine Learning Bookcamp book
Useful links:
- https://mlbookcamp.com: supplimentary materials
- https://datatalks.club: the place to talk about data (and the book: join the
#ml-bookcamp
channel to ask questions about the book and report any problems)
Machine Learning Zoomcamp
Machine Learning Zoomcamp is a course based on the book
- It's online and free
- You can join at any moment
- More information in the course-zoomcamp folder
Reading Plan
Chapters
Chapter 1: Introduction to Machine Learning
- Understanding machine learning and the problems it can solve
- CRISP-DM: Organizing a successful machine learning project
- Training and selecting machine learning models
- Performing model validation
No code
Chapter 2: Machine Learning for Regression
- Creating a car-price prediction project with a linear regression model
- Doing an initial exploratory data analysis with Jupyter notebooks
- Setting up a validation framework
- Implementing the linear regression model from scratch
- Performing simple feature engineering for the model
- Keeping the model under control with regularization
- Using the model to predict car prices
Code: chapter-02-car-price/02-carprice.ipynb
Chapter 3: Machine Learning for Classification
- Predicting customers who will churn with logistic regression
- Doing exploratory data analysis for identifying important features
- Encoding categorical variables to use them in machine learning models
- Using logistic regression for classification
Code: chapter-03-churn-prediction/03-churn.ipynb
Chapter 4: Evaluation Metrics for Classification
- Accuracy as a way of evaluating binary classification models and its limitations
- Determining where our model makes mistakes using a confusion table
- Deriving other metrics like precision and recall from the confusion table
- Using ROC and AUC to further understand the performance of a binary classification model
- Cross-validating a model to make sure it behaves optimally
- Tuning the parameters of a model to achieve the best predictive performance
Code: chapter-03-churn-prediction/04-metrics.ipynb
Chapter 5: Deploying Machine Learning Models
- Saving models with Pickle
- Serving models with Flask
- Managing dependencies with Pipenv
- Making the service self-contained with Docker
- Deploying it to the cloud using AWS Elastic Beanstalk
Code: chapter-05-deployment
Chapter 6: Decision Trees and Ensemble Learning
- Predicting the risk of default with tree-based models
- Decision trees and the decision tree learning algorithm
- Random forest: putting multiple trees together into one model
- Gradient boosting as an alternative way of combining decision trees
Code: chapter-06-trees/06-trees.ipynb
Chapter 7: Neural Networks and Deep Learning
- Convolutional neural networks for image classification
- TensorFlow and Keras — frameworks for building neural networks
- Using pre-trained neural networks
- Internals of a convolutional neural network
- Training a model with transfer learning
- Data augmentations — the process of generating more training data
Code: chapter-07-neural-nets/07-neural-nets-train.ipynb
Chapter 8: Serverless Deep Learning
- Serving models with TensorFlow-Lite — a light-weight environment for applying TensorFlow models
- Deploying deep learning models with AWS Lambda
- Exposing the Lambda function as a web service via API Gateway
Code: chapter-08-serverless
Chapter 9: Kubernetes and Kubeflow
Kubernetes:
- Understanding different methods of deploying and serving models in the cloud.
- Serving Keras and TensorFlow models with TensorFlow-Serving
- Deploying TensorFlow-Serving to Kubernetes
Code: chapter-09-kubernetes
Kubeflow:
- Using Kubeflow and KFServing for simplifying the deployment process
Code: chapter-09-kubeflow
Articles from mlbookcamp.com:
Appendices
Appendix A: Setting up the Environment
- Installing Anaconda, a Python distribution that includes most of the scientific libraries we need
- Running a Jupyter Notebook service from a remote machine
- Installing and configuring the Kaggle command line interface tool for accessing datasets from Kaggle
- Creating an EC2 machine on AWS using the web interface and the command-line interface
Code: no code
Articles from mlbookcamp.com:
Appendix B: Introduction to Python
- Basic python syntax: variables and control-flow structures
- Collections: lists, tuples, sets, and dictionaries
- List comprehensions: a concise way of operating on collections
- Reusability: functions, classes and importing code
- Package management: using pip for installing libraries
- Running python scripts
Code: appendix-b-python.ipynb
Articles from mlbookcamp.com:
Appendix C: Introduction to NumPy and Linear Algebra
- One-dimensional and two-dimensional NumPy arrays
- Generating NumPy arrays randomly
- Operations with NumPy arrays: element-wise operations, summarizing operations, sorting and filtering
- Multiplication in linear algebra: vector-vector, matrix-vector and matrix-matrix multiplications
- Finding the inverse of a matrix and solving the normal equation
Code: appendix-c-numpy.ipynb
Articles from mlbookcamp.com:
Appendix D: Introduction to Pandas
- The main data structures in Pandas: DataFrame and Series
- Accessing rows and columns of a DataFrame
- Element-wise and summarizing operations
- Working with missing values
- Sorting and grouping
Code: appendix-d-pandas.ipynb
Appendix E: AWS SageMaker
- Increasing the GPU quota limits
- Renting a Jupyter notebook with GPU in AWS SageMaker