"Talk is cheap. Show me the code." - Linus Torvalds
This course introduces the basic concepts and techniques of machine learning and covers the most commonly used models for predictive analytics. The end-to-end workflow of a typical machine learning project is illustrated through multiple business cases and Kaggle competitions. If time permits, deep learning techniques are also introduced. This course is programming intensive, using Python 3 and popular packages such as Jupyter, NumPy, Pandas, Matplotlib, Seaborn, and Scikit-Learn.
Key Topics:
- Machine Learning Overview
- Toolkit Bootcamp (Python, Anaconda, Jupyter, NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn)
- Exploratory Data Analysis (EDA)
- Data Preprocessing (missing data, outliers, feature encoding, pipeline, etc.)
- Model Training, Evaluation, and Tuning
- Classification (Decision Tree, Logistic Regression)
- Regression (Linear Regression, Gradient Descent, SVM)
- Ensemble Learning (Random Forest, Gradient Boosting)
- Clustering (K-Means)
- Dimensionality Reduction
- Data Science App (Streamlit)
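The end-to-end workflow the topics above walk through (EDA, preprocessing, model training, evaluation) can be sketched in a few lines of Scikit-Learn. This is a minimal illustrative example, not course material — the dataset (the built-in Iris data) and the model settings are assumptions chosen for brevity:

```python
# Minimal end-to-end sketch of the course workflow: load data, preprocess,
# train a classifier, and evaluate on held-out data.
# Dataset and hyperparameters are illustrative assumptions, not course content.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a small built-in dataset (stand-in for a real business dataset)
X, y = load_iris(return_X_y=True)

# Hold out 20% of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Chain preprocessing and the model into one Pipeline object
pipe = Pipeline([
    ("scaler", StandardScaler()),                       # data preprocessing
    ("tree", DecisionTreeClassifier(max_depth=3,        # classification model
                                    random_state=42)),
])

pipe.fit(X_train, y_train)                              # model training
test_accuracy = accuracy_score(y_test, pipe.predict(X_test))  # evaluation
print(f"Test accuracy: {test_accuracy:.2f}")
```

The same `Pipeline` pattern extends naturally to the other topics: swap in a `RandomForestClassifier` for ensemble learning, or add a `PCA` step for dimensionality reduction.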
Professor Harry J. Wang: check out my website at harrywang.me
We refer to the following technical books in this course:
- (Free) Python Data Science Handbook by Jake VanderPlas
- (Free) Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. I re-created the Table of Symbols in LaTeX using a Jupyter Notebook, which you can access on Google Colab and Overleaf.
I also recommend reading the following business books:
- Prediction Machines: The Simple Economics of Artificial Intelligence by Ajay Agrawal, Joshua Gans, and Avi Goldfarb