A Hands-on Workshop series in Machine Learning
Time: 5:45-7:45 pm on 7 consecutive Wednesdays, from Oct 02 to Nov 13, 2019
Venue: Aviation Room, Hoch-Shanahan Dining Commons, HMC
The learning material, along with the solutions, can be downloaded from here. Lecture recordings are available at this link and are uploaded after each session.
Please download and install Anaconda with Python 3.7 on your laptop. If you are new to Jupyter Notebook, the very first session includes instructions for using it.
The workshop series focuses on the practical aspects of machine learning, using real-world datasets and the tools in the Python ecosystem. It is aimed at complete beginners in machine learning who are familiar with Python, but it is designed adaptively so that you will be challenged even if you already have some familiarity with machine learning tools.
You will first learn a minimal but highly useful set of tools for quickly exploring datasets with pandas, then move on to conventional machine learning algorithms and related concepts that come in handy for all models, including neural networks. Neural networks will be introduced gently from the fourth session onwards, and you will learn some more involved architectures such as Convolutional Neural Networks (CNN) and apply them to real-world datasets. The sessions will be a good mix of theory, explained intuitively in a simplified manner, and hands-on exercises.
Each session of the workshop will build on the previous ones, so it is important that you attend all the sessions of the series for it to be useful. The learning material will be made available in this GitHub repository a few minutes before each session. The solutions to the hands-on exercises will be uploaded to the same repository after each session.
Pre-requisites:
- The workshop will cover the data science and deep learning tools in the Python ecosystem from scratch. Some familiarity with Python is a pre-requisite. If you have a grip on the basics of coding in some other language such as JavaScript, that should suffice too.
- Basics of Probability and Statistics
- Basics of Calculus
- Basics of Linear Algebra
What to bring:
Please bring your laptop, fully charged and able to connect to WiFi. Please download and install Anaconda with Python 3.7 on your laptop ahead of the workshop.
Topics to be covered:
Session 1:
- Introduction to Jupyter Notebook
- Pandas dataframes as a data structure
- Indexing and slicing data frames
- Data exploration
- Basic statistical plots using matplotlib and seaborn
- Detecting and filling missing values
- Regular expressions for text mining
- Encoding categorical variables
- Correlation between variables
If you are using Colab, please copy the notebook to your Google Drive first so that you can save the changes you make. The option "Copy to Drive" is in the top bar of the Colab notebook.
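To give a flavour of the Session 1 topics, here is a minimal sketch of detecting and filling missing values, encoding a categorical variable, and computing a correlation with pandas. The small table and its column names (age, income, party) are invented for illustration and are not taken from the workshop datasets. Filling with the median is just one common choice, not necessarily the one used in the sessions.

```python
import pandas as pd

# Toy data invented for illustration; not one of the workshop datasets.
df = pd.DataFrame({
    "age": [25, 32, None, 41, 29],
    "income": [40.0, 52.0, 61.0, None, 45.0],
    "party": ["A", "B", "A", "C", "B"],
})

# Detect missing values: number of NaNs in each column.
missing_counts = df.isna().sum()

# Fill the numeric gaps with each column's median (one common strategy).
filled = df.copy()
for col in ["age", "income"]:
    filled[col] = filled[col].fillna(filled[col].median())

# One-hot encode the categorical column into party_A, party_B, party_C.
encoded = pd.get_dummies(filled, columns=["party"])

# Pearson correlation between the two numeric columns.
corr = filled["age"].corr(filled["income"])

print(missing_counts)
print(encoded.head())
print(round(corr, 3))
```

Working on a copy (filled) keeps the original frame intact, which makes it easy to compare the data before and after filling.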
Session 2:
- More on pandas
- Groupby operations
- Machine Learning algorithms: Decision Trees and Random Forest using scikit-learn
- Underfitting and Overfitting to the training dataset; Model cross-validation
- Mini-project: Building a prediction model using Election Dataset by American National Election Study using the above tools and concepts
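The Session 2 ideas of underfitting, overfitting, and cross-validation can be sketched roughly as follows with scikit-learn. This is only an illustration: the data is synthetic (generated with make_classification) and stands in for the election dataset, and the model settings are assumptions chosen to make the contrast visible.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the workshop itself uses the election dataset.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, random_state=0
)

models = {
    "depth-1 tree (prone to underfitting)": DecisionTreeClassifier(max_depth=1, random_state=0),
    "unrestricted tree (prone to overfitting)": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

cv_scores = {}
for name, model in models.items():
    # Mean accuracy over 5 held-out folds.
    scores = cross_val_score(model, X, y, cv=5)
    cv_scores[name] = scores.mean()
    print(f"{name}: cross-validated accuracy = {scores.mean():.3f}")

# Accuracy on the training data itself is an optimistic estimate:
# an unrestricted tree can essentially memorise the training set.
deep_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
train_acc = deep_tree.score(X, y)
print(f"unrestricted tree, accuracy on its own training data = {train_acc:.3f}")
```

Comparing the training accuracy of the unrestricted tree with its cross-validated accuracy is a quick way to see overfitting in practice.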
Session 3:
- Machine Learning algorithms:
- Logistic Regression
- Support Vector Machines (SVM)
- k-Nearest Neighbors (k-NN)
- Application of the above classification algorithms using scikit-learn on Election Dataset by American National Election Study
- Classification metrics:
- Confusion matrix
- Decision Threshold
- Precision/Recall
- F1-score
- Area Under ROC curve
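As a rough illustration of the classification metrics listed above, the sketch below trains a logistic regression on synthetic data (a stand-in, not the election dataset) and shows how the confusion matrix, precision, recall, and F1-score change with the decision threshold. The thresholds 0.3, 0.5, and 0.7 are arbitrary values picked for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data for a binary classification problem.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # predicted probability of class 1

# Area under the ROC curve does not depend on any single threshold.
auc = roc_auc_score(y_test, proba)
print("Area under ROC curve:", round(auc, 3))

metrics_by_threshold = {}
for threshold in (0.3, 0.5, 0.7):
    # Predict class 1 whenever the probability reaches the threshold.
    y_pred = (proba >= threshold).astype(int)
    metrics_by_threshold[threshold] = {
        "confusion": confusion_matrix(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }
    print(threshold, metrics_by_threshold[threshold])
```

Lowering the threshold labels more samples as positive, so recall can only stay the same or increase, usually at the cost of precision.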
Session 4:
- Machine Learning algorithm: Linear Regression
- Building the intuition of the training process and architecture of neural networks
- Multi-Layer Perceptron: Forward and Backward propagation
- A primer on Keras
- Training a neural network on Election Dataset by American National Election Study
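To build intuition for forward and backward propagation before moving to Keras, here is a minimal NumPy sketch of a multi-layer perceptron with one hidden layer. It is an illustration only and does not use Keras: the XOR toy data, the hidden layer size, the tanh and sigmoid activations, the learning rate, and the number of steps are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR toy problem: 4 samples, 2 input features, 1 binary target.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Parameters of a 2 -> 4 -> 1 network (sizes chosen for illustration).
W1 = rng.normal(scale=0.5, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))
b2 = np.zeros((1, 1))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(inputs):
    """Forward propagation: hidden tanh layer, then sigmoid output."""
    hidden = np.tanh(inputs @ W1 + b1)
    prob = sigmoid(hidden @ W2 + b2)
    return hidden, prob


def binary_cross_entropy(prob, target):
    return float(-np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob)))


learning_rate = 0.1
losses = []
for step in range(2000):
    hidden, prob = forward(X)
    losses.append(binary_cross_entropy(prob, y))

    # Backward propagation: apply the chain rule layer by layer.
    d_out = (prob - y) / len(X)                # gradient at the output pre-activation
    grad_W2 = hidden.T @ d_out
    grad_b2 = d_out.sum(axis=0, keepdims=True)
    d_hidden = (d_out @ W2.T) * (1 - hidden ** 2)  # tanh'(z) = 1 - tanh(z)^2
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0, keepdims=True)

    # Gradient descent update.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

print(f"loss at start: {losses[0]:.4f}, loss at end: {losses[-1]:.4f}")
```

Frameworks such as Keras compute these gradients automatically, but writing them out once makes it clearer what the training loop is doing.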
Session 5:
- Vanishing gradients and exploding gradients in deep networks
- Activation functions
- Weight Initialization
- Regularization
- Dropout
- Tuning other hyper-parameters such as learning rate, number of epochs, etc.
- Application of the above concepts on Election Dataset by American National Election Study
Session 6:
- Image preprocessing
- Feature extraction using convolution filters
- Convolutional Neural Network (CNN) architecture
- Training a CNN model on CIFAR-10 dataset
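Feature extraction with a convolution filter can be sketched in plain NumPy as below. This is a minimal illustration under assumptions: the 6x6 image is synthetic (not from CIFAR-10), the filter is a standard Sobel kernel, and the helper convolve2d_valid is a hypothetical name written for this example. As in CNN layers, the kernel is slid over the image without flipping (strictly speaking, cross-correlation).

```python
import numpy as np


def convolve2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    kernel_h, kernel_w = kernel.shape
    out_h = image.shape[0] - kernel_h + 1
    out_w = image.shape[1] - kernel_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kernel_h, j:j + kernel_w]
            out[i, j] = np.sum(patch * kernel)
    return out


# Synthetic 6x6 grayscale image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Sobel kernel that responds to left-to-right intensity changes (vertical edges).
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

feature_map = convolve2d_valid(image, sobel_x)
print(feature_map)
# Each row of the 4x4 output is [0, 4, 4, 0]: the filter responds only
# where its window straddles the dark-to-bright edge.
```

In a CNN the kernel values are not hand-picked like this; they are learned during training, and many such filters are applied in parallel.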
Session 7:
- A brief overview of neural network frameworks/architectures used in Computer Vision
- Transfer Learning
- Dimensionality reduction techniques such as PCA, LDA, etc.
- An overview of computing and dataset resources
- Using Anaconda as a package management and environment management tool
- A brief overview of Kaggle
- A small project
- Open questions and discussion
The topics may be shuffled, added, or dropped without notice, given the time constraints of the two-hour sessions and because the learning material is still being developed.
This page will be frequently updated with more information.