Below are the problem sets and coursework, written in MATLAB/Octave, from Stanford University's Machine Learning course on Coursera.
- Problem Set #1: Linear Regression and Gradient Descent
- Problem Set #2: Logistic Regression
- Problem Set #3: Multi-class Classification and Neural Networks
- Problem Set #4: Neural Network Learning
- Problem Set #5: Regularised Linear Regression and Bias vs. Variance
- Problem Set #6: Support Vector Machines
- Problem Set #7: K-means Clustering and Principal Component Analysis
- Problem Set #8: Anomaly Detection and Recommender Systems
This involves implementing linear regression with one variable, using the profits and populations of cities in which food trucks have been operating, to predict the profit for a new food truck given the population of its operating city, using the gradient descent algorithm.
Training data with linear regression fitted by gradient descent:
Surface and contour plots of the cost function J(θ):
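A minimal sketch of the batch gradient descent update used in this part, assuming a design matrix `X` with a leading column of ones and hypothetical function names:

```matlab
% Batch gradient descent for univariate linear regression (sketch).
% X: m-by-2 matrix [ones(m,1), population]; y: m-by-1 profits.
function [theta, J_history] = gradientDescentSketch(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    % Simultaneous, vectorised update of both parameters.
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    % Record the cost J(theta) = (1/(2m)) * sum((X*theta - y).^2).
    J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);
  end
end
```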
This involves implementing linear regression with two variables, using house sizes in square feet and numbers of bedrooms from historical house sales data, to predict the price of a new house given its size and number of bedrooms, using the gradient descent algorithm.
With a learning rate (α) of 0.01, the cost function J(θ) converges after a number of iterations of gradient descent:
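Because house sizes and bedroom counts differ by orders of magnitude, the features are typically mean-normalised before running gradient descent; a sketch, with a hypothetical function name:

```matlab
% Scale each feature to zero mean and unit standard deviation (sketch).
function [X_norm, mu, sigma] = featureNormalizeSketch(X)
  mu = mean(X);                    % 1-by-n vector of feature means
  sigma = std(X);                  % 1-by-n vector of standard deviations
  X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
end
```

The same `mu` and `sigma` must then be applied to any new house before predicting its price.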
This involves implementing logistic regression with two variables, using students' historical scores on two different exams and their university admission results, to predict whether a student will be admitted to university given their exam scores, using Octave/MATLAB's fminunc function.
Training data with decision boundary fitted by the fminunc function:
Training accuracy is 89%.
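A sketch of the cost function handed to fminunc, which finds θ without a hand-tuned learning rate; the function name is hypothetical:

```matlab
% Logistic regression cost and gradient (sketch).
function [J, grad] = costFunctionSketch(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));       % sigmoid hypothesis
  J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
  grad = (1 / m) * (X' * (h - y));
end

% Tell fminunc that the gradient is supplied, then optimise.
options = optimset('GradObj', 'on', 'MaxIter', 400);
initial_theta = zeros(size(X, 2), 1);
[theta, cost] = fminunc(@(t) costFunctionSketch(t, X, y), initial_theta, options);
```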
This involves implementing regularised logistic regression with two variables, using microchips' historical results on two different tests and their quality assurance outcomes from a fabrication plant, to predict whether a microchip will pass quality assurance given its test results, using Octave/MATLAB's fminunc function.
Training data with decision boundary fitted by the fminunc function:
Training accuracy is 83%.
Training data with decision boundary fitted by the fminunc function using a regularisation parameter (λ) of 0 (no regularisation/overfitting) and of 100 (underfitting):
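The regularised cost adds a penalty on every parameter except the intercept; a sketch, with a hypothetical function name:

```matlab
% Regularised logistic regression cost and gradient (sketch).
function [J, grad] = costFunctionRegSketch(theta, X, y, lambda)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));
  reg = (lambda / (2 * m)) * sum(theta(2:end) .^ 2);   % theta(1) is not penalised
  J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h)) + reg;
  grad = (1 / m) * (X' * (h - y));
  grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);
end
```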
This involves implementing one-vs-all logistic regression, and a neural network with one hidden layer, to recognise handwritten digits from the MNIST handwritten digit dataset, using the course-provided fmincg function, and comparing the performance of the two algorithms.
Handwritten digits visualised:
Training accuracy is 95% with one-vs-all logistic regression, whereas it is 97.52% with the neural network.
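One-vs-all trains one binary classifier per digit and predicts the class whose classifier scores highest; a sketch, with hypothetical names:

```matlab
% One-vs-all prediction (sketch).
% all_theta: K-by-(n+1), one row of learned parameters per class.
function p = predictOneVsAllSketch(all_theta, X)
  m = size(X, 1);
  X = [ones(m, 1), X];          % prepend the intercept term
  scores = X * all_theta';      % m-by-K matrix of class scores
  [~, p] = max(scores, [], 2);  % index of the highest-scoring class
end
```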
This involves implementing the backpropagation algorithm for neural networks with one hidden layer, and applying it to recognise handwritten digits, learning from a 5,000-sample subset of the MNIST handwritten digit dataset over 50 iterations, using the course-provided fmincg function.
Handwritten digits and the representation captured by the hidden layer visualised:
Training accuracy is 95.06%.
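At the core of backpropagation is propagating the output error back through the hidden layer using the sigmoid gradient g'(z) = g(z)(1 − g(z)); a sketch of one pass for a single example, with hypothetical variable names:

```matlab
% One forward/backward pass for a single-hidden-layer network (sketch).
% a1: input with bias term; Theta1, Theta2: weight matrices; yk: one-hot label.
z2 = Theta1 * a1;  a2 = [1; 1 ./ (1 + exp(-z2))];  % hidden activations, plus bias
z3 = Theta2 * a2;  a3 = 1 ./ (1 + exp(-z3));       % output activations
delta3 = a3 - yk;                                  % output-layer error
g2 = a2(2:end) .* (1 - a2(2:end));                 % sigmoid gradient of hidden layer
delta2 = (Theta2(:, 2:end)' * delta3) .* g2;       % hidden-layer error
% The gradients accumulate delta3 * a2' and delta2 * a1' across all examples.
```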
This involves implementing regularised linear regression, using historical records of a reservoir's water-level changes and the amount of water flowing out of the dam, to predict water outflow given the change in the reservoir's water level, using the course-provided fmincg function.
The dataset is randomly divided into three parts:
- Training set: for regression learning
- Cross-validation set: for determining the regularisation parameter
- Test set: for evaluating the regression performance
Training data with regularised linear regression fitted by the fmincg function (high bias) and the training and cross-validation errors as a function of training set size:
Training data with regularised polynomial regression fitted by the fmincg function using a regularisation parameter (λ) of 1 and the training and cross-validation errors as a function of training set size:
Training data with regularised polynomial regression fitted by the fmincg function using a regularisation parameter (λ) of 0 (no regularisation/overfitting/high variance) and the training and cross-validation errors as a function of training set size:
Training data with regularised polynomial regression fitted by the fmincg function using a regularisation parameter (λ) of 100 (underfitting/high bias) and the training and cross-validation errors as a function of training set size:
Training and cross-validation errors as a function of the regularisation parameter (λ):
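The regularisation parameter is chosen by training at each candidate λ and keeping the one with the lowest cross-validation error; a sketch, assuming the exercise's trainLinearReg and linearRegCostFunction helpers:

```matlab
% Select lambda by cross-validation error (sketch).
lambda_vec = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10]';
error_val = zeros(length(lambda_vec), 1);
for i = 1:length(lambda_vec)
  theta = trainLinearReg(X_poly, y, lambda_vec(i));          % fit at this lambda
  % Evaluate on the validation set with lambda = 0 (error only, no penalty).
  error_val(i) = linearRegCostFunction(X_poly_val, yval, theta, 0);
end
[~, best] = min(error_val);
best_lambda = lambda_vec(best);
```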
This involves implementing Support Vector Machines (SVMs) with a C parameter and a Gaussian kernel, to draw decision boundaries, and using the cross-validation dataset to determine the optimal C parameter and bandwidth parameter (σ) for the Gaussian kernel.
SVM linear decision boundary with a C parameter of 1 and of 100 on example dataset #1:
SVM non-linear decision boundary with a Gaussian kernel on example dataset #2:
SVM non-linear decision boundary with a C parameter, and a bandwidth parameter (σ) for the Gaussian kernel, that minimise prediction error on example dataset #3:
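The Gaussian kernel scores the similarity of two examples, decaying with their squared distance at a rate set by σ; a sketch, with a hypothetical function name:

```matlab
% Gaussian (RBF) kernel between two feature vectors (sketch).
function sim = gaussianKernelSketch(x1, x2, sigma)
  sim = exp(-sum((x1 - x2) .^ 2) / (2 * sigma ^ 2));
end
```

The optimal C and σ on dataset #3 can then be found by a grid search over candidate values, keeping the pair with the lowest error on the cross-validation set.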
This involves implementing Support Vector Machines (SVMs) with a C parameter and a Gaussian kernel, to build a spam classifier, learning from a subset of spam emails made available in the SpamAssassin Public Corpus.
The SVM spam predictor, using a vocabulary list of 1,899 words that occur at least 100 times in the spam corpus, achieves a training accuracy of 99.85% and a test accuracy of 98.80%.
Top predictor words for spam are:
Word | Weight |
---|---|
our | (0.491561) |
click | (0.467062) |
remov | (0.421572) |
guarante | (0.387703) |
visit | (0.366002) |
basenumb | (0.345912) |
dollar | (0.323080) |
will | (0.263241) |
price | (0.262449) |
pleas | (0.259879) |
nbsp | (0.254624) |
most | (0.253783) |
lo | (0.251302) |
ga | (0.248725) |
hour | (0.241374) |
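The top predictor words fall out of sorting the learned weights in descending order and mapping the indices back to the vocabulary list; a sketch, assuming a linear weight vector `model.w` and a cell array `vocabList` (both hypothetical names):

```matlab
% Rank vocabulary words by their learned SVM weight (sketch).
[weights, idx] = sort(model.w, 'descend');    % most spam-indicative words first
for i = 1:15
  fprintf('%-15s (%f)\n', vocabList{idx(i)}, weights(i));
end
```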
This involves implementing the K-means clustering algorithm and applying it to compress an image.
Moving paths of centroids using K-means clustering over 10 iterations with 3 clusters on an example dataset:
A bird image of 128×128 resolution in 24-bit colour requires 24 bits per pixel, for a total size of 128 × 128 × 24 = 393,216 bits.
If, however, K-means clustering is used to identify 16 principal colours and the image is represented using only those colours, an overhead colour dictionary of 24 bits for each of the 16 principal colours is required, yet each pixel needs only 4 bits to index one of the 16 colours.
The final number of bits used is therefore 16 × 24 + 128 × 128 × 4 = 65,920 bits, which corresponds to a compression factor of about 6:
Each pixel of the bird image, plotted with RGB values on a different axis each, grouped in 16 principal colour clusters:
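K-means alternates between assigning each pixel to its closest centroid and recomputing each centroid as the mean of its assigned pixels; a sketch of one iteration, with hypothetical names (empty-cluster handling omitted):

```matlab
% One K-means iteration over pixel colours (sketch).
% X: m-by-3 matrix of RGB values; centroids: K-by-3.
K = size(centroids, 1);
idx = zeros(size(X, 1), 1);
for i = 1:size(X, 1)
  dists = sum(bsxfun(@minus, centroids, X(i, :)) .^ 2, 2);  % squared distances
  [~, idx(i)] = min(dists);                                 % closest centroid
end
for k = 1:K
  centroids(k, :) = mean(X(idx == k, :), 1);  % cluster mean becomes new centroid
end
```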
This involves using principal component analysis to find a low-dimensional representation of face images.
Dimensionality reduction with principal component analysis on an example dataset:
Face images of 32×32 resolution in greyscale have 32 × 32 = 1,024 pixels, or dimensions. Using principal component analysis to reduce the dimensionality from 1,024 to 100 shrinks the dataset size by a factor of 10, while maintaining the general structure and appearance of the faces, despite forgoing the fine details:
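A sketch of PCA via the singular value decomposition of the covariance matrix, projecting onto the top K components and recovering an approximation (hypothetical variable names; X is assumed normalised):

```matlab
% PCA projection and recovery (sketch). X: m-by-n, features normalised.
Sigma = (X' * X) / size(X, 1);   % n-by-n covariance matrix
[U, S, ~] = svd(Sigma);          % columns of U are the principal components
K = 100;
Z = X * U(:, 1:K);               % project each face onto the top K components
X_rec = Z * U(:, 1:K)';          % approximate recovery back in n dimensions
```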
This involves implementing an anomaly detection algorithm, with a Gaussian distribution and a threshold (ε) chosen to maximise the F-score on a cross-validation set, and applying it to detect failing servers on a network using their throughput (Mb/s) and response latency (ms).
Anomalies whose probability of occurrence falls below the threshold (ε), which is set to maximise the F-score on a cross-validation set, flagged against the fitted Gaussian distribution:
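A sketch of fitting an independent Gaussian to each feature and flagging examples whose density falls below ε (hypothetical variable names):

```matlab
% Gaussian anomaly detection (sketch).
mu = mean(X);                                    % 1-by-n feature means
sigma2 = var(X, 1);                              % 1-by-n variances (1/m normalisation)
d = bsxfun(@minus, X, mu);                       % deviations from the mean
e = exp(bsxfun(@rdivide, -d .^ 2, 2 * sigma2));  % unnormalised per-feature densities
p = prod(bsxfun(@rdivide, e, sqrt(2 * pi * sigma2)), 2);  % joint density per example
anomalies = find(p < epsilon);                   % epsilon maximises F-score on the CV set
```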
This involves using collaborative filtering to build a movie recommender system from a subset of the MovieLens 100k dataset from GroupLens Research, covering 943 users and 1,682 movies.
Movie rating dataset of 943 users and 1,682 movies visualised:
A new user provides the following movie ratings: |
---|
Rated 4 for Toy Story (1995) |
Rated 3 for Twelve Monkeys (1995) |
Rated 5 for Usual Suspects, The (1995) |
Rated 4 for Outbreak (1995) |
Rated 5 for Shawshank Redemption, The (1994) |
Rated 3 for While You Were Sleeping (1995) |
Rated 5 for Forrest Gump (1994) |
Rated 2 for Silence of the Lambs, The (1991) |
Rated 4 for Alien (1979) |
Rated 5 for Die Hard 2 (1990) |
Rated 5 for Sphere (1998) |
Collaborative filtering, trained over 100 iterations, recommends: |
---|
Predicting rating 5.0 for movie Marlene Dietrich: Shadow and Light (1996) |
Predicting rating 5.0 for movie Great Day in Harlem, A (1994) |
Predicting rating 5.0 for movie Star Kid (1997) |
Predicting rating 5.0 for movie They Made Me a Criminal (1939) |
Predicting rating 5.0 for movie Saint of Fort Washington, The (1993) |
Predicting rating 5.0 for movie Entertaining Angels: The Dorothy Day Story (1996) |
Predicting rating 5.0 for movie Aiqing wansui (1994) |
Predicting rating 5.0 for movie Santa with Muscles (1996) |
Predicting rating 5.0 for movie Prefontaine (1997) |
Predicting rating 5.0 for movie Someone Else's America (1995) |
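A sketch of the regularised collaborative filtering objective, where R(i,j) = 1 marks movies a user has rated and the movie features X and user parameters Theta are learned jointly (hypothetical variable names):

```matlab
% Regularised collaborative filtering cost and gradients (sketch).
% Y: num_movies-by-num_users ratings; R: 1 where a rating exists.
err = (X * Theta' - Y) .* R;       % prediction errors, only where rated
J = (1 / 2) * sum(err(:) .^ 2) ...
    + (lambda / 2) * (sum(X(:) .^ 2) + sum(Theta(:) .^ 2));
X_grad = err * Theta + lambda * X;          % gradient w.r.t. movie features
Theta_grad = err' * X + lambda * Theta;     % gradient w.r.t. user parameters
```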