# ml-project-2

Second project for EPFL's Pattern Classification and Machine Learning course.
## Team members
- Jade Copet
- Merlin Nimier-David
The project was designed by Prof. Emtiyaz and the TAs.
## Project structure

This project consisted of two tasks: people detection in images, and a song recommender system built from listening counts data.
- `analysis`: simple exploratory data analysis scripts we used to get to know the datasets better.
- `src`:
  - `detection`: code for the people detection dataset. We experimented with Gaussian Processes, Neural Networks, PCA, SVM and Random Forests.
  - `recommendation`: code for the song recommendation dataset. We experimented with various feature extractions, ALS-WR, linear regression, K-means clustering, Gaussian Mixture Model clustering, Top-N recommendation and the Pearson similarity measure.
- `toolbox`: place the dependencies there. Our code relies on the DeepLearn toolbox, Piotr's toolbox, and the VBGM script (see the Tools section).
- `report`: project report (written in LaTeX). Contains references to related papers which were helpful for the project.
- `data` and `results`: input and output data (provided as Matlab `.mat` files).
- `test`: simple test scripts which were provided for us to check the output format of our predictions.
## Project's TODO
### ML methods

- Generic k-fold Cross Validation (see the sketch after this list)
- Support Vector Machine
- Gaussian Process with several kernels
- K-means clustering
- Gaussian Mixture Model and the EM algorithm
- Principal Component Analysis (as a low-rank approximation) using alternating least squares
- Neural Networks (implementation from the DeepLearn toolbox)
- Generic ML method comparison function (for each method, plot the achieved test error and its stability with a boxplot)
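A minimal sketch of what the generic k-fold cross validation can look like, assuming the model is wrapped in `trainFn` / `predictFn` function handles and `errorFn` is one of the error measures (all three names are hypothetical placeholders):

```matlab
% Generic k-fold cross validation (sketch). trainFn, predictFn and
% errorFn are hypothetical function handles; any leftover samples when
% N is not divisible by k are simply ignored.
function [trErr, teErr] = kFoldCV(X, y, k, trainFn, predictFn, errorFn)
    N = size(X, 1);
    idx = randperm(N);                  % shuffle once, then cut into k folds
    foldSize = floor(N / k);
    trErr = zeros(k, 1);
    teErr = zeros(k, 1);
    for i = 1:k
        teIdx = idx((i - 1) * foldSize + 1 : i * foldSize);
        trIdx = setdiff(idx, teIdx);
        model = trainFn(X(trIdx, :), y(trIdx));
        trErr(i) = errorFn(y(trIdx), predictFn(model, X(trIdx, :)));
        teErr(i) = errorFn(y(teIdx), predictFn(model, X(teIdx, :)));
    end
end
```

Keeping all k error values (rather than only their mean) is what feeds the stability boxplots in the method comparison function.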
### Dataset pre-processing

- Basic data characteristics (dimensionality, distribution, correlation, ...)
- Try obtaining helpful visualizations
- Dimensionality reduction with PCA (see the sketch after this list)
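For reference, the textbook PCA reduction via an SVD of the centered data; note that our code may instead use the ALS low-rank approximation mentioned above, and the number of components `m` below is an arbitrary placeholder:

```matlab
% PCA by SVD of the centered data matrix (textbook variant; the number
% of components m = 50 is an arbitrary placeholder).
mu = mean(X, 1);
Xc = bsxfun(@minus, X, mu);         % center each feature
[~, ~, V] = svd(Xc, 'econ');        % columns of V = principal directions
m = 50;
Z = Xc * V(:, 1:m);                 % reduced-dimensionality representation
Xrec = Z * V(:, 1:m)';              % rank-m reconstruction (still centered)
```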
### Person detection dataset
- Implement the relevant error measures
- Try feature transformations (basis expansion)
- Get a baseline error value
- Implement PCA
- Implement Logistic Regression
- Experiment with the Neural Network's hyperparameters (number of layers, activation functions, dropout, ...)
- Experiment with Gaussian Processes (check the provided toolbox)
- Experiment with SVM (check the provided toolbox)
- Experiment with Random Forests
- Implement kCV fastROC (see the ROC sketch after this list)
- Generate feature selection plots (code to replicate for different methods)
- Plot the ROC curves of different models for comparison
- Maximize the train and test avgTPR
- Check the stability of the results with k-CV
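The exact `fastROC` / avgTPR measure was provided with the project; as a stand-in, here is one plausible way to compute the ROC points and the area under the curve from raw prediction scores:

```matlab
% ROC curve from raw prediction scores (sketch; the course-provided
% fastROC defines avgTPR precisely, the AUC below is only a stand-in).
function [fpr, tpr, auc] = rocPoints(yTrue, scores)
    [~, order] = sort(scores, 'descend');   % sweep the threshold downwards
    isPos = yTrue(order) > 0;
    isPos = isPos(:);                       % force a column vector
    P = sum(isPos);
    N = numel(isPos) - P;                   % assumes both classes present
    tpr = [0; cumsum(isPos)  / P];          % true positive rate per threshold
    fpr = [0; cumsum(~isPos) / N];          % false positive rate per threshold
    auc = trapz(fpr, tpr);                  % area under the ROC curve
end
```

Plotting `fpr` against `tpr` for each model on the same axes gives the comparison plot from the list above.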
### Song recommendation dataset

Recall that we must achieve both weak prediction (new ratings for existing users) and strong prediction (entirely new users).
- Implement the relevant error measures
- Implement error diagnostics (which kinds of counts do we make the most errors on?)
- Train / test split (handled separately for weak and strong prediction)
- [X] Feature engineering: implement derived variables
- [X] Get a baseline error value
- Implement Top-N recommendation (cluster with the Pearson similarity measure; see the sketch after this list)
- [X] Experiment with K-Means
- Implement the simple Slope One method
- Experiment with Gaussian Mixture Models (soft clustering)
- Cluster the tail items (head / tail cutoff point to be chosen carefully)
- Try clustering in reduced-dimensionality space
- Implement a hybrid head / tail predictor (e.g. Each Item for head, Top-K for tail)
- Determine whether using the social graph helps weak prediction (this will tell us whether we can use it for strong prediction as well)
- Use the social network and generic artist information for strong prediction
- Generate feature selection plots
- Minimize the train and test error
- Check the stability of the results with random train / test splits
- Use the artist names to output fun facts
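For the Top-N recommendation item, a sketch of the Pearson similarity between two users' listening-count vectors, computed only over co-listened artists (the function and variable names are our own):

```matlab
% Pearson similarity between two users' listening-count vectors,
% restricted to the artists both users have listened to (sketch).
function s = pearsonSim(u, v)
    common = (u > 0) & (v > 0);     % co-listened artists only
    if nnz(common) < 2
        s = 0;                      % too little overlap to correlate
        return;
    end
    a = u(common) - mean(u(common));
    b = v(common) - mean(v(common));
    denom = sqrt(sum(a .^ 2) * sum(b .^ 2));
    if denom == 0
        s = 0;                      % constant counts carry no signal
    else
        s = sum(a .* b) / denom;
    end
end
```

A Top-N prediction then scores an unseen (user, artist) pair as the similarity-weighted average of the counts of the N most similar users.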
## Predictions

- `songPred.mat` contains the two matrices `Ytest_weak_pred` (size 1774x15082) and `Ytest_strong_pred` (size 93x15082)
- `personPred.mat` contains a vector `Ytest_score` (size 8743x1) with the prediction score for each test sample

Both files can be sanity-checked as sketched below.
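A minimal sketch for writing the two files and checking the expected sizes (it assumes the prediction matrices already exist in the workspace):

```matlab
% Write the prediction files and check the expected sizes (sketch).
save('songPred.mat', 'Ytest_weak_pred', 'Ytest_strong_pred');
save('personPred.mat', 'Ytest_score');

assert(isequal(size(Ytest_weak_pred),   [1774 15082]));
assert(isequal(size(Ytest_strong_pred), [93 15082]));
assert(isequal(size(Ytest_score),       [8743 1]));
```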
## Report

- Describe and discuss the methods used and show that we understand their inner workings and the influence of each hyperparameter (especially for methods we did not implement ourselves)
- Produce figures for the detection dataset
- Report work done for the detection dataset and the corresponding results
- Produce figures for the recommendation dataset
- Report work done for the recommendation dataset and the corresponding results
- Double-check all figures for labels (on each axis and for the figure itself)
- Clear conclusion and analysis of the results for each dataset
- Include complete details about each algorithm (initialization values, lambda values, number of folds, number of trials, etc.)
- What worked and what did not? What do you think are the reasons behind that?
- Why did you choose the method that you chose?