Machine Learning Case Studies

Following the course Machine Learning Foundations: A Case Study Approach by the University of Washington, I will complete all the cases used in the course in R, scikit-learn and GraphLab (the latter two in Jupyter Notebook). Course Link: https://www.coursera.org/learn/ml-foundations/home/welcome

Note: Because of the limited computational capacity of RStudio, several cases will not be completed in R.

Case 1: Regression

Description

In this module, we focused on using regression to predict a continuous value (house prices) from features of the house (square feet of living space, number of bedrooms, ...). We also built an IPython notebook for predicting house prices, using data from King County, USA, the region where the city of Seattle is located.

In this assignment, we are going to build a more accurate regression model for predicting house prices by including more features of the house. In the process, we will also become more familiar with how the Python language can be used for data exploration, data transformations and machine learning. These techniques will be key to building intelligent applications.

Folder: Case 1: Regression

Tools: R, GraphLab and Sklearn (Jupyter Notebook)

Models Tried: Linear Regression, Random Forest, Gradient Boosting, Decision Trees and Lasso
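
As a rough illustration of the idea in this case (more features should give a more accurate price model), here is a minimal scikit-learn sketch. It is not the notebook code: the data is synthetic, and the feature names, coefficients and noise level are assumptions made up for the example rather than values from the King County dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the house data (assumption: NOT the King County dataset).
rng = np.random.default_rng(0)
n = 500
sqft_living = rng.uniform(500, 4000, n)
bedrooms = rng.integers(1, 6, n)    # 1 to 5 bedrooms
bathrooms = rng.integers(1, 4, n)   # 1 to 3 bathrooms
price = (200 * sqft_living + 10_000 * bedrooms + 15_000 * bathrooms
         + rng.normal(0, 20_000, n))

X_all = np.column_stack([sqft_living, bedrooms, bathrooms])
X_train, X_test, y_train, y_test = train_test_split(
    X_all, price, test_size=0.2, random_state=0)

def fit_and_score(columns):
    """Fit a linear model on the chosen feature columns and return the test RMSE."""
    model = LinearRegression().fit(X_train[:, columns], y_train)
    predictions = model.predict(X_test[:, columns])
    return np.sqrt(mean_squared_error(y_test, predictions))

rmse_simple = fit_and_score([0])        # square feet only
rmse_more = fit_and_score([0, 1, 2])    # square feet, bedrooms, bathrooms
print(f"RMSE, sqft only:     {rmse_simple:,.0f}")
print(f"RMSE, more features: {rmse_more:,.0f}")
```

The same `fit_and_score` helper could be pointed at other estimators from the list above (for example `Lasso` or `RandomForestRegressor`) by swapping the model class.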

Case 2: Classification

Description

In this module, we focused on classifiers, applying them to analyzing product sentiment, and understanding the types of errors a classifier makes. We also built an exciting IPython notebook for analyzing the sentiment of real product reviews.

In this assignment, we explore this application further: we train a sentiment analysis model using a set of key polarizing words, verify the weights learned for each of these words, and compare the results of this simpler classifier with those of the one using all of the words.

Folder: Case 2: Classification

Tools: GraphLab and Sklearn (Jupyter Notebook)

Models Tried: Logistic Regression, Random Forest, Gradient Boosting and Decision Trees
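
The comparison described above (a classifier restricted to a few polarizing words versus one using all words) could be sketched in scikit-learn roughly as follows. The reviews and the `selected_words` list are made up for illustration and are assumptions, not the course data or the course's word list.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up reviews (assumption). Label 1 = positive, 0 = negative.
reviews = [
    "I love this product, it is great",
    "awesome quality, love it",
    "great value and awesome design",
    "I hate it, terrible quality",
    "awful product, bad design",
    "terrible, bad and awful experience",
]
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical list of polarizing words.
selected_words = ["love", "great", "awesome", "hate", "terrible", "bad", "awful"]

def train(vocabulary=None):
    """Train on word counts; a fixed vocabulary limits the features to those words."""
    vectorizer = CountVectorizer(vocabulary=vocabulary)
    X = vectorizer.fit_transform(reviews)
    model = LogisticRegression().fit(X, labels)
    return vectorizer, model

vec_all, model_all = train()                 # all words
vec_sel, model_sel = train(selected_words)   # polarizing words only

# With a fixed vocabulary list, feature columns follow the order of the list.
weights = dict(zip(selected_words, model_sel.coef_[0]))
for word, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{word:>10s} {weight:+.3f}")

new_review = ["great product, I love it"]
pred_sel = model_sel.predict(vec_sel.transform(new_review))[0]
pred_all = model_all.predict(vec_all.transform(new_review))[0]
print("selected-words model:", pred_sel, "| all-words model:", pred_all)
```

Printing the weights makes it easy to check that positive words receive positive coefficients and negative words negative ones, which is the verification step the assignment asks for.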

Case 3: Clustering (Similarity) Analysis: Document Retrieval

Description

In this module, we focused on using nearest neighbors and clustering to retrieve documents that interest users, by analyzing their text. We explored two document representations: word counts and TF-IDF. We also built an IPython notebook for retrieving articles from Wikipedia about famous people.

Folder: Case 3: Clustering

Tools: GraphLab and Sklearn (Jupyter Notebook)

Models Tried: KNN

Priority: NLP (Natural Language Processing) in Sklearn and GraphLab
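
A minimal sketch of nearest-neighbor document retrieval with the two representations mentioned above (word counts and TF-IDF), using scikit-learn. The mini-corpus and document names are hypothetical stand-ins for the Wikipedia articles, and removing English stop words is an assumption made to keep the toy example clean.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical mini-corpus standing in for the Wikipedia people articles.
docs = {
    "footballer_a": "the striker scored a goal in the football match for the club",
    "footballer_b": "the midfielder played football and won the league with the club",
    "politician_a": "the senator won the election and served as president of the country",
    "politician_b": "the president signed the law after the election in the country",
    "musician_a": "the singer released an album and toured with the band",
}
names = list(docs)
texts = list(docs.values())

def nearest(name, vectorizer):
    """Return the name of the document closest to `name` by cosine distance."""
    X = vectorizer.fit_transform(texts)
    knn = NearestNeighbors(n_neighbors=2, metric="cosine", algorithm="brute").fit(X)
    idx = names.index(name)
    _, neighbors = knn.kneighbors(X[idx])
    # The query document is its own nearest neighbor, so skip it.
    return next(names[i] for i in neighbors[0] if i != idx)

for query in ("footballer_a", "politician_a"):
    by_counts = nearest(query, CountVectorizer(stop_words="english"))
    by_tfidf = nearest(query, TfidfVectorizer(stop_words="english"))
    print(f"{query}: word counts -> {by_counts}, TF-IDF -> {by_tfidf}")
```

On real articles the two representations often disagree, because TF-IDF down-weights words that are common across the whole corpus.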

Case 4: (Personalized) Recommender System

Description

In this module, we focused on building recommender systems to find products, music and movies that interest users. We also built an exciting IPython notebook for recommending songs, which compared a simple popularity-based recommender with a personalized model and showed the significant improvement provided by personalization.

Folder: Case 4: Recommender-System

Tools: GraphLab and Sklearn (Jupyter Notebook)

Priorities: Try GraphLab's built-in recommender function

And: Break down the user-item collaborative-filtering recommender with pandas and Sklearn
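
To make the popularity-versus-personalization comparison concrete, here is a small pandas and scikit-learn sketch. It uses item-to-item cosine similarity computed from the user-item matrix, which is one common form of collaborative filtering and not necessarily the variant used in the notebooks. The listening log, user names and song names are all made up for the example.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical listening log (assumption: not the course's song data).
plays = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u2", "u2", "u3", "u3", "u3", "u4", "u4", "u5"],
    "song": ["rock_a", "rock_b", "pop_hit", "rock_a", "rock_b", "jazz_a",
             "jazz_b", "pop_hit", "jazz_a", "pop_hit", "jazz_b"],
})

# User x song matrix: 1 if the user listened to the song, else 0.
matrix = pd.crosstab(plays["user"], plays["song"]).clip(upper=1)

# Song x song cosine similarity based on which users listened to each song.
item_sim = pd.DataFrame(cosine_similarity(matrix.T),
                        index=matrix.columns, columns=matrix.columns)

def listened_songs(user):
    row = matrix.loc[user]
    return row[row > 0].index

def popularity_recommend(user, k=1):
    """Recommend the songs with the most listeners that the user has not heard."""
    popularity = matrix.sum(axis=0).drop(listened_songs(user))
    return popularity.sort_values(ascending=False).index[:k].tolist()

def personalized_recommend(user, k=1):
    """Score each unheard song by its similarity to the songs the user has heard."""
    listened = listened_songs(user)
    scores = item_sim[listened].sum(axis=1).drop(listened)
    return scores.sort_values(ascending=False).index[:k].tolist()

print("popularity:  ", popularity_recommend("u5"))
print("personalized:", personalized_recommend("u5"))
```

In this toy log the user `u5` has only heard a jazz song, so the personalized model favors the other jazz song while the popularity model suggests the song with the most listeners overall.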

Case 5: Image Retrieval: Deep Learning

I am still studying the theory behind deep learning, so the Sklearn version of this part will take longer to finish.

Description

In this module, we focused on using deep learning to create non-linear features to improve the performance of machine learning. We also saw how transfer learning techniques can be applied to use deep features learned with one dataset to get great performance on a different dataset. We also built IPython notebooks for both image retrieval and image classification tasks on real datasets.

Folder: Deep learning Image Retrieval

Tools: GraphLab (mainly used for transfer learning from an ImageNet-winning solution)
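
The transfer learning idea can be tried on a very small scale in scikit-learn. The sketch below is only a toy under stated assumptions, not the GraphLab or ImageNet workflow: a small network is trained on digits 0 to 4 from scikit-learn's bundled digits dataset, and its hidden-layer activations are then reused as "deep features" for a different task, classifying digits 5 to 9. The layer size and the source/target split are arbitrary choices for the example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0          # pixel values in this dataset range from 0 to 16

source = y < 5        # source task: digits 0-4
target = ~source      # target task: digits 5-9

# Train a small network on the source task only.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net.fit(X[source], y[source])

def deep_features(images):
    """Hidden-layer activations of the source network (ReLU is the default activation)."""
    return np.maximum(0, images @ net.coefs_[0] + net.intercepts_[0])

# Reuse the learned features for the target task with a simple linear classifier.
X_tr, X_te, y_tr, y_te = train_test_split(
    X[target], y[target], test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(deep_features(X_tr), y_tr)
accuracy = clf.score(deep_features(X_te), y_te)
print(f"target-task accuracy on transferred features: {accuracy:.3f}")
```

For image retrieval, the same `deep_features` output could be fed to a nearest-neighbor search instead of a classifier.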