Following the course Machine Learning Foundations: A Case Study Approach by the University of Washington, I will complete all of the case studies from the course in R, scikit-learn, and GraphLab (the latter two in Jupyter Notebooks). Course link: https://www.coursera.org/learn/ml-foundations/home/welcome
Note: Because of RStudio's limited computational capacity, several cases will not be completed in it.
In this module, we focused on using regression to predict a continuous value (house prices) from features of the house (square feet of living space, number of bedrooms, ...). We also built an IPython notebook for predicting house prices, using data from King County, USA, the region where the city of Seattle is located.
In this assignment, we are going to build a more accurate regression model for predicting house prices by including more features of the house. In the process, we will also become more familiar with how the Python language can be used for data exploration, data transformations and machine learning. These techniques will be key to building intelligent applications.
Folder: Case 1: Regression
Tools: R, GraphLab and Sklearn (Jupyter Notebook)
Models Tried: Linear Regression, Random Forest, Gradient Boosting, Decision Trees and Lasso
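As a minimal sketch of the regression workflow in this case, the snippet below fits a scikit-learn linear model on hypothetical toy data standing in for the King County house sales set (the feature names and synthetic values are assumptions, not the course data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical stand-in for the King County data:
# two features (sqft_living, bedrooms), target is price.
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 4000, size=200)
bedrooms = rng.integers(1, 6, size=200)
price = 150 * sqft + 10000 * bedrooms + rng.normal(0, 20000, size=200)

X = np.column_stack([sqft, bedrooms])
X_train, X_test, y_train, y_test = train_test_split(X, price, random_state=0)

# Fit a plain linear regression and evaluate held-out RMSE.
model = LinearRegression().fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"Test RMSE: {rmse:.0f}")
```

The same `fit`/`predict` pattern applies unchanged to the other models tried (RandomForestRegressor, GradientBoostingRegressor, DecisionTreeRegressor, Lasso).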
In this module, we focused on classifiers, applying them to analyzing product sentiment, and understanding the types of errors a classifier makes. We also built an exciting IPython notebook for analyzing the sentiment of real product reviews.
In this assignment, we are going to explore this application further: training a sentiment analysis model using a set of key polarizing words, verifying the weights learned for each of these words, and comparing the results of this simpler classifier with those of a classifier that uses all of the words.
Folder: Case 2: Classification
Tools: GraphLab and Sklearn (Jupyter Notebook)
Models Tried: Logistic Regression, Random Forest, Gradient Boosting and Decision Trees
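The "key polarizing words" classifier can be sketched in scikit-learn by restricting the vectorizer's vocabulary; the mini corpus and word list below are made-up placeholders, not the course's review data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical mini review corpus; labels: 1 = positive, 0 = negative.
reviews = [
    "great product, works perfectly and I love it",
    "terrible quality, broke after one day, awful",
    "amazing value, fantastic and great to use",
    "bad experience, hate it, total waste",
]
labels = [1, 0, 1, 0]

# Restrict features to a small set of key polarizing words,
# mirroring the simpler classifier in the assignment.
key_words = ["great", "love", "amazing", "fantastic",
             "terrible", "awful", "bad", "hate"]
vectorizer = CountVectorizer(vocabulary=key_words)
X = vectorizer.fit_transform(reviews)

clf = LogisticRegression().fit(X, labels)

# Inspect the weight learned for each key word.
for word, weight in zip(key_words, clf.coef_[0]):
    print(f"{word:10s} {weight:+.3f}")
```

Dropping the `vocabulary=` argument gives the all-words baseline to compare against.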
In this module, we focused on using nearest neighbors and clustering to retrieve documents that interest users, by analyzing their text. We explored two document representations: word counts and TF-IDF. We also built an IPython notebook for retrieving articles from Wikipedia about famous people.
Folder: Case 3: Clustering
Tools: GraphLab and Sklearn (Jupyter Notebook)
Models Tried: KNN
Priority: NLP (Natural Language Processing) in sklearn and GraphLab
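The retrieval pipeline described above can be sketched with scikit-learn's TF-IDF vectorizer and nearest-neighbors search; the four short documents below are invented stand-ins for the Wikipedia people articles:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-in documents for the Wikipedia articles.
docs = [
    "Barack Obama was the 44th president of the United States",
    "George Bush was the 43rd president of the United States",
    "Taylor Swift is an American singer and songwriter",
    "Elton John is a British singer and pianist",
]

# TF-IDF down-weights words common across the corpus (e.g. "the").
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)

# Cosine distance pairs naturally with TF-IDF vectors.
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)
distances, indices = nn.kneighbors(X[0])
print("Closest article to doc 0:", indices[0][1])
```

Swapping `TfidfVectorizer` for `CountVectorizer` gives the raw word-count representation the module also explores.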
In this module, we focused on building recommender systems to find products, music and movies that interest users. We also built an exciting IPython notebook for recommending songs, which compared the simple popularity-based recommendation with a personalized model, and showed the significant improvement provided by personalization.
Folder: Case 4: Recommender-System
Tools: GraphLab and Sklearn (Jupyter Notebook)
Priorities: Try GraphLab's built-in recommender function
And: Break down the user-item collaborative filtering recommender with pandas and Sklearn
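The pandas/Sklearn breakdown of item-item collaborative filtering can be sketched as below; the tiny play-count matrix is a made-up example, not the course's song dataset:

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user-song play counts (rows: users, columns: songs).
plays = pd.DataFrame(
    {"song_a": [5, 4, 0, 0],
     "song_b": [3, 5, 0, 1],
     "song_c": [0, 0, 4, 5],
     "song_d": [0, 1, 5, 4]},
    index=["u1", "u2", "u3", "u4"],
)

# Item-item similarity: compare songs by their columns of play counts.
item_sim = pd.DataFrame(
    cosine_similarity(plays.T),
    index=plays.columns, columns=plays.columns,
)

# Score each song for a user as a similarity-weighted sum of their plays,
# then recommend only the songs they have not heard yet.
user = "u1"
scores = item_sim.mul(plays.loc[user], axis=0).sum(axis=0)
scores = scores[plays.loc[user] == 0]
print(scores.sort_values(ascending=False))
```

A popularity baseline for comparison is just `plays.sum(axis=0).sort_values(ascending=False)`, which ignores the user entirely.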
I am still studying the theory behind deep learning, so the Sklearn part of this case will take longer to finish.
In this module, we focused on using deep learning to create non-linear features to improve the performance of machine learning. We also saw how transfer learning techniques can be applied to use deep features learned with one dataset to get great performance on a different dataset. We also built IPython notebooks for both image retrieval and image classification tasks on real datasets.
Folder: Deep learning Image Retrieval
Tools: GraphLab (mainly used for transfer learning from an ImageNet winning solution.)
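Once deep features have been extracted, image retrieval reduces to nearest-neighbors search in feature space. The sketch below uses random clustered vectors as placeholders for real deep features (in the case study those come from a network pretrained on ImageNet); the two "classes" and all values here are assumptions for illustration:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Placeholder "deep features": two synthetic clusters stand in for
# the feature vectors a pretrained network would produce per image.
rng = np.random.default_rng(42)
cats = rng.normal(scale=0.1, size=(10, 64)) + 1.0   # images 0-9
cars = rng.normal(scale=0.1, size=(10, 64)) - 1.0   # images 10-19
features = np.vstack([cats, cars])

# Image retrieval = nearest neighbors in deep-feature space.
nn = NearestNeighbors(n_neighbors=3).fit(features)
_, idx = nn.kneighbors(features[0:1])
print("Neighbors of image 0:", idx[0])
```

With real deep features the same search returns visually similar images, which is what makes the transferred representation useful for retrieval.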