These are the R versions of the assignments from the popular online machine learning course on Coursera.
To download lecture videos visit the course website:
This repo provides starter code for solving the assignments in R; the completed assignments are available in the `Solutions` folder.
Follow these steps to complete the assignments:
- Watch the lectures
- Read the instructions (PDF)
- Use the `Starter` folder and fill in the parts of the code marked `"YOUR CODE HERE"`
- If you can't solve a part yourself, get help from the `Solutions` folder
- Submit
To produce results and plots similar to those from Octave/Matlab, you should install a few packages:
- `rgl` package is used to produce the 3D scatter plots and surface plots in the exercises.
- `SnowballC`: the `portStemmer` function in this package plays the same role as `portStemmer.m`.
- `raster` package is used to produce the plot of the bird in exercise 7.
- `jsonlite` and `httr` packages are needed for submission.
- `pinv.R`: The `ginv` function (generalized inverse) in the `MASS` package doesn't produce exactly the same result as the Matlab `pinv` (pseudo-inverse). `pinv.R` is a modified version of the MASS `ginv` that reproduces the behavior of the MATLAB `pinv`. For more info see the stackoverflow discussion.
- `lbfgsb3_.R`: Certain optimization tasks could only be solved using the `lbfgsb3` package, yet there are a few bugs in this package. The purpose of `lbfgsb3_.R` is to work around these bugs; it is used for exercises 4 and 8. Beware that the `fmincg`/`fminunc` optimization functions in Matlab take a single function as input that computes the cost and gradient simultaneously. In contrast, the cost and gradient functions MUST be supplied to the `optim` or `lbfgsb3` functions separately.
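As a rough illustration of the last point, here is a minimal sketch of splitting a Matlab-style combined cost/gradient function into the two separate functions that base R's `optim` expects. The toy data and the names `cost_and_grad`, `cost` and `grad` are made up for this example and are not taken from the exercises; `method = "BFGS"` is chosen only to keep the sketch in base R.

```r
# Toy data (hypothetical): y = 1 + 2 * x exactly, so the minimiser is c(1, 2).
X <- cbind(1, c(1, 2, 3, 4, 5))  # design matrix with an intercept column
y <- c(3, 5, 7, 9, 11)
m <- length(y)

# Matlab-style function: returns the cost and the gradient together.
cost_and_grad <- function(theta) {
  r <- as.vector(X %*% theta - y)          # residuals
  list(J = sum(r^2) / (2 * m),             # halved mean squared error
       grad = as.vector(t(X) %*% r) / m)   # gradient of J w.r.t. theta
}

# optim() wants the cost (fn) and the gradient (gr) as separate functions.
cost <- function(theta) cost_and_grad(theta)$J
grad <- function(theta) cost_and_grad(theta)$grad

fit <- optim(par = c(0, 0), fn = cost, gr = grad, method = "BFGS")
theta_hat <- fit$par
```

The two thin wrappers each recompute the shared residuals, which is wasteful but keeps the port from Matlab mechanical; writing `cost` and `grad` directly avoids the duplicated work.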
Before starting to code, install the following packages:
install.packages(c('rgl','lbfgsb3','SnowballC','raster','jsonlite', 'httr'))
Apart from installing these packages, the notes above require no action on your part; they are for information only.
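For background on the `pinv.R` note, the difference between `ginv` and `pinv` most likely comes down to the cutoff below which singular values are treated as zero. `MASS::ginv` defaults to a relative tolerance of `sqrt(.Machine$double.eps)`, whereas MATLAB documents its default as `max(size(A)) * eps(norm(A))`. The sketch below is an SVD-based pseudo-inverse using the MATLAB-style cutoff; it is an assumption-laden illustration, not the contents of `pinv.R`, and the names `pinv_sketch` and `eps_at` are hypothetical.

```r
# SVD-based pseudo-inverse with a MATLAB-style tolerance (illustrative only).
pinv_sketch <- function(A) {
  s <- svd(A)
  # Spacing of doubles at magnitude x, mirroring MATLAB's eps(x)
  # for normal (non-subnormal) values.
  eps_at <- function(x) 2^(floor(log2(x)) - 52)
  # The largest singular value is the 2-norm of A.
  tol <- max(dim(A)) * eps_at(max(s$d))
  keep <- s$d > tol  # drop singular values that are numerically zero
  # V_k %*% diag(1 / d_k) %*% t(U_k), restricted to the kept singular values
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)  # 3 x 2, full column rank
P <- pinv_sketch(A)                          # 2 x 3 pseudo-inverse
```

For a matrix with full column rank such as `A`, `P %*% A` should be numerically close to the identity; the two tolerances only lead to different answers when some singular values fall between the two cutoffs.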
After completing each assignment, `source` the `submit.r` file and type `submit()` in the R console.
I submitted the solutions to Coursera for testing and they all scored 100%. Please report any problems with submission.
- Linear regression, cost function and normalization
- Gradient descent and advanced optimization
- Multiple linear regression and normal equation
- Logistic regression, decision boundary and multi-class classification
- Over-fitting and Regularization
- Neural Network non-linear classification
- Model validation, diagnosis and learning curves
- System design, prioritizing and error analysis
- Support vector machine (SVM), large margin classification and SVM kernels (linear and Gaussian)
- K-Means clustering
- Principal component analysis (PCA)
- Anomaly detection, supervised learning
- Recommender systems, Collaborative filtering
- Large scale machine learning, stochastic and mini-batch gradient descent, on-line learning, map reduce
A few screenshots of the plots produced in R:
![Learning Curves](http://faridcher.github.io/uploads/ml-course/Snapshots/Learning Curve.png)