These are minimal implementations of the most basic machine learning algorithms taught in the Columbia Machine Learning course on edx.org. Abstractions are deliberately avoided to highlight the direct translation from mathematical formulation to implementation. The code is for educational purposes only and is by no means intended for production use.
Implemented algorithms are:
- Regression
  - Least squares
  - Ridge regression
  - Gaussian process
- Classification
  - Gaussian Bayes
  - Naive Bayes (Poisson)
  - Decision tree
  - Support vector machine
- Clustering
  - k-means
  - Gaussian mixture model
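To give a sense of what a direct translation from formula to code can look like, here is a minimal NumPy sketch of ridge regression using the closed-form solution w = (lam * I + X^T X)^(-1) X^T y. It is not the code from ridge_regressor.py: the function names `ridge_fit` and `ridge_predict` and the parameter `lam` are assumptions made for illustration only.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Return w minimizing ||y - Xw||^2 + lam * ||w||^2 (hypothetical helper)."""
    d = X.shape[1]
    # Solve (lam * I + X^T X) w = X^T y instead of forming the inverse explicitly
    return np.linalg.solve(lam * np.eye(d) + X.T.dot(X), X.T.dot(y))


def ridge_predict(X, w):
    """Predict targets with the linear model X w."""
    return X.dot(w)


# Small synthetic demo: noisy linear data with known weights
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y_demo = X_demo.dot(w_true) + 0.01 * rng.normal(size=50)

w_ls = ridge_fit(X_demo, y_demo, lam=0.0)      # lam = 0: ordinary least squares
w_ridge = ridge_fit(X_demo, y_demo, lam=10.0)  # lam > 0: weights shrink toward zero
print('least squares: {}'.format(w_ls))
print('ridge:         {}'.format(w_ridge))
```

With `lam=0.0` the same function reduces to ordinary least squares, provided X^T X is invertible. That is one plausible reason a single file could cover both methods, though it is a guess about the design rather than a description of it.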
The code is organized as follows:
- classifier.py, regressor.py, clustering.py => the three base classes for classification, regression and clustering
- ridge_regressor.py, gaussian_regressor.py => implement least squares, ridge regression and Gaussian process regression
- gaussian_bayes.py, naive_bayes.py, decision_tree.py, svm.py => implement the respective classification methods
- k_mean.py, gmm.py => implement the respective clustering methods
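The split between base classes and concrete algorithms might look roughly like the sketch below. This is an assumption about the design, not the actual contents of classifier.py. Only `fit` and `eval` appear in the decision tree example in this README, so the `predict` method, the reading of `eval` as accuracy, and the toy `MajorityClassifier` subclass are all hypothetical.

```python
class Classifier(object):
    """Hypothetical base class: subclasses supply fit and predict."""

    def fit(self, X, y):
        raise NotImplementedError

    def predict(self, X):
        raise NotImplementedError

    def eval(self, X, y):
        # Assumed here to mean accuracy: the fraction of correct predictions
        predictions = self.predict(X)
        correct = sum(1 for p, t in zip(predictions, y) if p == t)
        return float(correct) / len(y)


class MajorityClassifier(Classifier):
    """Toy subclass that always predicts the most frequent training label."""

    def fit(self, X, y):
        counts = {}
        for label in y:
            counts[label] = counts.get(label, 0) + 1
        self.label_ = max(counts, key=counts.get)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


clf = MajorityClassifier().fit([[0], [1], [2]], ['a', 'b', 'a'])
print('Accuracy: {}'.format(clf.eval([[3], [4]], ['a', 'b'])))
```

Keeping the shared evaluation logic in the base class means each algorithm file only has to express its own math in `fit` and `predict`.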
This is a minimal working example for the decision tree. First change into the repository directory:
cd basic-ml-algorithms
Then run the following in Python:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from decision_tree import DecisionTree
dataset = datasets.load_iris()
X = dataset.data[:,:3]
y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
classifier = DecisionTree()
classifier.fit(X_train, y_train)
print('Accuracy: {}'.format(classifier.eval(X_test, y_test)))
The notebook examples.ipynb contains demonstrations of all algorithms.