/ML-lib

An extensive machine learning library, made from scratch (Python).

Primary language: Python. License: MIT.

Overview

This is a machine learning library, made from scratch.

It uses:

  • numpy: for handling matrices/vectors
  • cvxopt: for convex optimization
  • networkx: for handling graphs in decision trees

It contains the following functionality:

  • Supervised Learning:
    • Regression
      • Linear Regression
      • Logistic Regression
      • Regularization
    • Support Vector Machines
      • Soft and hard margins
      • Kernels
    • Tree Methods
      • CART (classification and regression)
      • PRIM
      • AdaBoost
      • Gradient Boost
      • Random Forests
    • Kernel Methods
      • Nadaraya–Watson average
      • Local linear regression
      • Kernel density classification
    • Discriminant Analysis
      • LDA, QDA, RDA
    • Prototype Methods
      • KNN
      • LVQ
      • DANN
    • Perceptron
  • Unsupervised Learning
    • K-means/K-medoids clustering
    • PCA
  • Model Selection and Validation

Examples

Examples are shown in two dimensions for visualisation purposes; however, all methods can handle high-dimensional data.

Regression

  • Linear and logistic regression with regularization
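
As an illustration of regularized linear regression, here is a minimal NumPy sketch of closed-form ridge regression. The function name `ridge_fit` and the parameter `lam` are hypothetical and are not taken from this library's API; the sketch assumes an L2 penalty that leaves the intercept unpenalized.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression; the intercept is not penalized."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    # Solve (Xc^T Xc + lam * I) w = Xc^T yc for the weight vector
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - X_mean @ w
    return w, b

# Toy 2-D data generated from a known linear model plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + 0.1 * rng.normal(size=200)

w_ols, b_ols = ridge_fit(X, y, lam=0.0)        # no regularization
w_ridge, b_ridge = ridge_fit(X, y, lam=100.0)  # strong shrinkage
```

Increasing `lam` shrinks the weight vector towards zero, which trades some bias for lower variance.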

Support Vector Machines

  • Support vector machines maximize the margins between classes

  • Using kernels such as the RBF kernel, support vector machines can produce non-linear decision boundaries.
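
To make the kernel idea concrete, the following sketch computes an RBF (Gaussian) Gram matrix with NumPy. The name `rbf_kernel` and the `gamma` parameterization, k(a, b) = exp(-gamma * ||a - b||^2), are assumptions for illustration and may differ from how this library defines its kernels.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-gamma * np.maximum(sq, 0.0))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(X, X, gamma=1.0)
```

A kernel SVM only ever touches the data through such a matrix, which is why swapping the kernel changes the shape of the decision boundary without changing the optimizer.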

  • An alternative learning algorithm, the perceptron, can linearly separate classes. It does not maximize the margin, and it converges only when the classes are linearly separable.
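
The perceptron update rule can be sketched in a few lines. This is a generic version of Rosenblatt's algorithm, not this library's implementation; `perceptron_fit` is a hypothetical name and the labels are assumed to be -1 or +1.

```python
import numpy as np

def perceptron_fit(X, y, epochs=100):
    """Rosenblatt's perceptron; labels must be -1 or +1."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified, or exactly on the boundary
                w += yi * xi            # move the hyperplane towards the point
                b += yi
                mistakes += 1
        if mistakes == 0:               # every point is on the correct side
            break
    return w, b

# Linearly separable toy data
X = np.array([[2.0, 2.0], [3.0, 1.0], [2.5, 3.0],
              [-2.0, -1.0], [-1.0, -3.0], [-3.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = perceptron_fit(X, y)
pred = np.sign(X @ w + b)
```

The loop stops at the first separating hyperplane it finds, which is why the result generally differs from the maximum-margin solution of an SVM.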

Tree Methods

  • The library contains a large collection of tree methods, all built on decision trees for classification and regression.
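
The core step of a CART-style classification tree is choosing the axis-aligned split that minimizes impurity. The sketch below searches for a single split using Gini impurity; `gini` and `best_split` are hypothetical helper names, and the library may use a different impurity measure or search strategy.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return (feature, threshold, weighted impurity) of the best axis-aligned split."""
    best = (None, None, np.inf)
    n = len(y)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive sorted unique values
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if score < best[2]:
                best = (j, t, score)
    return best

X = np.array([[1.0, 5.0], [2.0, 1.0], [3.0, 4.0],
              [6.0, 2.0], [7.0, 6.0], [8.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])
feature, threshold, impurity = best_split(X, y)
```

A full tree applies this search recursively to the left and right subsets until a stopping rule is met.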

These decision trees can be aggregated, and the library supports the following ensemble methods:

  • AdaBoost
  • Gradient Boosting
  • Random Forests
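
As one example of aggregating trees, here is a compact sketch of discrete AdaBoost over depth-one trees (decision stumps). All function names are hypothetical, labels are assumed to be -1 or +1, and the library's own ensemble code may be organized quite differently.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: returns (feature, threshold, polarity, weighted error)."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, -s, s)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def stump_predict(X, stump):
    j, t, s, _ = stump
    return np.where(X[:, j] <= t, -s, s)

def adaboost_fit(X, y, rounds=20):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = np.clip(stump[3], 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)  # vote of this weak learner
        pred = stump_predict(X, stump)
        w *= np.exp(-alpha * y * pred)         # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(X, ensemble):
    return np.sign(sum(a * stump_predict(X, s) for a, s in ensemble))

# An "interval" problem that no single stump can solve
X = np.arange(6, dtype=float).reshape(-1, 1)
y = np.array([-1, -1, 1, 1, -1, -1])
ensemble = adaboost_fit(X, y, rounds=3)
pred = adaboost_predict(X, ensemble)
```

On this toy problem three weighted stumps are enough to classify every training point, even though each stump alone misclassifies at least a third of them.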

Kernel Methods

Kernel methods estimate the target function by fitting a separate function at each query point, using locally smoothed training data.

  • Nadaraya–Watson estimation uses a local weighted average
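
The Nadaraya–Watson estimator is a kernel-weighted average of the training targets. The sketch below assumes one-dimensional inputs and a Gaussian kernel; `nadaraya_watson` and `bandwidth` are illustrative names, not this library's API.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.5):
    """Kernel-weighted average of y_train at each query point (Gaussian kernel)."""
    # weights[i, j] = K((x_query[i] - x_train[j]) / bandwidth)
    u = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * u ** 2)
    return (weights @ y_train) / weights.sum(axis=1)

# Noisy samples of a sine curve
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 2 * np.pi, 100))
y_train = np.sin(x_train) + 0.1 * rng.normal(size=100)
x_query = np.linspace(0.5, 5.5, 20)
y_hat = nadaraya_watson(x_train, y_train, x_query, bandwidth=0.3)
```

A smaller bandwidth follows the data more closely, while a larger one gives a smoother but more biased estimate.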

  • Local linear regression uses weighted least squares to locally fit an affine function to the data
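
A sketch of local linear regression at a single query point follows. It solves a kernel-weighted least-squares problem for an intercept and slope centred at `x0`; the function name, the Gaussian kernel, and the one-dimensional input are all assumptions made for illustration.

```python
import numpy as np

def local_linear(x_train, y_train, x0, bandwidth=0.5):
    """Fit a + b * (x - x0) by weighted least squares and return the fit at x0."""
    w = np.exp(-0.5 * ((x_train - x0) / bandwidth) ** 2)   # Gaussian kernel weights
    B = np.column_stack([np.ones_like(x_train), x_train - x0])
    WB = B * w[:, None]
    # Normal equations of: minimize sum_i w_i * (y_i - a - b * (x_i - x0))^2
    a, b = np.linalg.solve(B.T @ WB, WB.T @ y_train)
    return a   # the intercept is the fitted value at x0

x_train = np.linspace(0.0, 1.0, 20)
y_train = 2.0 * x_train + 1.0
at_boundary = local_linear(x_train, y_train, 0.0, bandwidth=0.2)
at_interior = local_linear(x_train, y_train, 0.37, bandwidth=0.2)
```

Because the local model is affine, exactly linear data is reproduced even at the edge of the training range, where a local weighted average would be biased.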

  • The library also supports kernel density estimation (KDE), which is used for kernel density classification

Discriminant Analysis

  • Linear Discriminant Analysis creates decision boundaries by assuming all classes share the same covariance matrix.
  • LDA can only form linear boundaries.
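
The shared-covariance assumption leads to discriminant functions that are linear in x, which can be sketched as follows. The helper names `lda_fit` and `lda_predict` are hypothetical, and the sketch assumes Gaussian classes with empirical class priors.

```python
import numpy as np

def lda_fit(X, y):
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # Pooled within-class covariance, shared by all classes
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(y) - len(classes))
    return classes, means, priors, cov

def lda_predict(X, model):
    classes, means, priors, cov = model
    A = np.linalg.solve(cov, means.T)   # columns hold Sigma^{-1} mu_k
    # delta_k(x) = x^T Sigma^{-1} mu_k - 0.5 * mu_k^T Sigma^{-1} mu_k + log(pi_k)
    scores = X @ A - 0.5 * np.sum(means.T * A, axis=0) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]

# Two well-separated Gaussian blobs in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, size=(50, 2)), rng.normal(3, 1, size=(50, 2))])
y = np.repeat([0, 1], 50)
model = lda_fit(X, y)
accuracy = np.mean(lda_predict(X, model) == y)
```

The quadratic term in x cancels between classes because the covariance is shared, which is what forces the boundary to be a hyperplane.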

  • Quadratic Discriminant Analysis creates decision boundaries by assuming each class has its own covariance matrix.
  • QDA can form non-linear boundaries.

  • Regularized Discriminant Analysis uses a combination of pooled and class covariance matrices to determine decision boundaries.

Prototype Methods

  • K-nearest neighbors predicts a target from the k nearest training points, averaging their targets for regression and taking a majority vote for classification. The library supports both.
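
A brute-force version of k-nearest neighbors fits in a few lines. The function below is a sketch with hypothetical names; it assumes Euclidean distance and, for classification, non-negative integer class labels.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3, classify=True):
    # Pairwise Euclidean distances, shape (n_query, n_train)
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]   # indices of the k closest training points
    neighbours = y_train[nearest]
    if classify:
        # Majority vote (labels must be non-negative integers)
        return np.array([np.bincount(row).argmax() for row in neighbours])
    return neighbours.mean(axis=1)           # regression: average the targets

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
X_query = np.array([[0.2, 0.3], [5.5, 5.5]])

labels = knn_predict(X_train, np.array([0, 0, 0, 1, 1, 1]), X_query, k=3)
values = knn_predict(X_train, np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0]),
                     X_query, k=3, classify=False)
```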

  • Learning vector quantization is a prototype method where prototypes are iteratively repelled by out-of-class data and attracted to in-class data.
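
The attract-and-repel update can be sketched with the basic LVQ1 rule. The function name, the linearly decaying learning rate, and the hand-picked initial prototypes are assumptions for illustration; the library may initialize and schedule its updates differently.

```python
import numpy as np

def lvq1_fit(X, y, prototypes, proto_labels, lr=0.1, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    P = prototypes.astype(float).copy()
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)            # decay the learning rate
        for i in rng.permutation(len(X)):
            j = np.argmin(np.linalg.norm(P - X[i], axis=1))   # closest prototype
            direction = 1.0 if proto_labels[j] == y[i] else -1.0
            P[j] += direction * rate * (X[i] - P[j])          # attract or repel
    return P

# Two Gaussian blobs in 2-D, one prototype per class
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-3, 0], 0.5, size=(30, 2)),
               rng.normal([3, 0], 0.5, size=(30, 2))])
y = np.repeat([0, 1], 30)
proto_labels = np.array([0, 1])
P = lvq1_fit(X, y, np.array([[-1.0, 0.0], [1.0, 0.0]]), proto_labels)

# Classify each point by the label of its nearest prototype
dists = np.linalg.norm(X[:, None, :] - P[None, :, :], axis=2)
pred = proto_labels[np.argmin(dists, axis=1)]
```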

  • Discriminant Adaptive Nearest Neighbors (DANN) adaptively elongates neighborhoods along the decision boundary.
  • Useful for high-dimensional data.

Unsupervised Learning

  • K-means and K-medoids clustering partition data into K clusters.
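
K-means alternates between assigning points to the nearest centre and recomputing each centre as the mean of its points (Lloyd's algorithm). In the sketch below the initial centres are passed in explicitly to keep the example deterministic; that choice and the name `kmeans` are assumptions, not this library's interface.

```python
import numpy as np

def kmeans(X, init_centers, iters=100):
    centers = init_centers.astype(float).copy()
    k = len(centers)
    for _ in range(iters):
        # Assignment step: index of the nearest centre for every point
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # Update step: mean of each cluster (keep the old centre if a cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [8.0, 8.0], [8.0, 9.0], [9.0, 8.0], [9.0, 9.0]])
centers, labels = kmeans(X, X[[0, 4]])
```

K-medoids follows the same loop but restricts each centre to be one of the data points, which makes it less sensitive to outliers.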

  • Principal Component Analysis (PCA) transforms a data set into an orthonormal basis whose leading directions capture the maximum variance.
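
One common way to compute PCA is through the singular value decomposition of the centred data, sketched below. The function name `pca` and its return values are hypothetical; the library may instead use an eigendecomposition of the covariance matrix.

```python
import numpy as np

def pca(X, n_components):
    Xc = X - X.mean(axis=0)                       # centre the data
    # Rows of Vt are orthonormal directions, ordered by decreasing singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S[:n_components] ** 2 / (len(X) - 1)
    return Xc @ components.T, components, explained_variance

# 2-D data stretched along the (1, 1) direction
rng = np.random.default_rng(0)
t = rng.normal(scale=3.0, size=200)
noise = rng.normal(scale=0.3, size=200)
X = np.outer(t, [1, 1]) / np.sqrt(2) + np.outer(noise, [1, -1]) / np.sqrt(2)

Z, components, var = pca(X, 2)
```

The first row of `components` should align, up to sign, with the direction of greatest spread in the data.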
