
Machine Learning From Scratch

Gradient Boosting Decision Tree, Support Vector Machine and Neural Network are arguably the three best machine learning algorithms that have stood the test of time.

This project implements the three algorithms with simple and neat Python code. These toy implementations are no match for mature packages such as xgboost and sklearn in terms of speed and memory consumption, but they help illustrate how the algorithms work.

Dependencies

All three algorithms are implemented in Python 3.6 and built from scratch: the implementation relies only on numpy, with no dependency on any other machine learning package.

Construction in Progress

The implementation of GBDT and NN is finished, while SVM is still a work in progress.

Tutorials for GBDT and NN are provided below.

GBDT

This implementation of GBDT supports most of the core features of xgboost. Briefly, it supports:

  • Built-in loss: Mean squared loss for regression tasks and log loss for classification tasks.
  • Customized loss: Other loss functions are also supported. The user should provide the link function, the gradient, and the Hessian.
  • Hessian information: It uses Newton's method for boosting, making full use of second-order derivative information.
  • Regularization: lambda and gamma, as in xgboost (see the sketch after this list).
  • Multi-processing: It uses the Python Pool module for multi-processing.
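
To make the lambda and gamma regularization concrete, below is a minimal sketch (illustrative only, not the project's actual code) of how xgboost-style Newton boosting scores a leaf and a candidate split from the per-sample gradients g and Hessians h:

import numpy as np

def leaf_weight(g, h, reg_lambda):
    # Optimal leaf score under the second-order approximation:
    # w* = -sum(g) / (sum(h) + lambda)
    return -np.sum(g) / (np.sum(h) + reg_lambda)

def split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma):
    # Gain of splitting a node into left/right children; gamma penalizes
    # adding a leaf, so splits whose gain is not positive are not worth making.
    def score(g, h):
        return np.sum(g) ** 2 / (np.sum(h) + reg_lambda)
    parent = score(np.concatenate([g_left, g_right]),
                   np.concatenate([h_left, h_right]))
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma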

To keep the code neat, some features of xgboost are not implemented. For example, it does not handle missing values, and randomness is not supported.

A quick start is provided below.

Import the module

from gbdt import GBDT

Initialize model

model = GBDT(n_threads=None, loss='mse', max_depth=3, min_sample_split=10,
             reg_lambda=1, gamma=0, learning_rate=0.1, n_estimators=100)
  • n_threads: Number of threads for multiprocessing. None to use all.
  • loss: Loss function for gradient boosting. 'mse' is mean squared error for regression tasks and 'log' is log loss for classification tasks. Pass a child class of the loss class to use a customized loss. See the source code for details.
  • max_depth: The maximum depth of a tree.
  • min_sample_split: The minimum number of samples required to further split a node.
  • reg_lambda: The regularization coefficient for leaf score, also known as lambda.
  • gamma: The regularization coefficient for the number of tree nodes, also known as gamma.
  • learning_rate: The learning rate of gradient boosting.
  • n_estimators: Number of trees.

Train

model.fit(train,target)

All inputs should be numpy arrays. train should be a 2D array and target should be a 1D array.

Predict

model.predict(test)

Returns predictions as a numpy array.
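
Putting the pieces together, a toy regression run (with synthetic data, for illustration only) might look like this:

import numpy as np
from gbdt import GBDT

# Synthetic regression data: 500 samples, 5 features
rng = np.random.RandomState(0)
train = rng.randn(500, 5)
target = 2.0 * train[:, 0] + np.sin(train[:, 1]) + 0.1 * rng.randn(500)

model = GBDT(loss='mse', max_depth=3, learning_rate=0.1, n_estimators=50)
model.fit(train, target)
pred = model.predict(train)  # numpy array of predictions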

Customized loss

Define a class that inherits from the loss class (see the source code) and specifies the link function, the gradient and the Hessian.

import numpy as np

# `loss` is the base class defined in the source code
class customized_loss(loss):
    def link(self, score):
        # Link function: maps the raw boosting score to a prediction
        return 1 / (1 + np.exp(-score))
    def g(self, true, score):
        # First derivative of the loss with respect to the score
        pred = self.link(score)
        return pred - true
    def h(self, true, score):
        # Second derivative (Hessian) of the loss with respect to the score
        pred = self.link(score)
        return pred * (1 - pred)

The class can then be passed when initializing the model.

model = GBDT(loss=customized_loss, learning_rate=0.1, n_estimators=100)
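
As another hypothetical example (not part of the package), a Poisson-style loss with a log link could be defined the same way; the gradient and Hessian below follow from the Poisson negative log-likelihood:

import numpy as np

class poisson_loss(loss):
    def link(self, score):
        # Log link: the predicted rate is exp(score)
        return np.exp(score)
    def g(self, true, score):
        # Derivative of the Poisson negative log-likelihood: exp(score) - y
        return self.link(score) - true
    def h(self, true, score):
        # Second derivative: exp(score)
        return self.link(score)

model = GBDT(loss=poisson_loss, learning_rate=0.1, n_estimators=100)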

MLP

Implementing a full-featured deep learning framework is rather complicated. Instead of doing that, I wrote a simple multi-layer perceptron (MLP) with one hidden layer. This implementation of MLP supports:

  • Loss: Mean squared loss for regression tasks and log loss for classification tasks.
  • Activation: relu, sigmoid and linear activations are supported natively.
  • Momentum: Batch gradient descent with a momentum optimizer (see the sketch after this list).
  • Regularization: L2 regularization, also known as weight decay.
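
As a rough illustration of how the momentum and weight_decay options interact (a minimal sketch with assumed names, not the project's internal code), a single parameter update could look like this:

import numpy as np

def momentum_step(W, grad, velocity, learning_rate=0.01, momentum=0.9, weight_decay=0.0001):
    # L2 regularization (weight decay) adds weight_decay * W to the gradient
    grad = grad + weight_decay * W
    # The velocity accumulates an exponentially decaying sum of past gradients
    velocity = momentum * velocity - learning_rate * grad
    # Parameters move along the velocity
    return W + velocity, velocity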

A quick start is provided below.

Import the module

from MLP import MLP

Initialize model

model = MLP(n_hidden_units=100, batch_size=200, n_epochs=200, learning_rate=0.01,
            momentum=0.9, weight_decay=0.0001, loss='mse')
  • n_hidden_units: Number of units in the hidden layer.
  • batch_size: Number of data points used in each gradient step.
  • n_epochs: Number of epochs. Note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.
  • learning_rate: The learning rate of gradient descent.
  • momentum: Momentum for gradient descent update. (Between 0 and 1.)
  • weight_decay: Coefficient for L2 regularization, also known as weight decay.
  • activation: Activation function for the hidden layer. 'relu' for rectified linear units, 'logistic' for sigmoid activation and 'linear' for linear activation.
  • loss: Loss function. 'mse' for regression tasks and 'log_loss' for classification tasks.

Train

model.fit(train,target)

All inputs should be numpy arrays. train should be a 2D array and target should be a 1D array.

Predict

model.predict(test)

Returns predictions as a numpy array.
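
Putting the pieces together, a toy classification run (with synthetic data, for illustration only) might look like this:

import numpy as np
from MLP import MLP

# Synthetic binary classification data: 1000 samples, 10 features
rng = np.random.RandomState(0)
train = rng.randn(1000, 10)
target = (train[:, 0] + train[:, 1] > 0).astype(float)  # 1D array of 0/1 labels

model = MLP(n_hidden_units=50, batch_size=100, n_epochs=50, learning_rate=0.01,
            momentum=0.9, weight_decay=0.0001, activation='relu', loss='log_loss')
model.fit(train, target)
pred = model.predict(train)  # numpy array of predictions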