ML-Algorithms-From-Scratch

Core machine learning algorithms implemented from scratch in Python, offering clear insight into how each one works.


Coding Machine Learning Algorithms from the Ground Up

About

Welcome to the repository! This collection features handcrafted implementations of fundamental machine learning algorithms written entirely from scratch in Python. Dive into the core of machine learning as we build these algorithms step by step, providing clear insights into their inner workings. Whether you're a beginner looking to understand the fundamentals or an experienced practitioner seeking a deeper understanding, this repository is a resource for exploring the foundations of machine learning through code.

Table of Contents

  1. About
  2. Table of Contents
  3. Implementations
  4. Snapshots
  5. License
  6. Support & Contact

Implementations

Snapshots

Linear Regression:

[Snapshots: linear-regression-cost-plot · linear-regression-line]
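For context, a minimal sketch of the kind of batch gradient-descent loop that produces such a cost curve (variable names, learning rate, and epoch count are illustrative, not the notebook's exact code):

```python
import numpy as np

def linear_regression_gd(X, y, lr=0.01, epochs=500):
    """Fit y ≈ Xw + b by batch gradient descent on mean squared error."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    costs = []
    for _ in range(epochs):
        y_hat = X @ w + b
        err = y_hat - y
        costs.append((err @ err) / (2 * n))   # MSE/2, the usual cost
        w -= lr * (X.T @ err) / n             # gradient w.r.t. w
        b -= lr * err.mean()                  # gradient w.r.t. b
    return w, b, costs
```

Plotting `costs` against the iteration index yields the cost curve; the learned `w, b` give the fitted regression line.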

Logistic Regression:

[Snapshots: logistic-regression-cost-plot · logistic-regression-decision-boundary]
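Likewise, a minimal sketch of gradient descent on the cross-entropy cost, assuming binary labels in {0, 1} (again illustrative names, not the notebook's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression_gd(X, y, lr=0.1, epochs=1000):
    """Fit a binary classifier by gradient descent on cross-entropy loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    costs = []
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        eps = 1e-12                           # guard against log(0)
        costs.append(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
        grad = p - y                          # gradient has the same form as the linear case
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b, costs
```

The decision boundary is the set of points where `X @ w + b = 0`, i.e. where the predicted probability is 0.5.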

K-Nearest Neighbour:

[Snapshot: knn-choosing-k]
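Choosing k is typically done by scoring candidate values on held-out data, as the snapshot above suggests. A minimal sketch, assuming NumPy arrays, integer class labels, and Euclidean distance (function names are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Classify each test point by majority vote among its k nearest neighbours."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
        nearest = y_train[np.argsort(dists)[:k]]      # labels of k closest points
        preds.append(np.bincount(nearest).argmax())   # majority label
    return np.array(preds)

def choose_k(X_train, y_train, X_val, y_val, k_values):
    """Return the k with the highest validation accuracy, plus all scores."""
    scores = {k: (knn_predict(X_train, y_train, X_val, k) == y_val).mean()
              for k in k_values}
    return max(scores, key=scores.get), scores
```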

K-Means:

[Snapshots: kmeans-elbow-method · kmeans-progress-plot]
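The elbow method plots the within-cluster sum of squares (inertia) against k and picks the "bend". A minimal sketch of Lloyd's algorithm plus that loop, assuming Euclidean distance and random initialization (not necessarily the notebook's exact scheme):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm; returns centroids, labels, and the final inertia (WCSS)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # converged
            break
        centroids = new
    labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
    inertia = ((X - centroids[labels]) ** 2).sum()
    return centroids, labels, inertia

# Elbow method: compute inertia for a range of k and look for the bend.
# inertias = [kmeans(X, k)[2] for k in range(1, 11)]
```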

AdaBoost - Binary Classification:

[Snapshots: Adaboost-errorplot · Adaboost-upperbounderrorplot]
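For reference, the upper bound usually plotted for AdaBoost's training error is the classic exponential bound (assuming that is the bound shown here), where $\epsilon_t$ is the weighted error of the $t$-th weak learner and $\gamma_t = \tfrac{1}{2} - \epsilon_t$ is its edge over random guessing:

$$\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\big[H(x_i)\neq y_i\big]\;\le\;\prod_{t=1}^{T} 2\sqrt{\epsilon_t\,(1-\epsilon_t)}\;\le\;\exp\!\left(-2\sum_{t=1}^{T}\gamma_t^{2}\right)$$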

Using the demo code from scikit-learn's AdaBoost example, decision boundaries at different training iterations are plotted as follows:

[Snapshots: Adaboost-T=200 · Adaboost-T=1000 · Adaboost-T=1600 · Adaboost-T=2000]

As the plots above show, AdaBoost tends to fit the training data almost perfectly.
It is therefore important to train AdaBoost only until it generalizes well on validation data.
In other words, T (the number of weak classifiers) is a critical hyperparameter for controlling model complexity and mitigating overfitting.
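One convenient way to pick T with scikit-learn is `staged_predict`, which yields predictions after each boosting round. A hedged sketch (the synthetic dataset and `n_estimators` value are placeholders, not the demo's exact setup):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=2000, random_state=0).fit(X_tr, y_tr)

# staged_predict yields predictions after 1, 2, ..., T weak learners,
# so a single pass gives the whole validation error curve over T.
val_err = [np.mean(pred != y_val) for pred in clf.staged_predict(X_val)]
best_T = int(np.argmin(val_err)) + 1
```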

Decision Trees - ID3:

The trained ID3 decision tree can be visualized by converting its structure to Graphviz dot format:

[Snapshot: id3_tree]
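A minimal sketch of such a conversion, assuming the tree is stored as nested dicts of the form `{feature: {value: subtree-or-leaf}}` with plain labels at the leaves (the notebook's actual structure may differ):

```python
from itertools import count

def tree_to_dot(tree):
    """Render a nested-dict ID3 tree as Graphviz dot source."""
    ids = count()
    lines = ["digraph ID3 {"]

    def walk(node):
        nid = f"n{next(ids)}"
        if isinstance(node, dict):
            feature, branches = next(iter(node.items()))
            lines.append(f'  {nid} [label="{feature}", shape=box];')
            for value, child in branches.items():
                cid = walk(child)                       # recurse into each branch
                lines.append(f'  {nid} -> {cid} [label="{value}"];')
        else:                                           # leaf: a class label
            lines.append(f'  {nid} [label="{node}", shape=ellipse];')
        return nid

    walk(tree)
    lines.append("}")
    return "\n".join(lines)

# Example: print(tree_to_dot({"Outlook": {"Sunny": "No", "Overcast": "Yes"}}))
```

The resulting dot source can be rendered with Graphviz (e.g. `dot -Tpng`).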

Decision Trees - CART: Classification - Implementation 1:

[Snapshot: cart_thresh]
For categorical variables split on a threshold, the comparison sign denotes alphabetical precedence:
values appearing earlier in the alphabet are treated as 'less than' those appearing later.
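Concretely, this is just Python's lexicographic string comparison; a tiny illustration (`goes_left` and the fruit values are hypothetical):

```python
# String comparison in Python is lexicographic, which is exactly the
# "alphabetical precedence" rule used for categorical thresholds here.
def goes_left(value, threshold):
    """A sample goes to the left child if its value is <= the threshold."""
    return value <= threshold

print(goes_left("apple", "mango"))   # True:  'a' precedes 'm'
print(goes_left("pear", "mango"))    # False: 'p' follows 'm'
```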

Decision Trees - CART: Classification - Implementation 2:

[Snapshot: cart_thresh]
In this implementation, a binary split on categorical values is found by considering
every possible way to divide the column's categories into two groups.
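For k categories this yields 2^(k−1) − 1 distinct binary partitions. A minimal sketch of enumerating them (function name is illustrative):

```python
from itertools import combinations

def binary_splits(categories):
    """Yield all ways to split a set of categories into two non-empty groups.

    Fixing the first category on the left avoids mirrored duplicates,
    giving 2**(k-1) - 1 distinct splits for k categories.
    """
    cats = sorted(categories)
    first, rest = cats[0], cats[1:]
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(cats) - left
            if right:                      # skip the trivial all-left split
                yield left, right

for left, right in binary_splits(["red", "green", "blue"]):
    print(left, right)   # 3 splits for k = 3 categories
```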

Decision Trees - CART: Regression:

The implementation incorporates several pre-pruning methods, including max_depth, min_impurity_decrease,
and min_leaves.
For illustration, max_depth was constrained to 3 in this example:
[Snapshot: cart_regression]
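A hedged sketch of how such pre-pruning checks typically gate a split (parameter names mirror those above; `min_leaves` is interpreted here as a minimum sample count per child, which may differ from the notebook's definition):

```python
def should_split(depth, impurity_decrease, n_left, n_right,
                 max_depth=3, min_impurity_decrease=0.0, min_leaves=1):
    """Pre-pruning gate: allow a split only if every criterion passes."""
    if depth >= max_depth:
        return False                       # tree already at maximum depth
    if impurity_decrease < min_impurity_decrease:
        return False                       # split does not reduce variance enough
    if min(n_left, n_right) < min_leaves:
        return False                       # a child node would be too small
    return True
```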

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Support & Contact

Have questions, feedback, or other implementations you would like to see here, or just want to chat about machine learning? Feel free to email me or connect with me on LinkedIn.