ML-Algorithms-From-Scratch

Core machine learning algorithms implemented from scratch in Python, offering clear insight into how each one works.


Coding Machine Learning Algorithms from the Ground Up

About

Welcome to the repository! This collection features handcrafted implementations of fundamental machine learning algorithms written entirely from scratch in Python. Dive into the core of machine learning as we build these algorithms step by step, providing clear insights into their inner workings. Whether you're a beginner looking to understand the fundamentals or an experienced practitioner seeking a deeper understanding, this repository is a resource for exploring the foundations of machine learning through code.

Table of Contents

  1. About
  2. Table of Contents
  3. Implementations
  4. Snapshots
  5. License
  6. Support & Contact

Implementations

Snapshots

Linear Regression:

[Snapshots: linear-regression-cost-plot · linear-regression-line]
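For context, a minimal sketch of the kind of batch gradient-descent loop that produces such a cost curve (variable names, learning rate, and epoch count are illustrative, not the notebook's exact code):

```python
import numpy as np

def linear_regression_gd(X, y, lr=0.01, epochs=500):
    """Fit y ≈ Xw + b by batch gradient descent on mean squared error."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    costs = []
    for _ in range(epochs):
        y_hat = X @ w + b
        err = y_hat - y
        costs.append((err @ err) / (2 * n))   # MSE/2, the usual cost
        w -= lr * (X.T @ err) / n             # gradient w.r.t. w
        b -= lr * err.mean()                  # gradient w.r.t. b
    return w, b, costs
```

Plotting `costs` against the iteration index yields the cost curve; the learned `w, b` give the fitted regression line.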

Logistic Regression:

[Snapshots: logistic-regression-cost-plot · logistic-regression-decision-boundary]
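Likewise, a minimal sketch of gradient descent on the cross-entropy cost, assuming binary labels in {0, 1} (again illustrative names, not the notebook's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression_gd(X, y, lr=0.1, epochs=1000):
    """Fit a binary classifier by gradient descent on cross-entropy loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    costs = []
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        eps = 1e-12                           # guard against log(0)
        costs.append(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
        grad = p - y                          # gradient has the same form as the linear case
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b, costs
```

The decision boundary is the set of points where `X @ w + b = 0`, i.e. where the predicted probability is 0.5.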

K-Nearest Neighbour:

[Snapshot: knn-choosing-k]
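Choosing k is typically done by scoring candidate values on held-out data, as the snapshot above suggests. A minimal sketch, assuming NumPy arrays, integer class labels, and Euclidean distance (function names are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Classify each test point by majority vote among its k nearest neighbours."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
        nearest = y_train[np.argsort(dists)[:k]]      # labels of k closest points
        preds.append(np.bincount(nearest).argmax())   # majority label
    return np.array(preds)

def choose_k(X_train, y_train, X_val, y_val, k_values):
    """Return the k with the highest validation accuracy, plus all scores."""
    scores = {k: (knn_predict(X_train, y_train, X_val, k) == y_val).mean()
              for k in k_values}
    return max(scores, key=scores.get), scores
```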

K-Means:

[Snapshots: kmeans-elbow-method · kmeans-progress-plot]
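The elbow method plots the within-cluster sum of squares (inertia) against k and picks the "bend". A minimal sketch of Lloyd's algorithm plus that loop, assuming Euclidean distance and random initialization (not necessarily the notebook's exact scheme):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm; returns centroids, labels, and the final inertia (WCSS)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # converged
            break
        centroids = new
    labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
    inertia = ((X - centroids[labels]) ** 2).sum()
    return centroids, labels, inertia

# Elbow method: compute inertia for a range of k and look for the bend.
# inertias = [kmeans(X, k)[2] for k in range(1, 11)]
```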

AdaBoost - Binary Classification:

[Snapshots: Adaboost-errorplot · Adaboost-upperbounderrorplot]
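For reference, the upper bound usually plotted for AdaBoost's training error is the classic exponential bound (assuming that is the bound shown here), where $\epsilon_t$ is the weighted error of the $t$-th weak learner and $\gamma_t = \tfrac{1}{2} - \epsilon_t$ is its edge over random guessing:

$$\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\big[H(x_i)\neq y_i\big]\;\le\;\prod_{t=1}^{T} 2\sqrt{\epsilon_t\,(1-\epsilon_t)}\;\le\;\exp\!\left(-2\sum_{t=1}^{T}\gamma_t^{2}\right)$$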

Using the demo code from scikit-learn's AdaBoost example, decision boundaries at different training iterations are plotted as follows:

[Snapshots: Adaboost-T=200 · Adaboost-T=1000 · Adaboost-T=1600 · Adaboost-T=2000]

As the plots above show, AdaBoost tends to fit the training data almost perfectly.
It is therefore important to train AdaBoost only until it generalizes well on validation data.
In other words, T (the number of weak classifiers) is a critical hyperparameter for controlling model complexity and mitigating overfitting.
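One convenient way to pick T with scikit-learn is `staged_predict`, which yields predictions after each boosting round. A hedged sketch (the synthetic dataset and `n_estimators` value are placeholders, not the demo's exact setup):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=2000, random_state=0).fit(X_tr, y_tr)

# staged_predict yields predictions after 1, 2, ..., T weak learners,
# so a single pass gives the whole validation error curve over T.
val_err = [np.mean(pred != y_val) for pred in clf.staged_predict(X_val)]
best_T = int(np.argmin(val_err)) + 1
```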

Decision Trees - ID3:

The trained ID3 decision tree can be visualized by converting its structure to Graphviz dot format:

[Snapshot: id3_tree]
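A minimal sketch of such a conversion, assuming the tree is stored as nested dicts of the form `{feature: {value: subtree-or-leaf}}` with plain labels at the leaves (the notebook's actual structure may differ):

```python
from itertools import count

def tree_to_dot(tree):
    """Render a nested-dict ID3 tree as Graphviz dot source."""
    ids = count()
    lines = ["digraph ID3 {"]

    def walk(node):
        nid = f"n{next(ids)}"
        if isinstance(node, dict):
            feature, branches = next(iter(node.items()))
            lines.append(f'  {nid} [label="{feature}", shape=box];')
            for value, child in branches.items():
                cid = walk(child)                       # recurse into each branch
                lines.append(f'  {nid} -> {cid} [label="{value}"];')
        else:                                           # leaf: a class label
            lines.append(f'  {nid} [label="{node}", shape=ellipse];')
        return nid

    walk(tree)
    lines.append("}")
    return "\n".join(lines)

# Example: print(tree_to_dot({"Outlook": {"Sunny": "No", "Overcast": "Yes"}}))
```

The resulting dot source can be rendered with Graphviz (e.g. `dot -Tpng`).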

Decision Trees - CART: Classification - Implementation 1:

[Snapshot: cart_thresh]
For categorical variables split on a threshold, the comparison sign denotes alphabetical precedence:
values appearing earlier in the alphabet are treated as 'less than' those appearing later.
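Concretely, this is just Python's lexicographic string comparison; a tiny illustration (`goes_left` and the fruit values are hypothetical):

```python
# String comparison in Python is lexicographic, which is exactly the
# "alphabetical precedence" rule used for categorical thresholds here.
def goes_left(value, threshold):
    """A sample goes to the left child if its value is <= the threshold."""
    return value <= threshold

print(goes_left("apple", "mango"))   # True:  'a' precedes 'm'
print(goes_left("pear", "mango"))    # False: 'p' follows 'm'
```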

Decision Trees - CART: Classification - Implementation 2:

[Snapshot: cart_thresh]
In this implementation, a binary split on categorical values is found by considering
every possible way to divide the column's categories into two groups.
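For k categories this yields 2^(k−1) − 1 distinct binary partitions. A minimal sketch of enumerating them (function name is illustrative):

```python
from itertools import combinations

def binary_splits(categories):
    """Yield all ways to split a set of categories into two non-empty groups.

    Fixing the first category on the left avoids mirrored duplicates,
    giving 2**(k-1) - 1 distinct splits for k categories.
    """
    cats = sorted(categories)
    first, rest = cats[0], cats[1:]
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(cats) - left
            if right:                      # skip the trivial all-left split
                yield left, right

for left, right in binary_splits(["red", "green", "blue"]):
    print(left, right)   # 3 splits for k = 3 categories
```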

Decision Trees - CART: Regression:

The implementation incorporates several pre-pruning methods, including max_depth, min_impurity_decrease,
and min_leaves.
For illustration, max_depth was constrained to 3 in this example:
[Snapshot: cart_regression]
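A hedged sketch of how such pre-pruning checks typically gate a split (parameter names mirror those above; `min_leaves` is interpreted here as a minimum sample count per child, which may differ from the notebook's definition):

```python
def should_split(depth, impurity_decrease, n_left, n_right,
                 max_depth=3, min_impurity_decrease=0.0, min_leaves=1):
    """Pre-pruning gate: allow a split only if every criterion passes."""
    if depth >= max_depth:
        return False                       # tree already at maximum depth
    if impurity_decrease < min_impurity_decrease:
        return False                       # split does not reduce variance enough
    if min(n_left, n_right) < min_leaves:
        return False                       # a child node would be too small
    return True
```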

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Support & Contact

Have questions, feedback, or other implementations you would like to see here, or just want to chat about machine learning? Feel free to email me or connect with me on LinkedIn.