Welcome to the repository! This collection features handcrafted implementations of fundamental machine learning algorithms written entirely from scratch in Python. Dive into the core of machine learning as we build these algorithms step by step, providing clear insights into their inner workings. Whether you're a beginner looking to understand the fundamentals or an experienced practitioner seeking a deeper understanding, this repository is a resource for exploring the foundations of machine learning through code.
- Linear Regression
- Logistic Regression
- K-Nearest Neighbour
- K-Means
- AdaBoost
- Decision Trees - ID3
- Decision Trees - CART: Classification - Implementation 1
- Decision Trees - CART: Classification - Implementation 2
- Decision Trees - CART: Regression
- stay tuned for more! 🚀
The decision boundaries at different training iterations,
plotted with demo code from scikit-learn's AdaBoost example, are as follows:
As the plots above show, AdaBoost tends to fit the training data nearly perfectly.
It is therefore crucial to train AdaBoost only as long as it keeps generalizing well to validation data.
In other words, T (the number of weak classifiers) is a critical hyperparameter for controlling model complexity and mitigating overfitting.
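One way to choose T is to track the validation error after each boosting round and keep the round count with the lowest error. The following is a minimal pure-Python sketch of that idea, not the repository's implementation: the decision-stump weak learner, the toy data generator, and all function names (`fit_stump`, `adaboost_fit`, `staged_errors`) are assumptions made for illustration.

```python
import math
import random

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                pred = [pol if row[j] <= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, X):
    _, j, thr, pol = stump
    return [pol if row[j] <= thr else -pol for row in X]

def adaboost_fit(X, y, T):
    """Train T weighted stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(T):
        stump = fit_stump(X, y, w)
        # Clamp the error so alpha stays finite for a perfect stump
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        pred = stump_predict(stump, X)
        # Up-weight misclassified points, down-weight correct ones, renormalise
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, stump))
    return ensemble

def staged_errors(ensemble, X, y):
    """Error rate of the ensemble truncated to 1, 2, ..., T weak classifiers."""
    scores = [0.0] * len(X)
    errors = []
    for alpha, stump in ensemble:
        pred = stump_predict(stump, X)
        scores = [s + alpha * p for s, p in zip(scores, pred)]
        wrong = sum(1 for s, yi in zip(scores, y) if (1 if s >= 0 else -1) != yi)
        errors.append(wrong / len(X))
    return errors

# Toy data (hypothetical): a diagonal boundary with 15% label noise
rng = random.Random(0)

def make_data(n):
    X, y = [], []
    for _ in range(n):
        a, b = rng.random(), rng.random()
        label = 1 if a + b > 1 else -1
        if rng.random() < 0.15:
            label = -label
        X.append((a, b))
        y.append(label)
    return X, y

X_train, y_train = make_data(60)
X_val, y_val = make_data(60)

T_max = 30
ensemble = adaboost_fit(X_train, y_train, T_max)
train_errors = staged_errors(ensemble, X_train, y_train)
val_errors = staged_errors(ensemble, X_val, y_val)

# Smallest T that achieves the lowest validation error
best_T = min(range(1, T_max + 1), key=lambda t: val_errors[t - 1])
print("best T:", best_T, "validation error:", val_errors[best_T - 1])
```

Because the ensemble is additive, the error for every T up to `T_max` comes from a single training run, so no retraining is needed per candidate value.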
The graphical representation of the trained ID3 decision tree, obtained by converting the tree structure to DOT format, is as follows:
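The conversion to DOT format can be sketched roughly as below. This is an illustrative sketch only: it assumes the tree is stored as nested dictionaries of the form `{"feature": ..., "children": {value: subtree}}` with plain strings as leaves, which may differ from the data structure the repository actually uses. The function name `tree_to_dot` and the example tree are hypothetical.

```python
from itertools import count

def tree_to_dot(tree):
    """Serialise a nested-dict decision tree into a Graphviz DOT string."""
    lines = ["digraph Tree {", "    node [shape=box];"]
    ids = count()  # hands out a unique integer id per node

    def visit(node):
        node_id = next(ids)
        if isinstance(node, dict):
            # Internal node: labelled with the feature it splits on
            lines.append(f'    n{node_id} [label="{node["feature"]}"];')
            for value, child in node["children"].items():
                child_id = visit(child)
                # Edge labelled with the feature value leading to the child
                lines.append(f'    n{node_id} -> n{child_id} [label="{value}"];')
        else:
            # Leaf: labelled with the predicted class
            lines.append(f'    n{node_id} [label="{node}", shape=ellipse];')
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)

# Hypothetical ID3-style tree (multiway splits on categorical features)
example_tree = {
    "feature": "Outlook",
    "children": {
        "Sunny": {"feature": "Humidity", "children": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"feature": "Wind", "children": {"Strong": "No", "Weak": "Yes"}},
    },
}

dot = tree_to_dot(example_tree)
print(dot)
```

Saving the output to a file such as `tree.dot` and running `dot -Tpng tree.dot -o tree.png` with Graphviz installed renders the image. The sketch does not escape double quotes inside labels, so feature names containing them would need extra handling.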
When a categorical variable is split on a threshold, the comparison sign denotes alphabetical order:
values appearing earlier in the alphabet are considered 'less than' those appearing later.
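To make the alphabetical comparison concrete, here is a small sketch that assumes the split relies on Python's built-in string ordering. The helper `goes_left` and the sample values are hypothetical, not taken from the repository.

```python
def goes_left(value, threshold):
    """Threshold test `value <= threshold`; works for numbers and strings alike."""
    return value <= threshold

colors = ["red", "blue", "green", "yellow"]

# Values at or before "green" in alphabetical order go to the left branch
left = [c for c in colors if goes_left(c, "green")]
right = [c for c in colors if not goes_left(c, "green")]

print(left)   # ['blue', 'green']
print(right)  # ['red', 'yellow']
```

Python compares strings by Unicode code point, so every uppercase ASCII letter sorts before every lowercase one (`"Zebra" < "apple"` is true). The alphabetical reading therefore holds only when category values are consistently cased.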
In this implementation, a binary split on categorical values is achieved by considering
all possible ways to divide the categories in the column into two groups.
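Enumerating every two-group division can be sketched as follows. This is one possible approach, not necessarily the one used in the repository; the function name `binary_partitions` is hypothetical. Pinning one category to the left group avoids counting mirrored splits twice, so k distinct categories yield 2^(k-1) - 1 candidate splits.

```python
from itertools import combinations

def binary_partitions(categories):
    """Return every split of the distinct categories into two non-empty groups."""
    cats = sorted(set(categories))
    if len(cats) < 2:
        return []  # nothing to split on
    first, rest = cats[0], cats[1:]
    partitions = []
    # `first` always stays on the left, so ({a}, {b, c}) and ({b, c}, {a})
    # are not both generated. r stops short of len(rest) so the right
    # group is never empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = {first, *extra}
            right = set(cats) - left
            partitions.append((left, right))
    return partitions

for left, right in binary_partitions(["red", "green", "blue", "green"]):
    print(sorted(left), "|", sorted(right))
```

The number of candidates grows exponentially with the number of categories, so exhaustive enumeration is practical only for columns with few distinct values.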
Some pruning methods, including max_depth, min_impurity_decrease,
and min_leaves, are incorporated in the implementation.
For illustrative purposes, max_depth was constrained to 3 in this example:
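As a rough illustration of how such constraints can gate a split, the sketch below checks max_depth and min_impurity_decrease before a node is allowed to split. It is an assumption-laden sketch, not the repository's code: Gini impurity is used purely for illustration, the impurity decrease is taken as the parent impurity minus the size-weighted child impurities, and the helper names are hypothetical. min_leaves is left out because its exact meaning is not described here.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    return (gini(parent)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

def allow_split(depth, parent, left, right, max_depth=3, min_impurity_decrease=0.0):
    """Return True if the candidate split passes both pre-pruning checks."""
    if depth >= max_depth:
        return False  # the tree is already as deep as allowed
    if not left or not right:
        return False  # a split must send samples to both sides
    return impurity_decrease(parent, left, right) >= min_impurity_decrease

parent = ["a", "a", "b", "b"]
clean_split = (["a", "a"], ["b", "b"])    # separates the classes perfectly
useless_split = (["a", "b"], ["a", "b"])  # children are as mixed as the parent

print(allow_split(0, parent, *clean_split))                                # True
print(allow_split(3, parent, *clean_split))                                # False
print(allow_split(0, parent, *useless_split, min_impurity_decrease=0.01))  # False
```

Checks like these stop growth while the tree is being built; they do not remove branches from a tree that has already been fully grown.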
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Have questions, feedback, or other implementations you would like to see here, or just want to chat about machine learning? Feel free to email me or connect with me on LinkedIn.