This project contains my attempts to implement machine learning algorithms in Python. The implementations are not intended to be optimized.
The project was written with Python 3.5.2.
The project has a few dependencies, such as NumPy and matplotlib, which can be installed via pip:
pip install -r requirements.txt
The project contains several solutions, each located in its own folder. Each solution folder has a run.py file that can be executed to see the algorithm in action:
python run.py
The implementation is tested on the Concrete Compressive Strength Data Set. The data set is divided into training and test subsets, and a 2D projection is used to show the results visually. 1D synthetic data is also generated to show more clearly what the predictions look like.
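The following is a minimal sketch, assuming the data is held in NumPy arrays, of how a data set could be shuffled and divided into training and test subsets, and how 1D synthetic data for a linear model could be generated. The helper names `split_train_test` and `make_1d_data`, the 20% test ratio and the line parameters are hypothetical; they are not taken from the project's run.py (and `split_train_test` is not scikit-learn's `train_test_split`).

```python
import numpy as np


def split_train_test(X, y, test_ratio=0.2, seed=0):
    """Shuffle the rows and split them into training and test subsets (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    indices = rng.permutation(len(X))
    n_test = int(len(X) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def make_1d_data(n_samples=100, slope=2.0, intercept=1.0, noise=0.5, seed=0):
    """Generate noisy points around y = slope * x + intercept (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    x = rng.uniform(0.0, 10.0, size=(n_samples, 1))
    y = slope * x[:, 0] + intercept + rng.normal(0.0, noise, size=n_samples)
    return x, y


# Usage: 100 synthetic points, 20% of them held out for testing.
X, y = make_1d_data()
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(X_train.shape, X_test.shape)  # (80, 1) (20, 1)
```

Shuffling before splitting matters when the rows of the original file are ordered in some way; fixing the seed keeps the split reproducible between runs.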
Results: both the normal equation and gradient descent work well on small data sets. The normal equation gives an exact least-squares solution, but it becomes slow on very large data sets with many features, because it requires solving an n-by-n linear system (n being the number of features), whose cost grows roughly with the cube of n. Gradient descent only approximates the solution, but it can be more efficient on large data sets.
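To make the comparison concrete, here is a minimal sketch of both approaches, assuming a design matrix `X` without a bias column and a target vector `y`. The normal equation solves $(X^\top X)\,\theta = X^\top y$ directly, while batch gradient descent repeatedly steps against the gradient of the mean squared error. The function names, the learning rate of 0.1 and the 5000 iterations are hypothetical choices for features scaled roughly to [0, 1]; they are not the project's actual code or settings.

```python
import numpy as np


def add_bias(X):
    """Prepend a column of ones so theta[0] acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def normal_equation(X, y):
    """Closed-form least squares: solve (X^T X) theta = X^T y."""
    Xb = add_bias(X)
    # np.linalg.solve avoids forming an explicit inverse, but its cost
    # still grows roughly cubically with the number of features.
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)


def gradient_descent(X, y, learning_rate=0.1, n_iters=5000):
    """Batch gradient descent on J(theta) = 1/(2m) * ||X theta - y||^2."""
    Xb = add_bias(X)
    m, n = Xb.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        gradient = Xb.T @ (Xb @ theta - y) / m  # dJ/dtheta
        theta -= learning_rate * gradient
    return theta


# Usage on hypothetical synthetic data: y = 1 + 2*x1 - 3*x2 + noise.
rng = np.random.RandomState(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0.0, 0.1, size=200)
print(normal_equation(X, y))
print(gradient_descent(X, y))
```

Both calls should print nearly the same coefficients. Gradient descent only gets that close because the learning rate and iteration count suit this well-scaled data; with unscaled features or too few iterations it stops short of the exact solution.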
Randomly generated synthetic data is used to test the algorithm with linear and polynomial models. Real data set: Haberman's Survival Data Set.
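A polynomial model can be built by mapping the inputs to polynomial features and then training the same linear classifier on them. The sketch below, assuming two input features, expands them into all monomials up to a given degree and generates a hypothetical synthetic set (points inside a circle labeled 1) that a linear boundary cannot separate but a degree-2 one can. The names `polynomial_features` and `make_circle_data` and the chosen ranges are illustrative assumptions, not the project's code.

```python
import numpy as np


def polynomial_features(X, degree):
    """Map (x1, x2) to all monomials x1^i * x2^j with i + j <= degree.

    Column order: 1, x1, x2, x1^2, x1*x2, x2^2, ... (the first column is the bias).
    """
    x1, x2 = X[:, 0], X[:, 1]
    columns = []
    for total in range(degree + 1):
        for j in range(total + 1):
            columns.append((x1 ** (total - j)) * (x2 ** j))
    return np.column_stack(columns)


def make_circle_data(n_samples=200, radius=1.0, seed=0):
    """Label points inside a circle of the given radius as 1, others as 0 (hypothetical)."""
    rng = np.random.RandomState(seed)
    X = rng.uniform(-2.0, 2.0, size=(n_samples, 2))
    y = (np.sum(X ** 2, axis=1) < radius ** 2).astype(float)
    return X, y


# Usage: the point (2, 3) expanded to degree 2.
print(polynomial_features(np.array([[2.0, 3.0]]), 2))  # [[1. 2. 3. 4. 6. 9.]]
```

A degree-d expansion of two features yields (d + 1)(d + 2) / 2 columns, so the feature count, and with it the risk of overfitting, grows quickly with the degree.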
Results: with some parameter tuning, this implementation can reach 75-80% accuracy on the real data set. The main problem is that the optimization algorithm (gradient descent) falls into a local minimum, so a different optimization method is needed for better results. The regularization tests show how underfitting and overfitting affect the final results.
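The regularization tests can be illustrated with a minimal sketch of L2-regularized logistic regression trained by batch gradient descent. It assumes a design matrix that already contains a leading column of ones, leaves the bias unregularized, and reports training accuracy; the names `fit`, `cost`, `accuracy` and `lam`, the learning rate and the iteration count are hypothetical and not taken from the project.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost(theta, X, y, lam):
    """Cross-entropy cost plus an L2 penalty; theta[0] (the bias) is not penalized."""
    m = len(y)
    h = sigmoid(X @ theta)
    eps = 1e-12  # keeps log() finite when h is exactly 0 or 1
    data_term = -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return data_term + penalty


def fit(X, y, lam=0.0, learning_rate=0.1, n_iters=2000):
    """Minimize cost() by batch gradient descent, starting from theta = 0."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        gradient = X.T @ (sigmoid(X @ theta) - y) / m
        gradient[1:] += lam / m * theta[1:]  # gradient of the L2 penalty
        theta -= learning_rate * gradient
    return theta


def accuracy(theta, X, y):
    """Fraction of samples whose thresholded prediction (p >= 0.5) matches the label."""
    predictions = (sigmoid(X @ theta) >= 0.5).astype(float)
    return np.mean(predictions == y)


# Usage on hypothetical synthetic data: the label is 1 when x1 + x2 > 0.
rng = np.random.RandomState(0)
X_raw = rng.normal(size=(200, 2))
y = (X_raw[:, 0] + X_raw[:, 1] > 0).astype(float)
X = np.hstack([np.ones((200, 1)), X_raw])
for lam in (0.0, 1.0, 100.0):
    theta = fit(X, y, lam=lam)
    print(lam, accuracy(theta, X, y), np.linalg.norm(theta[1:]))
```

Increasing `lam` shrinks the weights: a very large value pushes the model toward underfitting, while `lam = 0` combined with a high-degree polynomial feature map leaves it free to overfit the training set.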