This repository contains several exercises for practicing machine learning algorithms with the scikit-learn framework. All exercises come from the Linux magazine HS n°94.
This project uses virtualenvwrapper to create a virtual environment for Python.
$ sudo -H pip install virtualenvwrapper
$ mkdir ~/.virtualenvs
$ echo "export WORKON_HOME=~/.virtualenvs" >> ~/.bashrc
$ echo "source /usr/local/bin/virtualenvwrapper.sh" >> ~/.bashrc
$ bash
$ mkvirtualenv machine_learning --python=/usr/bin/python3
$ workon machine_learning
$ pip install -r requirements.txt
In this section, we learned how to use linear regression, how to determine the "a" and "b" values of the linear equation y = ax + b in order to draw it, and how to use splines to represent more complex equations.
First, download the following data sets.
- Download "players_stats.csv" => https://www.kaggle.com/drgilermo/nba-players-stats-20142015/
- Download "yellow_tripdata_2017_0*" => http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
- Download "04cars.dat.txt" => https://ww2.amstat.org/publications/jse/datasets/04cars.dat.txt
In this example, we will see a linear correlation between the height of an NBA player and his weight.
$ python linear_regression_nba_players_stats_2014_2015.py
$ python linear_regression_generated_1.py
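The idea behind the linear regression scripts above can be sketched as follows. This is a minimal illustration and not the code of the repository: the generated data, the random seed and the variable names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical generated data: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# scikit-learn expects a 2D feature matrix of shape (n_samples, n_features).
model = LinearRegression()
model.fit(x.reshape(-1, 1), y)

a = model.coef_[0]    # slope of the fitted line
b = model.intercept_  # intercept of the fitted line
print(f"y = {a:.2f} * x + {b:.2f}")
```

The fitted "a" and "b" should land close to the values used to generate the data (2 and 1 here), which is enough to draw the line over the scatter plot.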
A spline is a way to model complex equations that do not follow the pattern ax + b.
$ python linear_regression_splines_example.py
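As a rough sketch of why a spline helps, the snippet below fits both a straight line and a smoothing spline to the same non-linear data. It assumes SciPy is installed and uses `scipy.interpolate.UnivariateSpline`; the script in this repository may rely on a different spline implementation, and the noisy sine data is invented for the illustration.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical non-linear data: a noisy sine wave.
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)

# Cubic smoothing spline; s is an upper bound on the sum of squared residuals.
spline = UnivariateSpline(x, y, k=3, s=1.0)

# Straight line ax + b fitted to the same data, for comparison.
a, b = np.polyfit(x, y, deg=1)

spline_error = np.mean((spline(x) - y) ** 2)
line_error = np.mean((a * x + b - y) ** 2)
print(f"spline MSE: {spline_error:.3f}, line MSE: {line_error:.3f}")
```

The spline error should be much lower than the line error, because a single line cannot follow the curvature of the data.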
In this exercise, we will use splines to model how the time needed to reach JFK airport varies for the same taxi trip.
$ python linear_regression_taxi_nyc.py
$ python linear_regression_taxi_nyc_splines.py
In this section, we learned how to use PCA, normalize data and reduce the number of dimensions.
In this exercise, we will use brute force to show all combinations of the Iris dataset variables.
$ python pca_brute_force.py
In this exercise, we will use a basic linear example to see how to reduce a 2D representation to a 1D representation.
$ python pca_2d.py
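The reduction from 2D to 1D can be sketched as below. This is an illustrative example under assumptions (generated points lying roughly along a line, arbitrary seed), not the content of `pca_2d.py`.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical 2D points that lie roughly along the line y = 0.5x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.1, size=200)
points = np.column_stack([x, y])  # shape (200, 2)

# Keep only the first principal component: 2D -> 1D.
pca = PCA(n_components=1)
projected = pca.fit_transform(points)  # shape (200, 1)

print(projected.shape)
print(pca.explained_variance_ratio_)
```

Because the points are almost aligned, the single remaining component should retain nearly all of the variance, so little information is lost by dropping one dimension.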
In this exercise, we will use PCA and a biplot to represent the Iris dataset on a single chart.
$ python pca_biplot_iris.py
In this exercise, we will see that unnormalized data can distort the results of a PCA.
$ python pca_normalized.py
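The effect of normalization can be illustrated with two independent variables measured on very different scales. The sketch below uses `StandardScaler` as one possible normalization; the data and names are assumptions, and the repository script may normalize differently.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two independent variables on very different scales.
rng = np.random.default_rng(0)
small = rng.normal(scale=1.0, size=300)
large = rng.normal(scale=1000.0, size=300)
data = np.column_stack([small, large])

# PCA on raw data: the large-scale variable dominates the first component.
raw_ratio = PCA(n_components=2).fit(data).explained_variance_ratio_

# PCA after scaling each variable to zero mean and unit variance.
scaled = StandardScaler().fit_transform(data)
scaled_ratio = PCA(n_components=2).fit(scaled).explained_variance_ratio_

print("raw:   ", raw_ratio)
print("scaled:", scaled_ratio)
```

On the raw data the first component should explain almost all the variance simply because of the units, while after scaling both components should carry a comparable share.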
This project is distributed under the MIT licence.
To check the code quality, run these commands:
$ pip install flake8 prospector
$ flake8
$ prospector -F -i dataset/
To fix a bug, open an issue on GitHub and submit a pull request.