Machine learning exercises

Goals of this repository

This repository contains several exercises to practice machine learning algorithms with the scikit-learn framework. All exercises come from Linux Magazine HS n°94.

Main installation

This project uses virtualenvwrapper to create a Python virtual environment.

$ sudo -H pip install virtualenvwrapper
$ mkdir ~/.virtualenvs
$ echo "export WORKON_HOME=~/.virtualenvs" >> ~/.bashrc
$ echo "source /usr/local/bin/virtualenvwrapper.sh" >> ~/.bashrc
$ bash
$ mkvirtualenv machine_learning --python=/usr/bin/python3
$ workon machine_learning
$ pip install -r requirements.txt

Linear regression

In this section, we learned how to use linear regression, how to compute the slope "a" and the intercept "b" of the line y = ax + b, and how to use splines to represent more complex relationships.
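For a single input variable, the least-squares values of "a" and "b" have a closed form: "a" is the covariance of x and y divided by the variance of x, and b = mean(y) - a * mean(x). The following is a minimal NumPy sketch of that formula on a toy input; it is not taken from the repository's scripts, which may rely on scikit-learn instead.

```python
import numpy as np


def fit_line(x, y):
    """Return slope a and intercept b of the least-squares line y = a*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # a = cov(x, y) / var(x)  (the 1/n factors cancel out)
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The fitted line always passes through the point (x_mean, y_mean)
    b = y_mean - a * x_mean
    return a, b


a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 2.0 1.0
```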

Dataset

First, we had to download several datasets.

NBA player height and weight correlation

In this example, we will see a linear correlation between the height of an NBA player and their weight.

$ python linear_regression_nba_players_stats_2014_2015.py
$ python linear_regression_generated_1.py

Figures: linear_regression_nba_players_stats_2014_2015, linear_regression_generated_1
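A minimal scikit-learn sketch in the spirit of linear_regression_generated_1.py, assuming the generated data follows a noisy straight line. The slope 3, intercept 5 and noise level are arbitrary choices for illustration, not values from the repository or from the NBA dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Synthetic data (not the NBA dataset): y = 3x + 5 plus Gaussian noise
x = rng.uniform(0, 10, size=200)
y = 3 * x + 5 + rng.normal(0, 1, size=200)

X = x.reshape(-1, 1)  # scikit-learn expects a 2D feature matrix
model = LinearRegression().fit(X, y)

a = model.coef_[0]      # slope "a"
b = model.intercept_    # intercept "b"
r2 = model.score(X, y)  # coefficient of determination R^2
print(f"a={a:.2f} b={b:.2f} R^2={r2:.3f}")
```

The closer R² is to 1, the better a straight line explains the data; a weak correlation such as a noisy cloud of points would give a value close to 0.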

Spline example on generated data

A spline is a way to model complex relationships that do not follow the linear pattern ax + b.

$ python linear_regression_splines_example.py

Figure: linear_regression_splines_example

Using splines to model traffic to JFK airport

In this exercise, we will use splines to model how the time needed to reach JFK airport by taxi varies for the same journey.

$ python linear_regression_taxi_nyc.py
$ python linear_regression_taxi_nyc_splines.py

Figures: linear_regression_taxi_nyc, linear_regression_taxi_nyc_splines_1, linear_regression_taxi_nyc_splines_2

PCA

In this section, we learned how to use PCA (principal component analysis), normalize data and reduce the number of dimensions.

PCA brute force

In this exercise, we will use brute force to plot every pairwise combination of the Iris dataset features.

$ python pca_brute_force.py

Figures: pca_brute_force_1 to pca_brute_force_6
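The Iris dataset has four features, so there are six feature pairs, which would be consistent with the six figures listed for pca_brute_force.py. A minimal sketch that enumerates those pairs with itertools.combinations and prints the Pearson correlation of each one; plotting is left out, but each pair could be drawn with matplotlib, for example plt.scatter(X[:, i], X[:, j], c=iris.target).

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()  # bundled with scikit-learn, no download needed
X, names = iris.data, iris.feature_names

# Every unordered pair of feature indices: (0, 1), (0, 2), ..., (2, 3)
pairs = list(combinations(range(X.shape[1]), 2))
for i, j in pairs:
    r = np.corrcoef(X[:, i], X[:, j])[0, 1]
    print(f"{names[i]} vs {names[j]}: r = {r:+.2f}")
```

Brute force stops scaling quickly: with n features there are n(n-1)/2 pairs, which is one reason to reduce dimensions with PCA instead.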

PCA 2D

In this exercise, we will use a basic linear example to show how to reduce a 2D representation to a 1D representation.

$ python pca_2d.py

Figures: pca_2d, pca_2d_2
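A minimal sketch of the 2D to 1D reduction, assuming synthetic points scattered around the line y = 2x. PCA(n_components=1) keeps only the direction of largest variance; inverse_transform maps the 1D coordinates back onto that direction in the original 2D space.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic points scattered around the line y = 2x
x = rng.normal(0, 1, size=200)
points = np.column_stack([x, 2 * x + rng.normal(0, 0.2, size=200)])

pca = PCA(n_components=1)
projected = pca.fit_transform(points)        # shape (200, 1): the 1D representation
restored = pca.inverse_transform(projected)  # shape (200, 2): points on the principal axis

print("explained variance ratio:", pca.explained_variance_ratio_[0])
print("principal axis:", pca.components_[0])
```

An explained variance ratio close to 1 means that dropping the second dimension loses almost no information.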

PCA and biplot method on the Iris dataset

In this exercise, we will use PCA and the biplot method to represent the Iris dataset on a single chart.

$ python pca_biplot_iris.py

Figures: pca_biplot_iris_1, pca_biplot_iris_2
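A biplot overlays two things on the same axes: the samples projected onto the first two principal components (the scores) and one arrow per original feature (the loadings, here the rows of pca.components_ transposed). This sketch computes both for Iris; standardizing the data first is an assumption, and pca_biplot_iris.py may scale the arrows differently.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)  # assumption: unit-variance features

pca = PCA(n_components=2)
scores = pca.fit_transform(X)  # one point per flower, shape (150, 2)
loadings = pca.components_.T   # one arrow per original feature, shape (4, 2)

for name, (dx, dy) in zip(iris.feature_names, loadings):
    print(f"{name}: arrow to ({dx:+.2f}, {dy:+.2f})")
```

To draw the chart, scatter scores[:, 0] against scores[:, 1] coloured by iris.target, then draw an arrow from the origin to each loading, for example with matplotlib's plt.arrow(0, 0, dx, dy).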

PCA and data normalization

In this exercise, we will see that unnormalized data can distort the results of a PCA.

$ python pca_normalized.py

Figures: pca_normalized_1, pca_normalized_2
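A minimal sketch of the effect, assuming a hypothetical unit change: sepal width is multiplied by 10,000 (centimetres to micrometres). Without normalization, that single feature carries almost all the variance and dominates the first principal component; after StandardScaler, every feature has unit variance and the result no longer depends on the units.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data.copy()
# Hypothetical unit change: sepal width (column 1) in micrometres instead of cm
X[:, 1] *= 10_000

raw = PCA(n_components=2).fit(X)
scaled = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

print("raw PC1 loadings:   ", np.round(raw.components_[0], 3))
print("scaled PC1 loadings:", np.round(scaled.components_[0], 3))
print("raw PC1 variance ratio:   ", raw.explained_variance_ratio_[0])
print("scaled PC1 variance ratio:", scaled.explained_variance_ratio_[0])
```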

Contribute

This project is distributed under the MIT licence.

To check code quality, run these commands:

$ pip install flake8 prospector
$ flake8
$ prospector -F -i dataset/

To fix a bug, open an issue on GitHub and submit a pull request.