This repository holds the code for the forthcomming book "Introduction to Machine Learning with Python" by Andreas Mueller and Sarah Guido. You can find details about the book on the O'Reilly website.
The books requires the current development version of scikit-learn, that is
0.18-dev. Most of the book can also be used with previous versions of
scikit-learn, though you need to adjust the import for everything from the
model_selection
module, mostly cross_val_score
, train_test_split
and GridSearchCV
.
This repository provides the notebooks from which the book is created, together
with the mglearn
library of helper functions to create figures and
datasets.
For the curious ones, the cover depicts a hellbender
To run the code, you need the packages numpy
, scipy
, scikit-learn
, matplotlib
, pandas
and pillow
.
Some of the visualizations of decision trees and neural networks structures also require graphviz
.
The easiest way to set up an environment is by installing Anaconda.
If you already have a Python environment set up, and you are using the conda
package manager, you can get all packages by running
conda install numpy scipy scikit-learn matplotlib pandas pillow graphviz
and then also
pip install graphviz
(Explanation: the conda package graphiz is the C library, not the python library)
If you already have a Python environment and are using pip to install packages, you need to run
pip install numpy scipy scikit-learn matplotlib pandas pillow graphviz
You also need to install the graphiz C-library, which is easiest using a package manager.
If you are using OS X and homebrew, you can brew install graphviz
. If you are on Ubuntu or debian, you can apt-get install graphviz
.
Installing graphviz on Windows can be tricky and using conda / anaconda is recommended.