A small Python project which explores using scikit-learn to classify Palmer penguins by species.

Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

penguin-models

This is a small Python project that uses scikit-learn to classify penguins in the Palmer penguins dataset by species, given their bill features. It was a personal project that I used to learn about support vector machines in scikit-learn.
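As a rough illustration of the kind of model involved, here is a minimal sketch of a support vector classifier trained on two bill features. It is not the code from this repo: the data is synthetic (generated with make_blobs), the cluster centers and species labels are made-up stand-ins for the real measurements, and the choice of an RBF kernel with default-like settings is an assumption.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: three clusters in (bill length, bill depth) space.
# The centers are illustrative values in mm, NOT measurements from the dataset.
SPECIES = np.array(["Adelie", "Chinstrap", "Gentoo"])
centers = [(39.0, 18.0), (49.0, 18.0), (47.0, 15.0)]
X, y_idx = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)
y = SPECIES[y_idx]  # map cluster indices to species names

# Hold out a quarter of the samples, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# SVMs are sensitive to feature scale, so standardize before fitting.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # fraction of held-out samples classified correctly
print(f"Held-out accuracy: {accuracy:.2f}")
```

Putting the scaler and the classifier in one pipeline means the scaling parameters are learned from the training split only, which avoids leaking information from the held-out samples.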

Setup

To run the code you will need a Python 3 installation with the packages listed in environment.yml. To create an environment with these packages using the Anaconda distribution, run the following conda command in the repo directory:

conda env create -f environment.yml

This will create an environment called penguin-models. You can activate the environment with:

conda activate penguin-models

And deactivate it with:

conda deactivate

See the conda documentation for further information on environments.

Analysis

To run the analysis, start an IPython shell:

ipython

Then import the analysis module and call its run function:

import analysis
analysis.run()

This will load the data, train the models, and create the plots in the plots directory. There is an index.html file in the plots directory that shows all of the plots in an annotated webpage.

Style

The plots use a custom matplotlib theme called eda. In the plots module this is loaded from the file style/eda.mplstyle.

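If you want to use this style in other projects, you can copy the file into your matplotlib style library. This is the stylelib folder inside your matplotlib configuration directory, which is normally ~/.config/matplotlib on Linux and ~/.matplotlib on other platforms (matplotlib.get_configdir() reports the exact location). You can then load the style with: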

import matplotlib.pyplot as plt
plt.style.use(['eda'])
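The copy step can also be scripted. The sketch below uses only the standard library; install_style is a hypothetical helper name that is not part of this repo, and the demonstration writes a placeholder style file into a temporary directory rather than touching a real matplotlib configuration.

```python
import shutil
import tempfile
from pathlib import Path


def install_style(style_file, config_dir):
    """Copy a .mplstyle file into <config_dir>/stylelib and return the new path.

    Hypothetical helper for illustration; not part of this repo.
    """
    stylelib = Path(config_dir) / "stylelib"
    stylelib.mkdir(parents=True, exist_ok=True)  # the folder may not exist yet
    return Path(shutil.copy(style_file, stylelib))


# Demonstration against a throwaway directory so nothing real is modified.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "eda.mplstyle"
    source.write_text("axes.grid: True\n")  # placeholder content, not the real theme
    installed = install_style(source, Path(tmp) / "config")
    installed_name = installed.name
    installed_ok = installed.exists()
    print(f"Installed {installed_name} into {installed.parent.name}/")
```

In real use you would pass style/eda.mplstyle as the source and matplotlib.get_configdir() as the configuration directory. If matplotlib is already imported when the file is copied, calling plt.style.reload_library() should make the new style available without restarting Python.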

Further reading

I've been learning how to use scikit-learn with Aurélien Géron's book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. It's really good. This project came from applying what Géron teaches in the book to a dataset it doesn't cover.