/yellowbrick

A suite of visual analysis and diagnostic tools to facilitate feature selection, model selection, and parameter tuning for machine learning.

Primary LanguagePythonApache License 2.0Apache-2.0

Yellowbrick

Build Status Coverage Status Code Health Documentation Status Stories in Ready

Visual analysis and diagnostic tools to facilitate machine learning model selection.

Follow the yellow brick road Image by Quatro Cinco, used with permission, Flickr Creative Commons.

This README is a guide for developers, if you're new to Yellowbrick, get started at our documentation.

What is Yellowbrick?

Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the Scikit-Learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines Scikit-Learn with Matplotlib in the best tradition of the Scikit-Learn documentation, but to produce visualizations for your models!

Visualizers

Visualizers

Visualizers are estimators (objects that learn from data) whose primary objective is to create visualizations that allow insight into the model selection process. In Scikit-Learn terms, they can be similar to transformers when visualizing the data space or wrap an model estimator similar to how the "ModelCV" (e.g. RidgeCV, LassoCV) methods work. The primary goal of Yellowbrick is to create a sensical API similar to Scikit-Learn. Some of our most popular visualizers include:

Feature Visualization

  • Rank2D: pairwise ranking of features to detect relationships
  • Parallel Coordinates: horizontal visualization of instances
  • Radial Visualization: separation of instances around a circular plot

Classification Visualization

  • Class Balance: see how the distribution of classes affects the model
  • Classification Report: visual representation of precision, recall, and F1
  • ROC/AUC Curves: receiver operator characteristics and area under the curve
  • Confusion Matrices: visual description of class decision making

Regression Visualization

  • Prediction Error Plots: find model breakdowns along the domain of the target
  • Residuals Plot: show the difference in residuals of training and test data
  • Alpha Selection: show how the choice of alpha influences regularization

Clustering Visualization

  • K-Elbow Plot: select k using the elbow method and various metrics
  • Silhouette Plot: select k by visualizing silhouette coefficient values

Text Visualization

  • Term Frequency: visualize the frequency distribution of terms in the corpus
  • TSNE: use stochastic neighbor embedding to project documents.

And more! Visualizers are being added all the time, be sure to check the examples (or even the develop branch) and feel free to contribute your ideas for Visualizers!

Installing Yellowbrick

Yellowbrick is compatible with Python 2.7 or later but it is preferred to use Python 3.5 or later to take full advantage of all functionality. Yellowbrick also depends on Scikit-Learn 0.18 or later and Matplotlib 1.5 or later. The simplest way to install Yellowbrick is from PyPI with pip, Python's preferred package installer.

$ pip install yellowbrick

Note that Yellowbrick is an active project and routinely publishes new releases with more visualizers and updates. In order to upgrade Yellowbrick to the latest version, use pip as follows.

$ pip install -u yellowbrick

You can also use the -u flag to update Scikit-Learn, matplotlib, or any other third party utilities that work well with Yellowbrick to their latest versions.

If you're using Windows or Anaconda, you can take advantage of the conda utility to install Yellowbrick:

conda install -c districtdatalabs yellowbrick

Note, however, that there is a known bug installing Yellowbrick on Linux with Anaconda.

Using Yellowbrick

The Yellowbrick API is specifically designed to play nicely with Scikit-Learn. Here is an example of a typical workflow sequence with Scikit-Learn and Yellowbrick:

Feature Visualization

In this example, we see how Rank2D performs pairwise comparisons of each feature in the data set with a specific metric or algorithm, then returns them ranked as a lower left triangle diagram.

from yellowbrick.features import Rank2D

visualizer = Rank2D(features=features, algorithm='covariance')
visualizer.fit(X, y)                # Fit the data to the visualizer
visualizer.transform(X)             # Transform the data
visualizer.poof()                   # Draw/show/poof the data

Model Visualization

In this example, we instantiate a Scikit-Learn classifier, and then we use Yellowbrick's ROCAUC class to visualize the tradeoff between the classifier's sensitivity and specificity.

from sklearn.svm import LinearSVC
from yellowbrick.classifier import ROCAUC

model = LinearSVC()
model.fit(X,y)
visualizer = ROCAUC(model)
visualizer.score(X,y)
visualizer.poof()

For additional information on getting started with Yellowbrick, check out our examples notebook.

We also have a quick start guide.

Contributing to Yellowbrick

Yellowbrick is an open source project that is supported by a community who will gratefully and humbly accept any contributions you might make to the project. Large or small, any contribution makes a big difference; and if you've never contributed to an open source project before, we hope you will start with Yellowbrick!

Principally, Yellowbrick development is about the addition and creation of visualizers --- objects that learn from data and create a visual representation of the data or model. Visualizers integrate with Scikit-Learn estimators, transformers, and pipelines for specific purposes and as a result, can be simple to build and deploy. The most common contribution is therefore a new visualizer for a specific model or model family. We'll discuss in detail how to build visualizers later.

Beyond creating visualizers, there are many ways to contribute:

  • Submit a bug report or feature request on GitHub Issues.
  • Contribute a Jupyter notebook to our examples gallery.
  • Assist us with user testing.
  • Add to the documentation or help with our website, scikit-yb.org.
  • Write unit or integration tests for our project.
  • Answer questions on our issues, mailing list, Stack Overflow, and elsewhere.
  • Translate our documentation into another language.
  • Write a blog post, tweet, or share our project with others.
  • Teach someone how to use Yellowbrick.

As you can see, there are lots of ways to get involved and we would be very happy for you to join us! The only thing we ask is that you abide by the principles of openness, respect, and consideration of others as described in the Python Software Foundation Code of Conduct.

For more information, checkout CONTRIBUTING.md.