Prince: A Python repository from jphcoi

Prince is a factor analysis library for datasets that fit in memory.

Quick start

Prince uses pandas to manipulate dataframes, so it expects a dataframe as input. In the following example, a Principal Component Analysis (PCA) is applied to the iris dataset. Under the hood, Prince uses a Singular Value Decomposition (SVD) to decompose the dataframe into two eigenvector matrices and one eigenvalue array. The eigenvectors can then be used to project the initial dataset onto fewer dimensions.

import matplotlib.pyplot as plt
import pandas as pd

import prince


df = pd.read_csv('data/iris.csv')

pca = prince.PCA(df, n_components=4)

fig1, ax1 = pca.plot_cumulative_inertia()
fig2, ax2 = pca.plot_rows(color_by='class', ellipse_fill=True)

plt.show()

The row plot (fig2 in the code above) displays the rows of the initial dataset projected onto the first two right eigenvectors; the resulting projections are called principal coordinates. The ellipses mark 90% confidence regions.

The cumulative inertia plot (fig1 in the code above) displays the cumulative contribution of each eigenvector, computed from the corresponding eigenvalues. Here the first two eigenvectors alone account for more than 95% of the total.
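
To make the decomposition described above more concrete, here is a minimal NumPy sketch of a PCA computed through an SVD. It is not Prince's actual implementation: the synthetic data, the variable names, and the choice to center and scale each column are assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-in data (150 rows, 4 columns); this is NOT the iris dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))

# Assumption: center and scale every column before decomposing.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD: Z = U @ diag(s) @ Vt. The columns of U are the left eigenvectors,
# the rows of Vt are the right eigenvectors.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

# The eigenvalues are proportional to the squared singular values.
eigenvalues = s ** 2
explained = eigenvalues / eigenvalues.sum()

# Cumulative contribution of each eigenvector (what the inertia plot shows).
cumulative = np.cumsum(explained)

# Principal coordinates: rows projected onto the first two right eigenvectors.
row_coords = Z @ Vt[:2].T

print(cumulative)
print(row_coords[:5])
```

Projecting with the right eigenvectors gives the same result as scaling the first two left eigenvectors by their singular values, which is a handy sanity check.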

Installation

Anaconda is not required, but it is a convenient way to set up a Python environment for data science.

Via PyPI

$ pip install prince

Via GitHub for the latest development version

$ pip install git+https://github.com/MaxHalford/Prince

Prince has the following dependencies:

pandas for manipulating dataframes
matplotlib as a default plotting backend
fbpca, Facebook's randomized SVD implementation
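
For readers unfamiliar with randomized SVD, the following is a rough NumPy sketch of the general idea (random projection, a few power iterations, then an exact SVD of a much smaller matrix). It is not fbpca's code or API; the function name randomized_svd and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def randomized_svd(A, k, n_oversamples=10, n_iter=2, seed=0):
    """Approximate the top-k SVD of A (illustrative sketch, not fbpca)."""
    rng = np.random.default_rng(seed)
    # Sample a few more directions than needed to improve accuracy.
    n_samples = min(k + n_oversamples, min(A.shape))

    # Project A onto random directions and orthonormalize the result.
    Q, _ = np.linalg.qr(A @ rng.normal(size=(A.shape[1], n_samples)))

    # Power iterations sharpen the estimate of the dominant subspace.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Exact SVD of the small matrix B = Q.T @ A, then map back.
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k]


# Usage on a synthetic matrix of rank 5.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 50))
U, s, Vt = randomized_svd(A, k=5)
print(s)
```

The appeal of this approach is that the expensive exact SVD only runs on a matrix with k + n_oversamples rows, which matters when the original dataframe has many rows.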

Documentation

Please check out the documentation for a list of available methods.

License
