Prince

Python factor analysis library (PCA, CA, MCA, FAMD)


Prince is a factor analysis library for datasets that fit in memory.

Quick start

Prince uses pandas to manipulate dataframes, so it expects a dataframe as input. In the following example, a Principal Component Analysis (PCA) is applied to the iris dataset. Under the hood, Prince uses a Singular Value Decomposition (SVD) to decompose the dataframe into two eigenvector matrices and one eigenvalue array. The eigenvectors can then be used to project the initial dataset onto lower dimensions.

import matplotlib.pyplot as plt
import pandas as pd

import prince


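# Load the iris dataset into a pandas dataframe
df = pd.read_csv('data/iris.csv')

# Run a PCA on the dataframe, keeping 4 components
pca = prince.PCA(df, n_components=4)

# Plot the cumulative inertia, then the row projections coloured by the 'class' column
fig1, ax1 = pca.plot_cumulative_inertia()
fig2, ax2 = pca.plot_rows(color_by='class', ellipse_fill=True)

plt.show()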

The row plot (fig2) displays the rows of the initial dataset projected onto the first two right eigenvectors; the resulting projections are called principal coordinates. The ellipses are 90% confidence intervals.

row_principal coordinates
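
The projection described above can be sketched with plain NumPy. This is only an illustration of the general SVD approach, not Prince's actual implementation: the function name `pca_svd` is hypothetical, the input is random data rather than the iris dataset, and the columns are only centred (whether Prince also rescales them is not stated here).

```python
import numpy as np


def pca_svd(X, n_components=2):
    """Project the rows of X onto the leading right singular vectors.

    Hypothetical helper for illustration; not part of Prince.
    """
    X = np.asarray(X, dtype=float)
    # Centre each column (assumption: no rescaling is applied).
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(s) @ Vt, singular values sorted in decreasing order.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # The rows of Vt are the right eigenvectors; projecting the centred
    # rows onto the first n_components gives the principal coordinates.
    coords = Xc @ Vt[:n_components].T
    # Squared singular values play the role of the eigenvalues.
    eigenvalues = s ** 2
    return coords, eigenvalues


# Random stand-in data with the same shape as iris (150 rows, 4 columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
coords, eigenvalues = pca_svd(X, n_components=2)
print(coords.shape)
```

Each column of `coords` has a sum of squares equal to the matching eigenvalue, which is why the eigenvalues measure how much of the data's spread each axis captures.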

The cumulative inertia plot (fig1) displays the cumulative contribution of each eigenvector, computed from the corresponding eigenvalues. In this case, the first two eigenvectors alone account for more than 95% of the total inertia.

cumulative_inertia
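
As a rough sketch of how such a curve can be derived from the eigenvalues, the snippet below divides the running sum of the eigenvalues by their total. The function name `cumulative_inertia` and the eigenvalues are made up for illustration; they are not the values obtained on the iris dataset, and this is not necessarily how Prince computes the plot.

```python
import numpy as np


def cumulative_inertia(eigenvalues):
    """Return the share of total inertia explained by the first k eigenvectors."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    # Running sum of the eigenvalues divided by their total.
    return np.cumsum(eigenvalues) / eigenvalues.sum()


# Made-up eigenvalues, sorted in decreasing order (illustration only).
example = [6.0, 3.0, 0.75, 0.25]
ratios = cumulative_inertia(example)
# ratios is 0.6, 0.9, 0.975, 1.0: the first two axes explain 90% here.
print(ratios)
```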

Installation

Although it isn't a requirement, using Anaconda is a good idea in general for doing data science in Python.

Via PyPI

$ pip install prince

Via GitHub for the latest development version

$ pip install git+https://github.com/MaxHalford/Prince

Prince has the following dependencies:

Documentation

Please check out the documentation for a list of available methods.

License

Prince is released under the MIT License.