A scikit-learn compatible implementation of MCA

Primary language: Python. License: MIT.

Note: use Prince (https://github.com/MaxHalford/Prince) instead of this library.

skmca

A scikit-learn pipeline API compatible implementation of Multiple Correspondence Analysis (MCA).

Usage

import pandas as pd
from skmca import MCA

# Read every column as a pandas categorical, as MCA.fit requires.
df = pd.read_csv('http://www.statoek.wiso.uni-goettingen.de/'
                 'CARME-N/download/wg93.txt',
                 sep='\t', dtype='category')
mca = MCA()
mca.fit(df)

Crucially, the input to MCA.fit must be a pandas.DataFrame in which every column has the category dtype. This ensures that the dummy encoding of the columns stays consistent across training and test datasets.
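To illustrate why the category dtype matters, here is a minimal sketch that uses only pandas. The toy data, the column name ``answer`` and the variable names are hypothetical. It does not show skmca internals, only how a shared categorical dtype keeps dummy columns aligned across splits.

```python
import pandas as pd

# Hypothetical toy data: the test split happens to lack the level "c".
train = pd.DataFrame({"answer": ["a", "b", "c", "a"]})
test = pd.DataFrame({"answer": ["b", "a"]})

# Declare the full set of levels once and reuse it for both splits.
answer_dtype = pd.CategoricalDtype(categories=["a", "b", "c"])
train_cat = train.astype({"answer": answer_dtype})
test_cat = test.astype({"answer": answer_dtype})

# With a shared category dtype, both splits get the same indicator columns,
# including a column for the level that is absent from the test split.
train_dummies = pd.get_dummies(train_cat)
test_dummies = pd.get_dummies(test_cat)

# Without it, the encoding is inferred from the values actually present,
# so the test split loses the column for "c".
test_dummies_plain = pd.get_dummies(test)
```

Here ``test_dummies`` has the same columns as ``train_dummies``, while ``test_dummies_plain`` is one column short, which would misalign anything fitted on the training encoding.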

Background

MCA is like `PCA`_, but for categorical data. You can use it to visualize high-dimensional datasets. It can also be useful as a pre-processing step for clustering, where reducing the number of dimensions helps mitigate the curse of dimensionality.
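To make the analogy with PCA concrete, the sketch below shows one common textbook formulation of MCA: correspondence analysis of the indicator (dummy) matrix, computed with an SVD. It is an illustration under stated assumptions, not necessarily what skmca does internally. The toy data and all variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy survey with three categorical questions.
df = pd.DataFrame({
    "q1": ["yes", "no", "yes", "no", "yes", "no"],
    "q2": ["low", "low", "high", "high", "mid", "mid"],
    "q3": ["a", "b", "a", "b", "b", "a"],
}).astype("category")

# Indicator (dummy) matrix: one 0/1 column per category level.
Z = pd.get_dummies(df).to_numpy(dtype=float)

# Correspondence matrix and its row and column masses.
P = Z / Z.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)

# Standardized residuals from the independence model r c^T.
expected = np.outer(r, c)
S = (P - expected) / np.sqrt(expected)

# The SVD of the residuals plays the role that the SVD of the
# centered data matrix plays in PCA.
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)

# Row principal coordinates on the first two axes, e.g. for a scatter plot.
row_coords = (U[:, :2] * sigma[:2]) / np.sqrt(r)[:, None]
```

Each row of ``row_coords`` places one observation in a two-dimensional space, which is the kind of low-dimensional view used for visualization or as input to a clustering step.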

skmca requires pandas and scikit-learn.

References

This library follows the setup in `Nenadic and Greenacre (2005)`_.