FAMD explained inertia == PCA explained inertia
orbisvicis opened this issue · 1 comment
orbisvicis commented
FAMD explained inertia is almost identical to PCA explained variance with one-hot encoding. Given a mixed categorical/numeric dataframe where categorical data is encoded as `str`/`object` and numeric data as `int64`:

- FAMD total inertia only reaches 1 when the number of components equals the number of columns after one-hot encoding.
- The explained inertia from `prince.FAMD` nearly matches the explained variance from `sklearn.decomposition.PCA`.
- `prince.FAMD.column_correlations` shows the one-hot encoded columns.
That can't be right, can it? I'd compare the eigenvectors, but I don't think prince exposes those... so I can't think of any reason to use FAMD over plain PCA.
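For reference, the textbook FAMD construction is itself a PCA on a rescaled matrix: numeric columns are standardized, and each one-hot indicator column is divided by the square root of its category proportion and then centered. A minimal sketch of that construction on synthetic data (this is Pagès' formulation, not prince's actual internals; all column names and data here are made up):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(size=n),          # numeric
    "x2": rng.normal(size=n),          # numeric
    "c1": rng.choice(list("abc"), n),  # categorical, 3 levels
    "c2": rng.choice(list("uv"), n),   # categorical, 2 levels
})

# Standardize the numeric part.
num = df.select_dtypes("number")
Z_num = (num - num.mean()) / num.std(ddof=0)

# FAMD weighting for the categorical part: divide each indicator
# column by sqrt(p_k), where p_k is the category proportion, then center.
dummies = pd.get_dummies(df.select_dtypes(object), dtype=float)
p = dummies.mean()
Z_cat = dummies / np.sqrt(p)
Z_cat = Z_cat - Z_cat.mean()

Z = pd.concat([Z_num, Z_cat], axis=1)
pca = PCA().fit(Z)

# Each centered one-hot block of K columns has rank K - 1, so the
# nonzero eigenvalues number n_numeric + sum(K_j - 1) = 2 + 2 + 1 = 5,
# even though Z has 7 columns.
rank = int(np.sum(pca.explained_variance_ > 1e-10))
print(rank)  # 5
```

This also explains the first bullet above: the explained-inertia ratios only sum to 1 once every component of the (one-hot-expanded) matrix is retained.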
orbisvicis commented
The results from `prince.FAMD` match those from R's `FactoMineR`. Does FAMD approach PCA as the number of samples increases, or do I just have one of those (difficult) datasets? The data has a large sample size and is mostly categorical, with many categories and no categorical outliers.