MaxHalford/prince

FAMD explained inertia == PCA explained inertia

orbisvicis opened this issue · 1 comment

The explained inertia from FAMD is almost identical to the explained variance from PCA on the one-hot-encoded data.

Given a mixed categorical/numeric dataframe where categorical columns are encoded as str/object and numeric columns as int64:

  • The total inertia from FAMD only reaches 1 when the number of components equals the number of columns after one-hot encoding.
  • The explained inertia from prince.FAMD nearly matches the explained variance from sklearn.decomposition.PCA (see the reproduction sketch after this list).
  • prince.FAMD.column_correlations lists the one-hot-encoded columns rather than the original ones.
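
Here is roughly what I'm doing, as a minimal sketch: the dataframe is a made-up stand-in for my data, and I'm assuming a prince version around 0.7 where FAMD exposes explained_inertia_ (newer releases name the attribute differently, e.g. percentage_of_variance_):

```python
import numpy as np
import pandas as pd
import prince
from sklearn.decomposition import PCA

# Made-up mixed dataframe: one int64 column, two str/object columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=500),         # numeric (int64)
    "city": rng.choice(["NYC", "LA", "SF"], 500),  # categorical (object)
    "plan": rng.choice(["free", "pro"], 500),      # categorical (object)
})

# Number of columns after one-hot encoding the categoricals.
n_dummies = pd.get_dummies(df).shape[1]

# FAMD on the mixed dataframe.
famd = prince.FAMD(n_components=n_dummies, random_state=0).fit(df)

# PCA on the standardized one-hot-encoded dataframe.
X = pd.get_dummies(df).astype(float)
X = (X - X.mean()) / X.std()
pca = PCA(n_components=n_dummies).fit(X)

# On my data these come out nearly identical.
print(famd.explained_inertia_)
print(pca.explained_variance_ratio_)
```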

Surely this can't be right? I'd compare the eigenvectors, but I don't think prince exposes them... so I can't see any reason to use FAMD over PCA.
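
For what it's worth, the closest comparison I could manage without eigenvectors is to compare variable-component correlations on both sides. This continues from the sketch above, and column_correlations being a method taking the dataframe is again an assumption tied to the prince version I'm on:

```python
# Continues from the sketch above (famd, pca, X, df already defined).
import numpy as np
import pandas as pd

# For standardized data, PCA loadings scaled by the square root of the
# eigenvalue are the variable-component correlations.
pca_corr = pd.DataFrame(
    pca.components_.T * np.sqrt(pca.explained_variance_),
    index=X.columns,
)

# prince's column correlations; note the index is the one-hot-encoded
# columns, not the original three.
famd_corr = famd.column_correlations(df)

# If FAMD really reduces to PCA here, these should agree up to sign.
print(pca_corr.iloc[:, :2].round(2))
print(famd_corr.iloc[:, :2].round(2))
```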

The results from prince.FAMD match those from R's FactoMineR, so this doesn't look like a bug in prince. Does FAMD approach PCA as the number of samples increases, or do I just have one of those (difficult) datasets? Mine has a large sample size and is mostly categorical, with many categories per variable and no categorical outliers.