MaxHalford/prince

Feature Idea: Weights

cdbcheng opened this issue · 3 comments

Have you considered adding support for sampling weights to the package? This would help when dealing with weighted survey samples, especially with MCA (since most surveys consist of multiple choice questions).

Thanks!

Hey there. Yes, this is very relevant. It's a reasonably big undertaking though. It took me time to figure out and test the current non-weighted implementations. But I'm sure it's doable. One would have to start with PCA, then CA, then MCA.

The following might (or might not) be helpful to start:

The repository linked below has mathematical notation and some examples of how to implement weighted and eigenvalue PCA, relying only on numpy and scikit-learn. However, it has nothing for CA or MCA. I believe (but don't quote me on this) that WPCA can be done without repeating rows, since weighted variance can be calculated without repeating rows.
https://github.com/nogilnick/WeightedPCA
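To make the "no repeated rows" idea concrete, here is a minimal sketch of PCA with per-row weights using only numpy. It is not taken from prince or from the linked repository: `weighted_pca` is a hypothetical helper, and it assumes the weights are frequency-like and that the covariance is normalized by the sum of the weights. For integer weights, the result should match plain PCA on a matrix where each row is duplicated according to its weight (eigenvectors are only defined up to sign).

```python
import numpy as np


def weighted_pca(X, w, n_components=2):
    """Hypothetical helper: PCA with per-row weights, no row duplication."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1
    mean = w @ X                         # weighted column means
    Xc = X - mean                        # center with the weighted means
    cov = (Xc * w[:, None]).T @ Xc       # weighted covariance (population form)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest first
    return eigvals[order], eigvecs[:, order], mean


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
w = np.array([1, 3, 2, 1, 4, 2])         # integer weights, for the comparison below

# Weighted version: each row appears once.
vals_w, vecs_w, mean_w = weighted_pca(X, w)

# Reference: duplicate each row w[i] times and use equal weights.
X_rep = np.repeat(X, w, axis=0)
vals_r, vecs_r, mean_r = weighted_pca(X_rep, np.ones(len(X_rep)))

print(np.allclose(mean_w, mean_r))
print(np.allclose(vals_w, vals_r))
print(np.allclose(np.abs(vecs_w), np.abs(vecs_r)))  # compare up to sign
```

The same weighted mean and weighted covariance would presumably be the starting point for CA and MCA, but that part is not covered here.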

Let me know if there's anything else that could help!

For sure, I believe we want an implementation that does not require duplicating rows. That would be neither practical nor elegant.