MaxHalford/prince

How to transform on new unseen test data?

tricha-zemoso opened this issue · 3 comments

I can fit the MCA model to my training data, and I would now like to use the fitted model to get the row coordinates of new, unseen data. The transform function fails on unseen data with a KeyError. Is it possible to "transform" new data with a model fitted on training data, as the sklearn transformers do?
Please help me to understand.

Hello. Can you provide a reproducible example?

data.csv
Python Code:

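import pandas as pd
from prince import MCA

data = pd.read_csv("data.csv", index_col=[0])
mca = MCA(n_components=10, n_iter=3, copy=True, check_input=True, engine='sklearn', random_state=42)

# Fit on the first three rows only.
mca = mca.fit(data[:3])
print(mca.eigenvalues_summary)

# Works: these are the rows the model was fitted on.
print(mca.row_coordinates(data[:3]))

# Raises the KeyError shown below: the remaining rows were not seen during fit.
print(mca.transform(data[3:]))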

Error:
KeyError: "['category_Application Access', 'userid_a@b.com', 'userid_b@b.com', 'location_San Jose, CA, United States', 'applicationname_B', 'applicationname_C', 'browser_Other'] not in index"

Indeed, MCA does not yet work with rows/columns which have not been seen before. It's on my TODO, but I don't know when I'll tackle it.
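
For anyone hitting the same KeyError in the meantime, here is a minimal sketch of what appears to be happening, using only pandas. It assumes the error comes from the one-hot columns: the new rows do not contain every category seen during fit, so the columns named in the error message do not exist for them. The toy frames `train` and `test` are hypothetical stand-ins for the attached data.csv, and nothing here is part of prince itself. Whether `transform` accepts data aligned this way depends on the prince version, so treat it as an illustration of the column mismatch rather than a confirmed fix.

```python
import pandas as pd

# Hypothetical toy data standing in for the attached data.csv.
train = pd.DataFrame({
    "browser": ["Chrome", "Other", "Chrome"],
    "applicationname": ["A", "B", "C"],
})
test = pd.DataFrame({
    "browser": ["Chrome", "Firefox"],
    "applicationname": ["A", "A"],
})

# One-hot encode each split on its own; dtype=int keeps the columns numeric.
train_dummies = pd.get_dummies(train, dtype=int)
test_dummies = pd.get_dummies(test, dtype=int)

# Columns seen at fit time but absent from the new rows.
# Selecting these from test_dummies is what would raise "not in index".
missing = [c for c in train_dummies.columns if c not in test_dummies.columns]
print(missing)

# Align the new rows to the training columns: categories missing from the
# new rows become all-zero columns, and categories never seen during fit
# (here "browser_Firefox") are dropped.
aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)
print(aligned)
```

Note that a row whose category was never seen during fit (the "Firefox" row above) ends up with all zeros for that variable, so its coordinates would ignore that variable entirely. That loss of information is one reason this is only a stopgap.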