YingfanWang/PaCMAP

[Bug] Problem with angular metric

Closed this issue · 3 comments

aluo-x commented

Given data X of size [m, n], the fit_transform results differ depending on whether the data is pre-normalized (so that each row has unit norm). However, the expected behavior is that the angular metric is invariant to the row norms.

Thanks for reporting. Are you using the default initialization option (PCA)? In that case, the embedding is initialized from a PCA of the data as passed in, and PCA is sensitive to row norms, so the result is expected to differ. To disable this setting, you can explicitly set init="random" in the fit_transform() method. For reproducibility, you may also want to set the random state explicitly. A snippet is provided below:

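import pacmap

# Fix the random state for reproducibility and use the angular metric.
reducer = pacmap.PaCMAP(random_state=12345678, distance="angular")
X = your_data()  # placeholder: replace with your [m, n] data array
# Random initialization avoids the norm-sensitive PCA initialization.
embedding = reducer.fit_transform(X, init="random")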
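
To illustrate why the PCA initialization matters here, the following is a minimal NumPy-only sketch, not PaCMAP's internal code. It uses cosine distance as a stand-in for an angular metric and a plain SVD-based PCA as a stand-in for the initialization. The helper names (`cosine_distance_matrix`, `pca_project`, `scaled_pairwise_dists`) are hypothetical and exist only for this example.

```python
import numpy as np

def cosine_distance_matrix(X):
    """Pairwise cosine distance (1 - cosine similarity) between rows of X."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

def pca_project(X, n_components=2):
    """Project X onto its top principal components via SVD (plain PCA sketch)."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def scaled_pairwise_dists(Y):
    """Pairwise Euclidean distances divided by their maximum.

    Comparing these ignores rotations, sign flips, and global scale.
    """
    diff = Y[:, None, :] - Y[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return d / d.max()

rng = np.random.default_rng(0)
# Random rows whose norms vary widely from row to row.
X = rng.normal(size=(50, 8)) * rng.uniform(0.1, 10.0, size=(50, 1))
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)

# Angles between rows do not change when each row is rescaled.
same_angles = np.allclose(cosine_distance_matrix(X), cosine_distance_matrix(X_unit))

# The PCA layout does change, so a PCA-based initialization differs.
same_pca = np.allclose(
    scaled_pairwise_dists(pca_project(X)),
    scaled_pairwise_dists(pca_project(X_unit)),
)

print("angles preserved:", same_angles)
print("PCA layout preserved:", same_pca)
```

In this sketch the pairwise angles match between the raw and normalized data while the PCA layouts do not, which is consistent with the embeddings diverging when the initialization comes from PCA.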

aluo-x commented

Ah, that makes sense. It seems like the PCA step should be changed to perform normalization as well.

I've "fixed" the bug by always pre-normalizing the data. I've also noticed some other issues that seem to be bugs; I'll write up a report once ICLR is over.
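
For reference, the pre-normalization workaround might look roughly like the sketch below. The helper `normalize_rows` and its `eps` guard are assumptions made for illustration and are not part of PaCMAP.

```python
import numpy as np

def normalize_rows(X, eps=1e-12):
    """Scale each row of X to unit L2 norm.

    Rows with a norm below eps are divided by eps instead, so all-zero
    rows stay zero rather than producing NaNs.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

# Tiny example: a 3-4-5 row, an axis-aligned row, and an all-zero row.
X = np.array([[3.0, 4.0], [0.0, 2.0], [0.0, 0.0]])
X_unit = normalize_rows(X)
print(X_unit)
```

The normalized array can then be passed to fit_transform() in place of the raw data, so that both the initialization and the angular metric see unit-norm rows.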

Happy to hear your problem is solved and you're using PaCMAP for your research :D Feel free to report any additional problems at any time, and I will try to respond ASAP.