YingfanWang/PaCMAP

Multiprocessor support


Hi Team,
Thank you for creating this amazing package; it works extremely well for my use case. However, it runs very slowly when performing dimensionality reduction from 128D to 2D.

I am trying to reduce 60k rows of embeddings, and it takes about an hour. Is there a faster or parallel way to do this?

Thanks

The code is already equipped with multiprocessor support, and I would say it's not normal for PaCMAP to run that long in your case. Reducing MNIST (70k rows, 784D) only takes ~1 min on my MacBook. How did you launch the PaCMAP instance? Are you running multiple other jobs in parallel?
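For reference, here is a minimal sketch of how I would time a run with the default parameters (the `embeddings` array is a hypothetical stand-in for your 60k x 128 matrix):

    import time

    import numpy as np
    import pacmap

    # Hypothetical stand-in for the 60k x 128 embedding matrix.
    embeddings = np.random.rand(60_000, 128).astype(np.float32)

    # Default parameters; PaCMAP's inner loops are numba-compiled
    # and use multiple cores automatically.
    reducer = pacmap.PaCMAP(n_dims=2)
    start = time.perf_counter()
    reduced = reducer.fit_transform(embeddings, init="pca")
    print(f"fit_transform took {time.perf_counter() - start:.1f}s")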

Hi @hyhuang00, thank you for your response. This is how I am using PaCMAP:

    reducer = pacmap.PaCMAP(n_dims=2, n_neighbors=n_neighbors, MN_ratio=MN_ratio,
                            FP_ratio=FP_ratio, lr=0.005, num_iters=450)
    reduced_embeddings = reducer.fit_transform(embeddings, init="pca")

Also, is there any parameter that I am missing to make it work on multiple cores?

These parameters look fine to me, but I can't replicate the reported problem on my machines. Perhaps you can try upgrading your local libraries, especially numba, and check whether it works again.
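Something like the following can confirm the installed versions and how many threads numba will use (a quick diagnostic sketch, not part of PaCMAP's API):

    from importlib.metadata import version

    import numba

    # PaCMAP's heavy loops are JIT-compiled and parallelized by numba,
    # so an outdated numba install can slow things down considerably.
    print("pacmap:", version("pacmap"))
    print("numba:", version("numba"))

    # Number of threads numba will use for parallel regions.
    print("numba threads:", numba.get_num_threads())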

I think the delay is due to the parameters MN_ratio=30 and FP_ratio=100.0, which take a lot of time. Apart from that, I can now see the process running on multiple cores.

Oh, I see. I don't think you will need that many mid-near pairs and FP pairs -- in our experiments, the default ratios work quite well, and I see no reason to choose such large values for these two parameters. Values that large greatly increase the time needed for each step, which leads to the behavior you observed.
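In other words, simply dropping the explicit MN_ratio and FP_ratio so they fall back to the defaults (0.5 and 2.0 in recent releases) should be dramatically faster. A sketch, reusing the `embeddings` array from above:

    import pacmap

    # Leave MN_ratio and FP_ratio at their defaults (0.5 and 2.0 in
    # recent pacmap releases). The number of mid-near and further pairs
    # scales roughly linearly with these ratios, and so does the cost
    # of every optimization step, so ratios of 30 and 100 inflate the
    # runtime by roughly that factor.
    reducer = pacmap.PaCMAP(n_dims=2)
    reduced_embeddings = reducer.fit_transform(embeddings, init="pca")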