YingfanWang/PaCMAP

Error setting random_state in PaCMAP

ian425 opened this issue · 6 comments

Hi,
I have been exploring PaCMAP this past week and wanted to evaluate it on several criteria. One of them is repeatability, so I tried setting random_state to 1, 10, and 20. I used FMNIST as the dataset, with the following code:

import numpy as np
import pacmap
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import time
import seaborn as sns
import pandas as pd
import umap.plot
import sys
import umap
from io import BytesIO
from PIL import Image
import base64
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import HoverTool, ColumnDataSource, CategoricalColorMapper
from bokeh.palettes import Spectral10, Category10


# Load the Fashion-MNIST images and flatten each one into a feature vector
train = np.load("/home/icalle/Documents/umap/Data/fmnist_images.npy", allow_pickle=True)
train = train.reshape(train.shape[0], -1)
# Class labels (0-9), used only to color the plot
labels = np.load("/home/icalle/Documents/umap/Data/fmnist_labels.npy", allow_pickle=True)

# Fit PaCMAP with a fixed random_state
reducer = pacmap.PaCMAP(n_dims=2, n_neighbors=10, MN_ratio=0.5, FP_ratio=2.0, random_state=20)
embedding = reducer.fit_transform(train, init="pca")

# Scatter plot of the 2D embedding, colored by class
plt.scatter(reducer.embedding_[:, 0], reducer.embedding_[:, 1], s=5, c=labels, cmap='Spectral')
plt.gca().set_aspect('equal', 'datalim')
cbar = plt.colorbar(boundaries=np.arange(11)-0.5)
cbar.set_ticks([0,1,2,3,4,5,6,7,8,9])
cbar.set_ticklabels(["T-shirt/top","Trouser","Pullover","Dress","Coat", "Sandal","Shirt","Sneaker","Bag","Ankle boot"])
plt.title('PaCMAP Fashion-MNIST; n_neighbors=10, random_state=20', fontsize=12)
plt.show()

After plotting, the results are the following:
[Image: PaCMAP_FMNIST]

As the plot shows, this is not the correct output. I would like to know where I went wrong in the code, or whether this issue has been raised before. I also tested it on the Digits dataset, with multiple random_state values, and with init="pca", but got the same result each time.
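
For anyone who wants to test repeatability numerically rather than by eye, here is a minimal sketch. The helper names `is_repeatable` and `pacmap_embed` are hypothetical and not part of PaCMAP, and the tolerance `atol=1e-6` is an assumption. The PaCMAP call mirrors the one in the code above.

```python
import numpy as np


def is_repeatable(embed, X, seed, runs=2, atol=1e-6):
    """Return True if embed(X, seed) yields the same array on every run.

    `embed` is any callable taking (X, seed) and returning an array-like
    embedding. This helper is hypothetical, not part of PaCMAP.
    """
    reference = np.asarray(embed(X, seed))
    for _ in range(runs - 1):
        other = np.asarray(embed(X, seed))
        if other.shape != reference.shape:
            return False
        if not np.allclose(reference, other, atol=atol):
            return False
    return True


def pacmap_embed(X, seed):
    """Wrap the PaCMAP call from this issue so it can be passed to is_repeatable."""
    # Imported lazily so is_repeatable can be used without pacmap installed.
    import pacmap

    reducer = pacmap.PaCMAP(n_dims=2, n_neighbors=10, MN_ratio=0.5,
                            FP_ratio=2.0, random_state=seed)
    return reducer.fit_transform(X, init="pca")
```

With the data loaded as above, `is_repeatable(pacmap_embed, train, 20)` would run the embedding twice with the same seed and compare the results.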

Thank you for reporting this issue. As far as I know, this issue has not been raised before. For the fmnist dataset, can I assume the two .npy files you used are the ones we provide in this repo? I will try to replicate the issue on my end.

Thank you for the prompt response. Yes, they were taken from this repo.

I have successfully replicated the issue locally. There seems to be a problem in the mid-near pair sampling procedure when a random seed is set. I will try to publish a hotfix as soon as possible.
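
For readers unfamiliar with mid-near pairs, the following is an illustrative sketch of seeded sampling, based on the description in the PaCMAP paper (draw six random points and keep the second nearest). It is not the library's implementation and does not reproduce the bug. The function name `sample_mid_near_pairs` and its parameters are hypothetical.

```python
import numpy as np


def sample_mid_near_pairs(X, n_pairs_per_point, rng, n_candidates=6):
    """Sample mid-near pairs for every row of X (illustrative only).

    For each point i, draw `n_candidates` distinct other points at random
    and pair i with the second nearest of them. Requires more than
    `n_candidates` rows in X. All randomness comes from `rng`, so the same
    seed gives the same pairs.
    """
    n = X.shape[0]
    pairs = np.empty((n * n_pairs_per_point, 2), dtype=np.int64)
    row = 0
    for i in range(n):
        for _ in range(n_pairs_per_point):
            # Draw from the n - 1 indices other than i, then shift to skip i.
            candidates = rng.choice(n - 1, size=n_candidates, replace=False)
            candidates[candidates >= i] += 1
            dists = np.linalg.norm(X[candidates] - X[i], axis=1)
            second_nearest = candidates[np.argsort(dists)[1]]
            pairs[row] = (i, second_nearest)
            row += 1
    return pairs
```

Calling this twice with `np.random.default_rng(20)` on the same data yields identical pairs, which is the behavior one would expect from a fixed random_state.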

Great, thank you. I look forward to your solution.

PaCMAP v0.5.3 is now available on PyPI. Let me know if that solves your problem.
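
To confirm that an environment has picked up the fix, a small check like the one below may help. It is a sketch using only the standard library. The helper names are hypothetical, and the simple parser treats pre-release or post-release suffixes (for example "0.5.3rc1") as equal to the plain version, which is a simplifying assumption.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a string like '0.5.3' into (0, 5, 3), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_hotfix(installed, minimum="0.5.3"):
    """Return True if the installed version string is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def installed_pacmap_version():
    """Return the installed pacmap version string, or None if it is not installed."""
    try:
        return version("pacmap")
    except PackageNotFoundError:
        return None
```

For example, `has_hotfix(installed_pacmap_version() or "0")` returns False when pacmap is missing or older than 0.5.3.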

Yes, it did. Thank you for your help.