YingfanWang/PaCMAP

Error with only a few data points

zeitderforschung opened this issue · 2 comments

This one works (100 data points):

import numpy as np
from pacmap import pacmap

X = np.random.rand(100, 50)
pacmap.PaCMAP().fit_transform(X)

This one fails (10 data points):

import numpy as np
from pacmap import pacmap

X = np.random.rand(10, 50)
pacmap.PaCMAP().fit_transform(X)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-dc1438c486bd> in <module>
      3 
      4 X = np.random.rand(10, 50)
----> 5 pacmap.PaCMAP().fit_transform(X)

/usr/local/anaconda3/lib/python3.8/site-packages/pacmap/pacmap.py in fit_transform(self, X, init, save_pairs)
    502 
    503     def fit_transform(self, X, init="random", save_pairs=True):
--> 504         self.fit(X, init, save_pairs)
    505         if self.intermediate:
    506             return self.intermediate_states

/usr/local/anaconda3/lib/python3.8/site-packages/pacmap/pacmap.py in fit(self, X, init, save_pairs)
    463             )
    464         if save_pairs:
--> 465             self.embedding_, self.intermediate_states, self.pair_neighbors, self.pair_MN, self.pair_FP = pacmap(
    466                 X,
    467                 self.n_dims,

/usr/local/anaconda3/lib/python3.8/site-packages/pacmap/pacmap.py in pacmap(X, n_dims, n_neighbors, n_MN, n_FP, pair_neighbors, pair_MN, pair_FP, distance, lr, num_iters, Yinit, apply_pca, verbose, intermediate)
    308                 if verbose:
    309                     print(X)
--> 310         pair_neighbors, pair_MN, pair_FP = generate_pair(
    311             X, n_neighbors, n_MN, n_FP, distance, verbose
    312         )

/usr/local/anaconda3/lib/python3.8/site-packages/pacmap/pacmap.py in generate_pair(X, n_neighbors, n_MN, n_FP, distance, verbose)
    234     for i in range(n):
    235         nbrs_ = tree.get_nns_by_item(i, n_neighbors_extra+1)
--> 236         nbrs[i, :] = nbrs_[1:]
    237         for j in range(n_neighbors_extra):
    238             knn_distances[i, j] = tree.get_distance(i, nbrs[i, j])

ValueError: cannot copy sequence with size 9 to array axis with dimension 10

The current PaCMAP implementation assumes that the number of samples in the dataset is greater than (n_neighbors+50), which causes the error you reported. We will fix this in a later version (see #4 for a more detailed discussion).
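
To illustrate the failure mode described above, here is a minimal standalone sketch. It is not the PaCMAP code: `clamp_neighbor_count` and `brute_force_neighbors` are hypothetical helpers, and the default of 50 extra neighbors is taken from the (n_neighbors+50) figure above. The point is only that a sample in a dataset of n points has at most n - 1 neighbors other than itself, so any request for more has to be capped.

```python
import math
import random


def clamp_neighbor_count(n_samples, n_neighbors, extra=50):
    """Hypothetical helper: cap the neighbor request at n_samples - 1."""
    return max(0, min(n_neighbors + extra, n_samples - 1))


def brute_force_neighbors(points, k):
    """For each point, return the indices of its k nearest *other* points."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(others[:k])
    return result


random.seed(0)
# Same shape as the failing example: 10 samples, 50 features.
points = [[random.random() for _ in range(50)] for _ in range(10)]

# Asking for 10 + 50 neighbors is impossible here; only 9 others exist.
k = clamp_neighbor_count(len(points), n_neighbors=10)
neighbors = brute_force_neighbors(points, k)
print(k, len(neighbors), len(neighbors[0]))
```

With 10 samples the request is capped at 9, which matches the "size 9" sequence in the traceback: the neighbor array was sized for more entries than the search could return.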

This problem has been fixed in release 0.5.5.
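
If you are unsure whether your environment already has the fix, a rough check along these lines may help. This is a sketch under assumptions: it assumes the installed distribution is named `pacmap`, `version_tuple` and `has_small_dataset_fix` are hypothetical helpers, and the version parsing is deliberately simplistic (it ignores pre-release and post-release suffixes).

```python
from importlib import metadata


def version_tuple(version):
    """Turn '0.5.5' into (0, 5, 5); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_small_dataset_fix(package="pacmap", fixed_in="0.5.5"):
    """Return True or False, or None if the package is not installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return version_tuple(installed) >= version_tuple(fixed_in)


print(has_small_dataset_fix())
```

If this prints False, upgrading the package to 0.5.5 or later should pick up the fix mentioned above.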