lvdmaaten/bhtsne

Using bhtsne.py with a numpy array

Closed this issue · 2 comments

Hello,

I found this implementation because sklearn's TSNE doesn't scale well to my 50k x 50k similarity matrix. Is there a simple way to pass this matrix in, the same way it is passed to scikit-learn? Thanks.

Why can't you use the original data as input? If that doesn't work, you could perform an eigenanalysis of the centered distance matrix and use its top eigenvectors (equivalently, its top left singular vectors) as input.

In general, however, your approach is not going to scale at all, because the size of your input grows quadratically with the number of points. This is why it is not supported by the code (and why I am not planning to support it).
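For reference, the eigenanalysis suggested above reads like classical multidimensional scaling (MDS), so here is a minimal sketch under that assumption. The function name `classical_mds` and the `n_components` parameter are made up for illustration and are not part of bhtsne. It expects a matrix of pairwise distances; turning a similarity matrix into distances is problem-specific and is not shown.

```python
import numpy as np

def classical_mds(D, n_components=50):
    """Hypothetical helper: embed points from a pairwise distance matrix
    using classical MDS (eigenanalysis of the double-centered matrix)."""
    D = np.asarray(D, dtype=np.float64)
    # Double-center the squared distances: B = -1/2 * J (D**2) J,
    # where J = I - (1/n) * ones. Done with means to avoid forming J.
    D2 = D ** 2
    row_mean = D2.mean(axis=1, keepdims=True)
    col_mean = D2.mean(axis=0, keepdims=True)
    B = -0.5 * (D2 - row_mean - col_mean + D2.mean())
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues can appear for non-Euclidean distances; clip them.
    top_vals = np.clip(eigvals[order], 0.0, None)
    # Scale each eigenvector by the square root of its eigenvalue.
    return eigvecs[:, order] * np.sqrt(top_vals)

# Small demonstration on synthetic data (not the 50k x 50k case).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, n_components=3)  # Y has shape (20, 3)
```

The resulting `Y` is an ordinary points-by-features array, which is the kind of input the code expects instead of a pairwise matrix. Note that a full `eigh` on a 50k x 50k matrix is itself expensive in time and memory, so at that size a truncated solver that computes only the leading eigenvectors would probably be needed.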

I am not sure it is a good idea to pass a 50k x 50k matrix as text input. Could you show an example of the eigenanalysis? Also, is there a way to estimate memory usage for t-SNE in general? And if I don't require high precision, could I use float16 instead?
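As a rough back-of-the-envelope check on the memory question, the sketch below only counts the bytes needed to hold a dense n x n matrix at different precisions. It says nothing about what bhtsne or scikit-learn allocate internally, and `dense_matrix_bytes` is a hypothetical helper name.

```python
def dense_matrix_bytes(n, bytes_per_value):
    """Bytes needed to store a dense n x n matrix (storage only)."""
    return n * n * bytes_per_value

n = 50_000
for name, size in [("float64", 8), ("float32", 4), ("float16", 2)]:
    gib = dense_matrix_bytes(n, size) / 2**30
    print(f"{name}: {gib:.1f} GiB")
# prints:
# float64: 18.6 GiB
# float32: 9.3 GiB
# float16: 4.7 GiB
```

Lower precision only shrinks the constant factor: doubling the number of points still quadruples the storage, whichever dtype is used.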