lvdmaaten/bhtsne

Sparse Input Data

3bst0r opened this issue · 2 comments

Is there a way to input sparse data? I suspect this is not straightforward, because there is no standard way to store sparse matrices in a text file, i.e. Python probably does it differently than MATLAB (I did not check, though).


OT: I just watched a video of you presenting t-SNE at Google and I want to compliment you on your explanation skills. Very clear and understandable.

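This is not implemented right now. Both MATLAB and SciPy support compressed sparse matrix formats (MATLAB stores sparse matrices in compressed sparse column form, and scipy.sparse offers both CSR and CSC), so it would be possible to add this.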
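
For reference, a compressed sparse row matrix boils down to three flat arrays, which is what a sparse input format would have to carry. The snippet below is only a sketch using scipy.sparse on a made-up 3x3 matrix; it does not reflect any format that bhtsne reads today.

```python
import numpy as np
from scipy import sparse

# Small made-up matrix, purely for illustration.
dense = np.array([[0, 0, 3],
                  [4, 0, 0],
                  [0, 5, 6]], dtype=float)
csr = sparse.csr_matrix(dense)

# data:    nonzero values, stored row by row
# indices: column index of each stored value
# indptr:  offsets into data/indices where each row starts and ends
print(csr.data)     # [3. 4. 5. 6.]
print(csr.indices)  # [2 0 1 2]
print(csr.indptr)   # [0 1 2 4]

# The three arrays plus the shape are enough to rebuild the matrix.
rebuilt = sparse.csr_matrix((csr.data, csr.indices, csr.indptr), shape=csr.shape)
```

Any exchange format would need to serialize these three arrays together with the matrix shape.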

For now, a potential way to circumvent this would be to do some kind of (logistic) PCA preprocessing in Matlab / Python and use the reduced data as input into t-SNE. (This is assuming that the full matrix is too big to keep in memory.)

If I'm not mistaken, truncated SVD is also good for preprocessing sparse data, and scikit-learn's TruncatedSVD works directly on scipy.sparse matrices.

http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html
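
A minimal sketch of that preprocessing step, assuming scikit-learn and SciPy are installed. The random matrix, the helper name `reduce_sparse`, and the choice of 20 components are all placeholders for illustration, not recommendations from this thread.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD


def reduce_sparse(X, n_components=50, seed=0):
    """Project a sparse matrix onto its top singular directions (hypothetical helper).

    TruncatedSVD does not center the data, so the input can stay sparse.
    The result is a dense (n_samples, n_components) array.
    """
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    return svd.fit_transform(X)


# Placeholder data: 200 samples, 1000 features, about 1% nonzero entries.
X = sparse.random(200, 1000, density=0.01, format="csr", random_state=0)

X_reduced = reduce_sparse(X, n_components=20)
print(X_reduced.shape)  # (200, 20)
```

The reduced array is dense and small, so it can be written out as plain text (for example with `np.savetxt`) and used as ordinary dense input to t-SNE.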

(I am brand new to the field, and this is my first ever comment on GitHub - breaking the ice here ;)).