elbamos/largeVis

issue on dealing with sparseMatrix?

Closed this issue · 5 comments

Hi Amos,

Thanks for this nice package for largeVis integration. I tried to test whether we can run a sparse matrix through this package, but I ran into the following issue:

library(largeVis)
library(Matrix)
largeDataset <- spMatrix(1000,1000, i=sample(1:1000, 1000), j=sample(1:1000, 1000), x=sample(1:1000, 1000))
neighbors <- randomProjectionTreeSearch(log(largeDataset + 1), K = 10)
Error in UseMethod("randomProjectionTreeSearch") : 
  no applicable method for 'randomProjectionTreeSearch' applied to an object of class "c('dgeMatrix', 'ddenseMatrix', 'generalMatrix', 'geMatrix', 'dMatrix', 'denseMatrix', 'compMatrix', 'Matrix', 'xMatrix', 'mMatrix', 'Mnumeric', 'replValueSp')"

I tried both the newest CRAN version and the GitHub version; both have the same issue.
Any thoughts?

In addition, have you tested your package on the 3 million word vectors from the GoogleNews dataset (or any other dataset with 1m data points and hundreds of feature dimensions)? How does the package perform on it in terms of time and memory requirements?

Thanks!

It appears to me that your sparse matrix isn't actually sparse, and may not be well-formed. It's being given a class of ddenseMatrix. Possibly Matrix is doing that because i and j aren't in order? I don't know.
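
One possible explanation for the dense class, offered as a sketch rather than a confirmed diagnosis: adding 1 to a sparse matrix turns every implicit zero into a stored value, so Matrix returns a dense object before largeVis ever sees it. The snippet below uses only the Matrix package; the variable names and the seed are made up for illustration, and `log1p()` is suggested here as a sparsity-preserving stand-in for `log(x + 1)`.

```r
library(Matrix)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
m <- spMatrix(1000, 1000,
              i = sample(1:1000, 1000),
              j = sample(1:1000, 1000),
              x = sample(1:1000, 1000))

# m + 1 makes every implicit zero a stored 1, so the result is dense,
# and log() of a dense matrix stays dense.
dense_log <- log(m + 1)
is(dense_log, "denseMatrix")

# log1p(0) is 0, so the zeros can stay implicit. Converting to CSC first
# gives the storage format discussed later in this thread.
csc <- as(m, "CsparseMatrix")
sparse_log <- log1p(csc)
is(sparse_log, "CsparseMatrix")
```

If both checks return TRUE on your machine, passing `sparse_log` instead of `log(largeDataset + 1)` should at least avoid the dispatch on a dense class.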

If you want to get back to me, I intend to take a look at various things in the package this weekend. It's overdue for an update.

I haven't run largeVis on 3 million word vectors. I have run it on 800,000 items in the demo docs. That ran overnight on an AWS box, but that was two years ago now. I expect it would take a day or so on a modern machine to run on 3m vectors. The runtime will also be strongly correlated with the hyperparameters, in particular K, the number of trees, and the tree size. Memory usage will be strongly correlated with tree size and the number of trees.

Hi Amos,

Thanks for your reply, and especially for the detailed description of the runtime and memory requirements of this method!
Regarding your comment about ddenseMatrix: that is interesting. The largeDataset object I have is a sparse matrix, and its class attribute is dgTMatrix.

> largeDataset <- spMatrix(1000,1000, i=sample(1:1000, 1000), j=sample(1:1000, 1000), x=sample(1:1000, 1000))
> class(largeDataset)
[1] "dgTMatrix"
attr(,"package")
[1] "Matrix" 

Would you mind giving me an example of running a sparse matrix through largeVis that works on your side?

Thanks,
Xiaojie

Sure

> largeDataset <- spMatrix(1000,1000, i=sample(1:1000, 1000), j=sample(1:1000, 1000), x=sample(1:1000, 1000))
> ld2 <- as(largeDataset, "dgCMatrix")
> neighbors <- randomProjectionTreeSearch(ld2, K = 10)
> str(neighbors)
 num [1:10, 1:1000] 559 485 329 838 741 71 315 186 805 119 ...

So I guess you were making a triplet matrix. I only implemented functions for handling CSC sparse matrices. Let me know if there's a strong need for supporting triplets. One issue is that triplet matrices can contain duplicate entries...
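
To illustrate the duplicate-entry concern, here is a minimal sketch using only the Matrix package (the variable names are hypothetical). A triplet matrix can store the same (i, j) position more than once, and Matrix sums those entries when converting to CSC, which is one reason converting up front is a reasonable workaround.

```r
library(Matrix)

# Two triplets that both point at position (1, 1) of a 2 x 2 matrix.
trip <- spMatrix(2, 2, i = c(1, 1), j = c(1, 1), x = c(2, 3))
length(trip@x)   # number of stored triplets, duplicates included

# Converting to compressed sparse column form merges the duplicates
# by summing their values.
csc <- as(trip, "CsparseMatrix")
length(csc@x)    # number of stored entries after merging
csc[1, 1]        # merged value at position (1, 1)
```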

I'm going to close this issue; please feel free to reopen it.