BIDData/BIDMat

Creating a Matrix with more than 2,147,483,647 elements


Hi,
The matrix dimensions in BIDMat use the Int type, which seems to limit the number of elements I can fit in a matrix. Beyond 2,147,483,647 elements, the calculations of the total element count in the Matrix class overflow, and I get a java.lang.NegativeArraySizeException.
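
A minimal plain-Java sketch of this failure mode. It is not BIDMat's actual Matrix code, and the class and method names are hypothetical. Multiplying two int dimensions in 32-bit arithmetic wraps around past 2,147,483,647, and passing a negative length to an array allocation throws NegativeArraySizeException:

```java
// Hypothetical demo (not BIDMat code): an int element count wraps around
// once it exceeds Integer.MAX_VALUE (2,147,483,647).
public class IntOverflowDemo {

    // Element count in 32-bit int arithmetic: silently wraps on overflow.
    static int elementCountInt(int nrows, int ncols) {
        return nrows * ncols;
    }

    // The same product in 64-bit long arithmetic: exact for these sizes.
    static long elementCountLong(int nrows, int ncols) {
        return (long) nrows * ncols;
    }

    public static void main(String[] args) {
        int nrows = 1_000_000;
        int ncols = 2_200; // 2.2 billion elements, just past the int limit

        System.out.println(elementCountLong(nrows, ncols)); // 2200000000
        System.out.println(elementCountInt(nrows, ncols));  // -2094967296

        try {
            // A negative length is rejected before anything is allocated.
            float[] data = new float[elementCountInt(nrows, ncols)];
            System.out.println(data.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException: " + e.getMessage());
        }

        try {
            // Math.multiplyExact throws instead of wrapping silently.
            Math.multiplyExact(nrows, ncols);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

With the full 15,000 columns, the int product wraps to 2,115,098,112. That value is positive but wrong, so an overflowing size computation does not always surface as an exception. Computing sizes in long (or with Math.multiplyExact) catches the problem, but it cannot lift the per-array limit itself.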

As background, my dataset has at most 15 billion numbers, since I am doing a PCA of ~1 million rows x 15,000 columns. My use case is comparing the speed of this implementation to randomized SVD in BIDMach/BIDMat and randomized SVD in NumPy.
With 1.5 billion elements BIDMach performed excellently (10 s for the BIDMach SVD with dim=50 on a beefy machine), but I cannot go further because I can't fit any more values in my matrices.
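
A back-of-the-envelope check of that size, assuming 4-byte single-precision values (doubles would need twice the memory):

```latex
\begin{aligned}
N &= 10^{6} \times 1.5 \times 10^{4} = 1.5 \times 10^{10} \text{ elements} \\
\frac{N}{2^{31}-1} &= \frac{1.5 \times 10^{10}}{2\,147\,483\,647} \approx 6.98 \\
N \times 4\ \text{bytes} &= 6 \times 10^{10}\ \text{bytes} \approx 60\ \text{GB}
\end{aligned}
```

The full dataset is therefore roughly 7 times the maximum length of a single Java array, while 1.5 billion elements still fit in one.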

Cheers,
Amedee

Hi, the 2B limit for basic matrices is pretty fundamental. It's not really a BIDMat decision: since we build on Java, we inherit Java's limit of about 2B elements per array (arrays are indexed by int, so the maximum length is 2^31 - 1).

But you can use TMats. TMat is a new matrix type in BIDMat that is made up of tiles. Each tile is limited to 2B elements, but the overall matrix can be larger. TMats support many of the standard array ops, so you may not need to change anything in your code to use them. If something you need is not implemented, let us know.
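
To illustrate the tiling idea, here is a minimal Java sketch. It is hypothetical and is not BIDMat's TMat API: the class, fields, and methods are invented for illustration. A tall matrix is stored as horizontal bands of rows, each an ordinary column-major float[] whose length stays under a per-tile cap, and rows are addressed with a long index:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a tiled matrix; NOT BIDMat's TMat API.
// A tall nrows x ncols matrix is stored as horizontal bands of rows ("tiles"),
// each an ordinary column-major float[] no longer than maxElemsPerTile.
public class TiledMatrix {
    final long nrows;       // total rows; the whole matrix may exceed one array
    final int ncols;
    final int rowsPerTile;  // rows in every tile except possibly the last
    final List<float[]> tiles = new ArrayList<>();

    TiledMatrix(long nrows, int ncols, long maxElemsPerTile) {
        if (nrows < 0 || ncols <= 0 || ncols > maxElemsPerTile || maxElemsPerTile > Integer.MAX_VALUE)
            throw new IllegalArgumentException("bad shape or tile cap");
        this.nrows = nrows;
        this.ncols = ncols;
        // Largest row count whose tile still fits under the per-tile cap.
        this.rowsPerTile = (int) (maxElemsPerTile / ncols);
        for (long start = 0; start < nrows; start += rowsPerTile) {
            int h = (int) Math.min(rowsPerTile, nrows - start);
            tiles.add(new float[h * ncols]); // h * ncols <= maxElemsPerTile, so no int overflow
        }
    }

    // Bounds check, then pick the tile holding global row r.
    private float[] tileFor(long r, int c) {
        if (r < 0 || r >= nrows || c < 0 || c >= ncols)
            throw new IndexOutOfBoundsException("(" + r + ", " + c + ")");
        return tiles.get((int) (r / rowsPerTile));
    }

    // Column-major offset of (r, c) inside its tile; a tile has length / ncols rows.
    private int offset(float[] tile, long r, int c) {
        int localRow = (int) (r % rowsPerTile);
        int h = tile.length / ncols;
        return localRow + c * h;
    }

    float get(long r, int c) { float[] t = tileFor(r, c); return t[offset(t, r, c)]; }

    void set(long r, int c, float v) { float[] t = tileFor(r, c); t[offset(t, r, c)] = v; }

    long numElements() { return nrows * (long) ncols; }

    // An operation done tile by tile: per-column sums, accumulated in double.
    double[] columnSums() {
        double[] sums = new double[ncols];
        for (float[] t : tiles) {
            int h = t.length / ncols;
            for (int c = 0; c < ncols; c++)
                for (int i = 0; i < h; i++) sums[c] += t[i + c * h];
        }
        return sums;
    }

    public static void main(String[] args) {
        TiledMatrix m = new TiledMatrix(7, 3, 10); // tiny cap so the split is visible
        for (long r = 0; r < 7; r++)
            for (int c = 0; c < 3; c++) m.set(r, c, (float) (r * 10 + c));
        System.out.println(m.tiles.size() + " tiles, " + m.numElements() + " elements");
        System.out.println(java.util.Arrays.toString(m.columnSums()));
    }
}
```

Splitting by rows keeps every tile a contiguous column-major block with the full column count, so per-tile work like the column sums can reuse ordinary dense code. With the cap at Integer.MAX_VALUE, the 1M x 15,000 case would need 7 tiles. Some JVMs reject arrays slightly shorter than Integer.MAX_VALUE, so a real implementation would likely leave some headroom.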

@jcanny So what would be the best way to convert the matrix (or load it from a file) into a TMat? Is there an automated way of doing that, e.g. a function that would divide the huge matrix into smaller ones and put them into a TMat? I looked into it, but there seems to be no documentation for this matrix type, and it looks like one has to specify all of the parameters (number of matrices, sizes, etc.) manually to create a TMat.
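
One way to derive those parameters automatically, as a sketch. The helper below is hypothetical and not part of BIDMat. It takes the full shape and a per-tile element cap and splits the rows into bands whose element counts fit in a Java array. It only plans the layout; how TMat's constructor would consume such a plan is not covered in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of BIDMat): plan the row bands ("tiles")
// needed to hold a totalRows x ncols matrix when no tile may exceed a cap.
public class TilePlanner {

    // One horizontal band of the full matrix: rows [rowOffset, rowOffset + nrows).
    static final class Tile {
        final long rowOffset;
        final int nrows;
        final int ncols;

        Tile(long rowOffset, int nrows, int ncols) {
            this.rowOffset = rowOffset;
            this.nrows = nrows;
            this.ncols = ncols;
        }

        @Override
        public String toString() {
            return "Tile(rows " + rowOffset + ".." + (rowOffset + nrows - 1) + ", " + nrows + " x " + ncols + ")";
        }
    }

    // maxElemsPerTile should not exceed Integer.MAX_VALUE if tiles are plain Java arrays.
    static List<Tile> plan(long totalRows, int ncols, long maxElemsPerTile) {
        if (ncols <= 0 || ncols > maxElemsPerTile)
            throw new IllegalArgumentException("a single row must fit in one tile");
        // As many whole rows per tile as the cap allows.
        int rowsPerTile = (int) Math.min(Integer.MAX_VALUE, maxElemsPerTile / ncols);
        List<Tile> tiles = new ArrayList<>();
        for (long start = 0; start < totalRows; start += rowsPerTile) {
            int h = (int) Math.min(rowsPerTile, totalRows - start); // last band may be shorter
            tiles.add(new Tile(start, h, ncols));
        }
        return tiles;
    }

    public static void main(String[] args) {
        // The shape from the question, capped at the int array limit.
        for (Tile t : plan(1_000_000L, 15_000, Integer.MAX_VALUE)) System.out.println(t);
    }
}
```

For 1,000,000 x 15,000 with a cap of Integer.MAX_VALUE, this yields 7 bands: six of 143,165 rows and a final one of 141,010 rows.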