bethgelab/decompose

Model fitting fails on large datasets

Closed this issue · 5 comments

vilim commented

Thank you for developing this package and publishing it in this easily-installable and open way!
I was trying out the method on our lightsheet data, and the results on small patches look promising. However, if I try to apply it to a bigger chunk, TensorFlow complains:
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
I already have algorithms in place to stitch spatial filters extracted from overlapping patches with another method (CNMF). However, due to the large signal contamination from out-of-focus planes in the lightsheet, taking bigger areas with more planes into account would yield better results, provided the computational costs do not become prohibitive.

Thank you for trying out our framework.

Currently a single filter bank cannot be larger than 2GB. Could that be a problem in your case? Could you please let me know the shape of the input data (X) and the number of sources (K)?

vilim commented

This is quite likely the case: the input data is 1275 frames with 1,266,720 voxels, and I have set the number of sources to 1000, which I guess was altogether too ambitious to try at once. Memory-wise the computer should be able to handle it (it has 128 GB of RAM), so I guess this is a TensorFlow-related limit?
Does the run-time of the algorithm scale linearly with the dimensions of the tensors?
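For scale, with the dimensions above a single spatial filter bank already exceeds the 2 GB protobuf limit on its own, assuming 32-bit floats (the dtype is an assumption; the element counts are the numbers reported above):

```python
voxels, sources = 1_266_720, 1_000   # dimensions reported above
bytes_per_float32 = 4                # assumed dtype
filter_bank_gb = voxels * sources * bytes_per_float32 / 2**30
print(f"{filter_bank_gb:.1f} GB")    # ~4.7 GB, well over the 2 GB proto limit
```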

@aboettcher: apparently this is a well-known limit of TensorFlow that can be circumvented by first creating a placeholder and then feeding the array at initialization time (see https://stackoverflow.com/questions/35394103/initializing-tensorflow-variable-with-an-array-larger-than-2gb). Would that solution work for us as well?
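The pattern from the linked answer looks roughly like the following minimal sketch (TensorFlow 1.x API; the shape and variable names are illustrative, not taken from the decompose code):

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

# Large initial values computed outside the graph (shape is illustrative).
init_values = np.random.randn(1_266_720, 1_000).astype(np.float32)

# Feeding the array through a placeholder at initialization time keeps it
# out of the serialized graph, where constants are capped at 2 GB.
init_ph = tf.placeholder(tf.float32, shape=init_values.shape)
filter_bank = tf.Variable(init_ph)

with tf.Session() as sess:
    sess.run(filter_bank.initializer, feed_dict={init_ph: init_values})
```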

Given the dimensions provided by @vilim, the problem should be the size of the filter banks. They are currently initialized via tensorflow.constant from a (random) numpy array, which unnecessarily embeds large constants in the graph. The approach pointed out by @wielandbrendel should avoid that problem. I will update the code accordingly.

Commits 0ea7801 and 5ad43fd should fix that issue.