`--dense` does not work
grst opened this issue · 10 comments
Thanks for adding the `--dense` option.
However, at the moment there still seems to be a problem.
On a dense matrix with ~70k cells and 50 dimensions, it runs for more than 30min with 32 cores and takes a vast amount of memory (~120GB).
On the other hand, without the `--dense` option it completes within a couple of minutes.
Given that it is already pretty fast without `--dense`, I'm not even sure if it's worth supporting this option.
Cheers,
Gregor
Does it complete or hang?
It does complete on the test matrix (2000 cells), as fast as without `--dense`.
On the 70k cell matrix I aborted after a bit more than 30min.
Hmm, this might be worth looking into rather than removing an option that may be valuable in the future.
Is it possible to give me sample data that causes this issue?
I don't want to share the data publicly, but I can make it available to you via email.
Here are the two full commands:
Without `--dense`:
sudo time docker run -it -v /storage/scratch/toomanycells:/scratch:Z \
gregoryschwartz/too-many-cells:0.1.1.0 make-tree -m /scratch/harmony.csv \
--no-filter --normalization NoneNorm -o /scratch/out +RTS -N8
10:03.04 elapsed, memory usage ~2GB
With `--dense`:
It now even fails with a memory error on a ~51x70000 matrix.
sudo time docker run -it -v /storage/scratch/toomanycells:/scratch:Z \
gregoryschwartz/too-many-cells:0.1.1.0 make-tree -m /scratch/harmony.csv \
--no-filter --normalization NoneNorm --dense -o /scratch/out +RTS -N8
Sketching tree [=================================>..................................................] 40%
too-many-cells: internal error: Unable to commit 1353711616 bytes of memory
(GHC version 8.4.3 for x86_64_unknown_linux) Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Would you happen to have a smaller example? Reproducing the error with the smallest example possible is the best way for me to find the issue.
It only appears to happen above a certain size.
Here's a subset of 20,000 cells which I think I can share here.
test_matrix.csv.gz
It completes with `--dense`, but without it the run takes only about half of the runtime and 10% of the memory.
It's spending most of the time making the dense diagonal matrix which would become 70,000 squared in the crashing example, taking up a lot of space. I assume it works fine for the sparse matrix as the diagonal matrix is mostly 0s. I can probably find a way around this, but for now don't use dense for large numbers of cells.
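To put the "70,000 squared" figure in perspective, here is a rough back-of-the-envelope sketch in Python (not the tool's actual Haskell code). It assumes 8-byte floating-point entries, which is an assumption about the storage format, and all function names are hypothetical. It also illustrates one common way to sidestep the cost: multiplying by a diagonal matrix gives the same result as scaling each row by the corresponding diagonal entry, so the full n x n matrix never has to be built. Whether the eventual fix takes this route is not stated in the thread.

```python
def dense_square_bytes(n, bytes_per_entry=8):
    """Bytes needed to store a fully materialized n x n dense matrix.

    bytes_per_entry=8 assumes double-precision floats (an assumption).
    """
    return n * n * bytes_per_entry


def diagonal_vector_bytes(n, bytes_per_entry=8):
    """Bytes needed if only the n diagonal entries are stored as a vector."""
    return n * bytes_per_entry


def dense_diag_matmul(diag, matrix):
    """Multiply diag(d) by matrix the expensive way: build the full n x n matrix first."""
    n = len(diag)
    full = [[diag[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    cols = len(matrix[0])
    return [
        [sum(full[i][k] * matrix[k][j] for k in range(n)) for j in range(cols)]
        for i in range(n)
    ]


def scale_rows(diag, matrix):
    """Same product as dense_diag_matmul, but each row is scaled by its diagonal entry.

    Only the length-n vector of diagonal entries is needed.
    """
    return [[d * x for x in row] for d, row in zip(diag, matrix)]


if __name__ == "__main__":
    # Cell counts mentioned in the thread: test matrix, shared subset, full data.
    for n in (2_000, 20_000, 70_000):
        full_gb = dense_square_bytes(n) / 1e9
        vec_mb = diagonal_vector_bytes(n) / 1e6
        print(f"n={n}: dense diagonal matrix {full_gb:.2f} GB, diagonal vector {vec_mb:.3f} MB")
```

Under the 8-byte assumption, a 70,000 x 70,000 dense matrix needs about 39 GB, while the same diagonal stored as a vector needs about 0.56 MB. The gap grows quadratically with the number of cells, which is consistent with the problem only appearing above a certain size.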
As I said, I'm fine with using the sparse mode with the low-dimensional dense matrices, as it works and is reasonably fast.
Feel free to close the issue.
I think I fixed the issue in GregorySchwartz/spectral-clustering@aa07ea5, so now it should take up less space. The fix is in 0.1.3.1, which is also on Docker. Will close for now; if the problem persists, please let me know!