kemerelab/ghostipy

Request for chunked data read


In the current version of the package, out-of-RAM filtering of large HDF5 files is very slow, likely because h5py is inefficient at loading data in many small chunks.

It would be very helpful if the filtering routines had an optional parameter that lets the user specify a chunk size (presumably a number of samples to load or an amount of RAM to use), so that data are preloaded in chunks of that size and then filtered.
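To illustrate the idea, here is a rough sketch of what chunked preloading could look like for a causal FIR filter. It is not ghostipy's API: the function name `filter_in_chunks` and the `chunk_size` parameter are hypothetical, and the sketch uses only NumPy. `source` can be anything that has `.shape` and returns an array when sliced, which is how an `h5py.Dataset` behaves. Each chunk is read with one large contiguous slice, plus `len(taps) - 1` samples of history so the chunk edges match a single-pass filter.

```python
import numpy as np

def filter_in_chunks(source, taps, chunk_size):
    """Causal FIR filtering with data preloaded chunk_size samples at a time.

    Hypothetical helper for illustration, not part of ghostipy.
    `source` is any 1-D object with `.shape` and slice indexing
    (a NumPy array here; an h5py.Dataset is assumed to work the same way).
    """
    taps = np.asarray(taps, dtype=float)
    n_samples = source.shape[0]
    overlap = len(taps) - 1  # history needed before each chunk
    # Assumption: the output fits in RAM; for very large files it could
    # instead be written chunk by chunk to an output dataset.
    out = np.empty(n_samples, dtype=float)

    for start in range(0, n_samples, chunk_size):
        stop = min(start + chunk_size, n_samples)
        lead = min(overlap, start)  # no history exists before sample 0
        # One large contiguous read per chunk instead of many small ones.
        block = np.asarray(source[start - lead:stop], dtype=float)
        filtered = np.convolve(block, taps)  # full convolution of the block
        # Drop the samples computed for the history region, keep this chunk.
        out[start:stop] = filtered[lead:lead + (stop - start)]
    return out

# Usage: the chunked result matches filtering the whole signal at once.
rng = np.random.default_rng(0)
signal = rng.standard_normal(10_000)
taps = np.ones(5) / 5  # simple moving-average filter
chunked = filter_in_chunks(signal, taps, chunk_size=1_024)
reference = np.convolve(signal, taps)[:len(signal)]
```

A chunk size given as an amount of RAM could be converted to a sample count by dividing by the item size of the dataset's dtype.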

In our tests, preloading 130 GB of data stored in an HDF5 file resulted in a more than 10-fold speedup in filtering.