MannLabs/alphapept

'align_datasets' very slow for large number of files

mschwoer opened this issue · 0 comments

Describe the bug
The 'align' step becomes very slow for large data sets
(e.g. 500 runs, 11 min gradients, plasma).

Proposed solution
The proposed solution (see related PR) caches the (stripped) data frames and uses numba for the calculations.
For the data set mentioned above, caching alone yielded a 20x speedup, and caching combined with numba a 30x speedup. Memory consumption did not increase noticeably when all data was cached.
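
To illustrate the caching idea only, here is a minimal standard-library sketch. It is not the code from the PR: the names `load_stripped`, `median_offset`, `pairwise_offsets` and `fake_loader` are hypothetical, plain dicts stand in for the stripped data frames, and it assumes the slowdown comes from re-reading each file for every pair it takes part in. The numba part is not shown.

```python
from itertools import combinations
from statistics import median


def load_stripped(path, loader, cache):
    """Return the stripped table for `path`, calling `loader` at most once per path."""
    if path not in cache:
        cache[path] = loader(path)
    return cache[path]


def median_offset(a, b):
    """Median difference of the values for keys present in both tables."""
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return median(a[key] - b[key] for key in shared)


def pairwise_offsets(paths, loader):
    """Compute an offset for every pair of files, reading each file only once."""
    cache = {}
    offsets = {}
    for first, second in combinations(paths, 2):
        a = load_stripped(first, loader, cache)
        b = load_stripped(second, loader, cache)
        offsets[(first, second)] = median_offset(a, b)
    return offsets


# Hypothetical stand-in for files on disk: precursor -> retention time.
fake_files = {
    "run_a": {"PEPTIDEA": 10.0, "PEPTIDEB": 20.0, "PEPTIDEC": 30.0},
    "run_b": {"PEPTIDEA": 10.5, "PEPTIDEB": 20.5, "PEPTIDEC": 30.5},
    "run_c": {"PEPTIDEA": 9.0, "PEPTIDEB": 19.0},
}
load_calls = []  # records every simulated disk read


def fake_loader(path):
    """Simulate an expensive read that keeps only the columns needed."""
    load_calls.append(path)
    return dict(fake_files[path])


offsets = pairwise_offsets(list(fake_files), fake_loader)
print(len(load_calls), "loads for", len(offsets), "pairs")
```

Without the cache, each of the n(n-1)/2 pairs would trigger two reads, so the number of reads grows quadratically with the number of files; with the cache it grows linearly, at the cost of holding all stripped tables in memory.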