Feat: Reduction of memory consumption
diego-rt opened this issue · 3 comments
diego-rt commented
Hello,
I'm working with a single-cell ATAC dataset from an organism with a very large genome (32 Gbp). As a result, the number of reads is quite high (~300M) and there are a lot of candidate peaks:
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # The Hidden Markov Model for signals of binsize of 10 basepairs:
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # open state index: state0
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # nucleosomal state index: state2
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # background state index: state1
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # Starting probabilities of states:
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # open bg nuc
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # 0.182 0.4838 0.3342
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # HMM Transition probabilities:
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # open bg nuc
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # open-> 0.984 0.008203 0.007821
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # bg-> 0.004914 0.9406 0.05444
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # nuc-> 0.005975 0.06062 0.9334
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # HMM Emissions (mean):
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # short mono di tri
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # open: 0.4205 1.458 1.184 0.5775
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # bg: 0.006671 0.8911 0.003568 0.00196
INFO @ 25 Feb 2024 02:43:51: [35845 MB] # nuc: 0.8497 1.735 0.2407 0.009357
INFO @ 25 Feb 2024 02:43:51: [35845 MB] #5 Decode with Viterbi to predict states
INFO @ 25 Feb 2024 02:45:29: [35845 MB] #5 Total candidate peaks : 1486723
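For context, the step that blows up is the Viterbi decoding of each candidate peak with the 3-state HMM reported above. Below is a minimal NumPy sketch of a generic per-peak Viterbi decoder (not MACS3's actual code; the function name viterbi_decode and its interface are made up for illustration). Per peak it only needs O(bins x states) memory, which is why I suspect the growth comes from accumulating results across all ~1.5M peaks rather than from any single decode:

    import numpy as np

    def viterbi_decode(log_start, log_trans, log_emit):
        """Decode the most likely state path for one candidate peak.

        log_start: (S,) log start probabilities
        log_trans: (S, S) log transition probabilities
        log_emit:  (T, S) per-bin log emission likelihoods for T 10-bp bins

        Memory is O(T * S): one float trellis plus one byte-sized backpointer
        table per peak, so peaks could in principle be decoded and written
        out one at a time instead of keeping everything in RAM.
        """
        T, S = log_emit.shape
        trellis = np.empty((T, S))
        backptr = np.empty((T, S), dtype=np.uint8)  # 3 states fit in one byte
        trellis[0] = log_start + log_emit[0]
        for t in range(1, T):
            scores = trellis[t - 1][:, None] + log_trans  # (S, S) candidate scores
            backptr[t] = scores.argmax(axis=0)            # best previous state per state
            trellis[t] = scores.max(axis=0) + log_emit[t]
        # Trace back the best state path
        path = np.empty(T, dtype=np.uint8)
        path[-1] = trellis[-1].argmax()
        for t in range(T - 2, -1, -1):
            path[t] = backptr[t + 1, path[t + 1]]
        return path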
Unfortunately this means that the memory consumption is quite high, and based on the current trend (a rough extrapolation is sketched below the log) I guess I would need more than 1 TB of memory to finish processing with HMMRATAC:
INFO @ 25 Feb 2024 19:33:03: [346296 MB] # decoding 343000...
INFO @ 25 Feb 2024 19:39:18: [346623 MB] # decoding 344000...
INFO @ 25 Feb 2024 19:45:37: [347801 MB] # decoding 345000...
INFO @ 25 Feb 2024 19:53:59: [350527 MB] # decoding 346000...
INFO @ 25 Feb 2024 20:00:37: [352610 MB] # decoding 347000...
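As a rough sanity check on that estimate, extrapolating linearly from the figures above (resident memory grows from ~35,845 MB before decoding to ~352,610 MB by peak 347,000) gives well over 1 TB for all 1,486,723 candidate peaks. This assumes the per-peak growth stays roughly constant:

    # Back-of-the-envelope extrapolation from the log figures above.
    baseline_mb = 35_845        # RSS before decoding started
    at_347k_mb = 352_610        # RSS while decoding peak 347,000
    peaks_decoded = 347_000
    total_peaks = 1_486_723

    mb_per_peak = (at_347k_mb - baseline_mb) / peaks_decoded
    projected_mb = baseline_mb + mb_per_peak * total_peaks
    print(f"~{mb_per_peak:.2f} MB per peak, projected total ~{projected_mb / 1e6:.2f} TB")
    # -> roughly 0.91 MB per peak, ~1.39 TB projected (assuming linear growth)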
It would be great if there were a way to decrease the memory consumption of this step.
Thanks a lot!
taoliu commented
@diego-rt Thanks for the request! Yes, optimizing memory usage for the decoding process is in our plan. Could you share the entire log with us, even if the run can't finish?
diego-rt commented
taoliu commented
Thank you!