macs3-project/MACS

Feat: Reduction of memory consumption

diego-rt opened this issue · 3 comments

Hello,

I'm working with a single-cell ATAC-seq dataset from an organism with a very large genome (32 Gbp). The number of reads is therefore quite high (~300M), and I have a lot of candidate peaks:

INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #  The Hidden Markov Model for signals of binsize of 10 basepairs: 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #   open state index: state0 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #   nucleosomal state index: state2 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #   background state index: state1 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #   Starting probabilities of states: 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #                          open         bg        nuc 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #                         0.182     0.4838     0.3342 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #   HMM Transition probabilities: 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #                          open         bg        nuc 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #             open->      0.984   0.008203   0.007821 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #               bg->   0.004914     0.9406    0.05444 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #              nuc->   0.005975    0.06062     0.9334 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #   HMM Emissions (mean):  
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #                         short       mono         di        tri 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #             open:      0.4205      1.458      1.184     0.5775 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #               bg:    0.006671     0.8911   0.003568    0.00196 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #              nuc:      0.8497      1.735     0.2407   0.009357 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #5 Decode with Viterbi to predict states 
INFO  @ 25 Feb 2024 02:45:29: [35845 MB] #5  Total candidate peaks : 1486723 
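For context on what the decoding step is doing, the starting and transition probabilities printed above are enough to sketch a Viterbi pass over one candidate region. This is a minimal illustration only: the categorical emission model and the `viterbi` helper below are assumptions for the sketch, not MACS3's actual implementation, which models the four fragment-size signal tracks.

```python
import numpy as np

# Parameters copied from the log above (state order: open, bg, nuc).
startprob = np.array([0.182, 0.4838, 0.3342])
transmat = np.array([
    [0.984,    0.008203, 0.007821],  # open ->
    [0.004914, 0.9406,   0.05444],   # bg ->
    [0.005975, 0.06062,  0.9334],    # nuc ->
])

def viterbi(log_emissions):
    """Most likely state path for one region.

    log_emissions: (T, 3) array of per-bin log emission probabilities.
    The backpointer table is O(T * n_states), which is one reason
    decoding ~1.5M candidate regions can add up in memory.
    """
    T, n = log_emissions.shape
    log_trans = np.log(transmat)
    delta = np.log(startprob) + log_emissions[0]
    backptr = np.empty((T, n), dtype=np.int8)  # int8 keeps the table small
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: from i to j
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emissions[t]
    # Trace back the best path from the highest-scoring final state.
    path = np.empty(T, dtype=np.int8)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path

# Toy usage: 5 bins whose emissions strongly favour "open" (index 0).
toy = np.log(np.array([[0.9, 0.05, 0.05]] * 5))
print(viterbi(toy))  # -> [0 0 0 0 0]
```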

Unfortunately, this means that memory consumption is quite high; based on the current trend, I estimate I would need more than 1 TB of memory to finish processing with HMMRATAC:

INFO  @ 25 Feb 2024 19:33:03: [346296 MB] #    decoding 343000... 
INFO  @ 25 Feb 2024 19:39:18: [346623 MB] #    decoding 344000... 
INFO  @ 25 Feb 2024 19:45:37: [347801 MB] #    decoding 345000... 
INFO  @ 25 Feb 2024 19:53:59: [350527 MB] #    decoding 346000... 
INFO  @ 25 Feb 2024 20:00:37: [352610 MB] #    decoding 347000... 

It would be great if there were a way to decrease the memory consumption of this step.

Thanks a lot!

@diego-rt Thanks for the request! Yes, optimizing memory usage for the decoding process is in our plan. Could you share the entire log with us, even if the run can't finish?

Great to hear, and thanks a lot for the quick reply! Here is the full log.


HMMRATAC.log

Thank you!