marbl/parsnp

Parsnp memory issue

bkille opened this issue · 1 comment

Dear bkille,
Thank you very much for the software.
I am trying to run parsnp on ~1700 Klebsiella pneumoniae genomes on a server cluster (ten nodes, 16 threads and 126 GB RAM per node), and parsnp 1.2 always fails with the error from #81 (aligning smaller sample sizes, up to 500 genomes, works without difficulty).
When running on ~1700 genomes, I found that RAM usage increased gradually to 126 GB and the job was then killed (out of memory).
So I added -P 35000000 (and also tried -P 35000, -P 90000, -P 90, -P 35 and -P 35GB), but none of these stopped memory usage from climbing to 126 GB, so the job was killed and I got the #81 error again.

So, I would appreciate it if you could tell me:

  • What is the unit of -P (default = 15000000), and why doesn't my -P setting stop the increase in memory usage?
  • Do you have any suggestions for running parsnp on ~1700 genomes on my current Sun Grid Engine cluster?
  • And what about running parsnp on ~8000 genomes? Any suggestions?
    BTW, this error has also been mentioned at [https://github.com/marbl/harvest/issues/22]

Thanks in advance and have a nice weekend.

/Sun

Originally posted by @sunctx in #81 (comment)

@sunctx I used the Klebsiella genomes referenced in marbl/harvest#22; there are roughly 1300 of them. I also used the ANI recruitment strategy, since the default recruitment includes genomes that don't align well at all.

/usr/bin/time -v parsnp -d ~/Data/klebsiella/*.fna -r ~/Data/klebsiella/GCF_900093815.1_18090_8_78_genomic.fna -p 30 --use-ani --min-ani 98 -P 500000

The command finished and used ~28GB of RAM at peak (see the time output below). With the default partition size (-P 15000000), peak RAM was much higher (around 90GB) but still under 128GB. With ~1700 genomes, I can see how you might run out of RAM with the default partition size. From my understanding of the source code, the partition size is the size of the individual chunks that parsnp works with at one time (I believe it is in base pairs), but I can confirm this with the code's author.
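To make the unit question concrete, here is a toy sketch that assumes, as guessed above, that -P is a maximum chunk length in base pairs and that sequence is simply cut into consecutive chunks no longer than that. It is not parsnp's partitioning code: `toy_partitions` is a hypothetical helper, and the 5.5 Mbp genome size is only a rough, illustrative Klebsiella figure. Under that assumption, values such as -P 35 or -P 90 would mean chunks of 35 or 90 bp, not 35 or 90 GB of RAM.

```python
# Toy model only (assumption): -P is a maximum chunk length in base pairs and
# the sequence is cut into consecutive chunks no longer than that.
# This is NOT parsnp's actual partitioning logic.


def toy_partitions(total_bp: int, max_partition_bp: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) intervals covering total_bp, each at most max_partition_bp long."""
    if total_bp < 0 or max_partition_bp <= 0:
        raise ValueError("total_bp must be >= 0 and max_partition_bp must be > 0")
    return [
        (start, min(start + max_partition_bp, total_bp))
        for start in range(0, total_bp, max_partition_bp)
    ]


GENOME_BP = 5_500_000  # rough K. pneumoniae genome size, illustrative only

for p in (15_000_000, 500_000, 35):
    n_chunks = len(toy_partitions(GENOME_BP, p))
    print(f"-P {p}: {n_chunks} chunk(s), each at most {min(p, GENOME_BP)} bp")
# -P 15000000: 1 chunk(s), each at most 5500000 bp
# -P 500000: 11 chunk(s), each at most 500000 bp
# -P 35: 157143 chunk(s), each at most 35 bp
```

Under the same assumption, the default of 15000000 keeps a whole genome in one chunk, while -P 500000 splits it into 11; how that translates into actual memory use depends on parsnp's internals.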

With your set of ~1700 genomes, decreasing the partition size should help. I'd be happy to help further as well: if you run parsnp with the --verbose flag and attach the output, I should be able to see where the binary runs out of memory.

        User time (seconds): 167027.94
        System time (seconds): 1039.54
        Percent of CPU this job got: 429%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 10:52:35
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 28718316
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 131
        Minor (reclaiming a frame) page faults: 142279582
        Voluntary context switches: 783022
        Involuntary context switches: 1530536
        Swaps: 0
        File system inputs: 14283680
        File system outputs: 1700256
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
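The headline numbers in the GNU time report above can be checked with a little arithmetic. The sketch below uses hypothetical helpers (`parse_time_report`, `elapsed_seconds`), not part of parsnp or GNU time: it parses a few of the `Label: value` lines, converts the peak resident set size (reported in kbytes, treated here as 1024-byte units) to GiB and GB, and recomputes CPU utilization as (user + system time) / wall-clock time.

```python
# A few lines copied from the GNU time -v report in this thread.
TIME_REPORT = """\
User time (seconds): 167027.94
System time (seconds): 1039.54
Percent of CPU this job got: 429%
Elapsed (wall clock) time (h:mm:ss or m:ss): 10:52:35
Maximum resident set size (kbytes): 28718316
"""


def parse_time_report(text: str) -> dict[str, str]:
    """Map each 'Label: value' line to its value (split on the last ': ')."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.strip().rpartition(": ")
        if sep:
            fields[label] = value
    return fields


def elapsed_seconds(clock: str) -> float:
    """Convert 'h:mm:ss' or 'm:ss' (seconds may have decimals) to seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


fields = parse_time_report(TIME_REPORT)
peak_kib = int(fields["Maximum resident set size (kbytes)"])  # assumed KiB
wall = elapsed_seconds(fields["Elapsed (wall clock) time (h:mm:ss or m:ss)"])
cpu = float(fields["User time (seconds)"]) + float(fields["System time (seconds)"])

print(f"peak RSS: {peak_kib / 1024**2:.1f} GiB ({peak_kib * 1024 / 1e9:.1f} GB)")
print(f"wall time: {wall:.0f} s, CPU utilization: {100 * cpu / wall:.0f}%")
# peak RSS: 27.4 GiB (29.4 GB)
# wall time: 39155 s, CPU utilization: 429%
```

The peak of about 27.4 GiB is in line with the ~28GB quoted above, and the recomputed 429% matches the report, i.e. roughly 4.3 cores busy on average over the run despite -p 30.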