amplab/snap

building index

biolxy opened this issue (#140) on December 15, 2021 · 6 comments

How can I build an index for a 130 GB nt.fa file on a Linux machine with only 300 GB of RAM? Thanks.

You can try the -sm (small memory) option when building the index. It may be that you don't have enough RAM, though.

Thanks for your reply. Yes, I don't have enough memory: even with the -sm parameter, I still can't build the index.

You can try Bill's method, but as he mentioned, I suspect you will not have enough RAM either to build a single nt index or to align against all of nt with 300 GB of RAM. Alternatively, you can split the nt FASTA into several chunks (you'll need to experiment on your system to find the optimal number) and build a separate index for each chunk. When you run the alignment, you'll then need some code to collate the alignments from the individual chunk indexes and determine the best hits across all of the chunks. The nt FASTA can be split taxonomically (e.g. viruses, bacteria, etc.) or randomly, depending on what makes the most sense for your project.
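A non-taxonomic split into roughly equal-sized chunks could look like the Python below. This is a minimal sketch, not part of SNAP: `read_fasta`, `split_fasta`, and the `<prefix>_<i>.fa` output names are hypothetical. It streams the FASTA once and greedily sends each record to the chunk holding the fewest bases so far. A taxonomic split would instead need a mapping from sequence IDs to taxa, which this sketch does not attempt.

```python
"""Split a multi-record FASTA into N roughly size-balanced chunks (hypothetical helper)."""
import heapq
import sys
from typing import Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header_line, sequence_lines) for each record, one record at a time."""
    header = None
    seq: List[str] = []
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                if header is not None:
                    yield header, seq
                header, seq = line, []
            elif header is not None:
                seq.append(line)
    if header is not None:
        yield header, seq


def split_fasta(path: str, n_chunks: int, prefix: str) -> List[int]:
    """Write records to <prefix>_<i>.fa, always choosing the chunk with the fewest bases.

    Returns the number of bases written to each chunk.
    """
    outs = [open(f"{prefix}_{i}.fa", "w") for i in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (bases_so_far, chunk_index)
    heapq.heapify(heap)
    totals = [0] * n_chunks
    try:
        for header, seq in read_fasta(path):
            bases, i = heapq.heappop(heap)  # currently smallest chunk
            outs[i].write(header)
            outs[i].writelines(seq)
            n = sum(len(s.strip()) for s in seq)  # bases in this record
            totals[i] += n
            heapq.heappush(heap, (bases + n, i))
    finally:
        for f in outs:
            f.close()
    return totals


if __name__ == "__main__":
    # Hypothetical usage: python split_fasta.py nt.fa 4 nt_chunk
    print(split_fasta(sys.argv[1], int(sys.argv[2]), sys.argv[3]))
```

Each chunk file would then be indexed separately with SNAP (check the usage message of `snap-aligner index` on your build for the exact arguments). Because the greedy assignment sees records in file order, the balance is approximate rather than optimal, which is usually good enough when trying a few chunk counts against the available RAM.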

Thank you for your reply.
I think your suggestion is feasible. I have a question: SNAP's default is to produce a single best alignment for each read that it maps. If I split the nt FASTA and make a separate index for each chunk, the same read may get a best alignment in each chunk's index. How do I decide which of these alignment results is the most reliable?
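One possible way to collate per-chunk results is sketched below in Python, under assumptions that are not stated in the thread: each chunk's alignments were written as SAM, every mapped record carries an `NM` (edit distance) tag, and reads are single-end. The hypothetical `best_hits` keeps, for each read, the primary alignment with the lowest edit distance across all chunks, and marks a read as tied when another chunk produced an equally good hit. This is an assumed selection rule, not SNAP's own scoring; a MAPQ computed within one chunk cannot account for hits in the other chunks.

```python
"""Pick one best alignment per read across per-chunk SAM files (hypothetical helper)."""
import re
from typing import Dict, Iterable, Iterator, Tuple

NM_TAG = re.compile(r"\tNM:i:(\d+)")  # edit-distance optional field


def best_hits(sam_lines: Iterable[str]) -> Dict[str, Tuple[int, str, bool]]:
    """Return {read_name: (edit_distance, sam_line, tied)}, keeping the lowest NM.

    Assumes single-end reads; paired-end data would need to key on mate too.
    """
    best: Dict[str, Tuple[int, str, bool]] = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM alignment record
        qname, flag = fields[0], int(fields[1])
        if flag & (0x4 | 0x100 | 0x800):
            continue  # unmapped, secondary, or supplementary
        m = NM_TAG.search(line)
        if m is None:
            continue  # assumption: mapped records carry an NM tag
        nm = int(m.group(1))
        prev = best.get(qname)
        if prev is None or nm < prev[0]:
            best[qname] = (nm, line, False)  # new best hit
        elif nm == prev[0]:
            best[qname] = (prev[0], prev[1], True)  # equally good hit elsewhere
    return best


def best_hits_from_files(paths: Iterable[str]) -> Dict[str, Tuple[int, str, bool]]:
    """Stream every per-chunk SAM file through best_hits."""
    def lines() -> Iterator[str]:
        for p in paths:
            with open(p) as fh:
                yield from fh
    return best_hits(lines())
```

Reads marked as tied map equally well to more than one chunk, so they are exactly the cases where picking a single "best" alignment is unreliable; reporting them separately, rather than choosing one arbitrarily, is one way to handle that.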

Thank you for your help. I'll try it @bolosky