Using BinSanity for metagenomes with bacteria, archaea, and protists
I want to try BinSanity on my dataset and I know there are at least Cyanobacteria and Diatoms. I suspect there are also archaea in this ocean dataset. Can I use BinSanity-wf and skip the CheckM stage or do I have to run BinSanity and BinSanity-refine separately?
Also, the default values documented for BinSanity-refine differ from the actual values: 0.9 vs. 0.95 for the dampening factor, I believe. Which one would be the better option for a dataset of around 800k contigs from a single sample (not a coassembly)?
Hello,
The default for dampening should be 0.95; I'll change that in the help menu.
You cannot skip the CheckM step of Binsanity-wf or Binsanity-lc; it is a key part of deciding which genomes to refine and which to leave alone. To avoid CheckM you'd have to run Binsanity and Binsanity-refine separately, outside of the workflow. Binsanity-wf and Binsanity-lc are designed to first cluster contigs using only coverage, then assess the resulting bins with CheckM and classify them as "High Completion", "Low Completion", or "High Redundancy". Only the High Redundancy and Low Completion bins go on to the subsequent refinement steps, which take composition into account in addition to coverage. If you want to evaluate your genomes some other way, you would need to run Binsanity, take the resulting bins, assess them, and then run Binsanity-refine on the highly redundant bins.
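If you do go the manual route, the triage step could look roughly like the sketch below. This is not Binsanity's own code: the function names, the input dictionary, and especially the 80% completeness and 10% redundancy cutoffs are placeholders I am assuming for illustration, so substitute whatever your own evaluation method reports.

```python
from typing import Dict, List, Tuple


def classify_bin(completeness: float, redundancy: float,
                 min_completeness: float = 80.0,
                 max_redundancy: float = 10.0) -> str:
    """Place one bin into one of three categories.

    The default cutoffs are illustrative assumptions, not Binsanity's values.
    """
    if redundancy > max_redundancy:
        return "high_redundancy"
    if completeness < min_completeness:
        return "low_completion"
    return "high_completion"


def bins_to_refine(stats: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of bins that are not high completion.

    `stats` maps a bin name to (completeness %, redundancy %).
    """
    return sorted(
        name
        for name, (completeness, redundancy) in stats.items()
        if classify_bin(completeness, redundancy) != "high_completion"
    )


if __name__ == "__main__":
    # Hypothetical per-bin estimates from whatever assessment tool you use.
    example = {
        "bin_1": (95.0, 2.0),
        "bin_2": (40.0, 1.0),
        "bin_3": (90.0, 30.0),
    }
    for name in bins_to_refine(example):
        print(name)
```

The bins this returns are the candidates you would pass on to Binsanity-refine.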
A dataset of 800,000-plus contigs from just a single sample gets tricky. Binsanity works best with multiple samples because the initial clustering relies solely on coverage metrics, and compositional features are used only to refine the initial bins. First, are all 800k of those contigs longer than 1 kb? If so, you'll probably have to use 'Binsanity-lc' because of the sheer number of contigs, or you may run into memory issues. With a single sample, I'd also say that after a run through Binsanity-lc or Binsanity-refine you may want to use something like Anvi'o (http://merenlab.org/software/anvio/) to manually refine the output bins.
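To check how many of your contigs actually pass a 1 kb cutoff before choosing between the scripts, something like the following would do. It is a minimal standalone sketch in plain Python, not part of Binsanity; the function names and the default `min_length` of 1000 are my own choices.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (contig_id, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the contig id.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_long_contigs(lines: Iterable[str],
                       min_length: int = 1000) -> Tuple[int, int]:
    """Return (contigs with length >= min_length, total contigs)."""
    kept = 0
    total = 0
    for _, sequence in read_fasta(lines):
        total += 1
        if len(sequence) >= min_length:
            kept += 1
    return kept, total
```

You would call it on an open file handle for your assembly, for example `count_long_contigs(open("assembly.fa"))`, where `assembly.fa` stands in for your own file name.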
Hopefully that's helpful! We are currently working on 'Binsanity2', which should improve on some of the current issues with the distribution.
-Elaina