RasmussenLab/phamb

Running vamb/phamb using only Vibrant contigs

Closed this issue · 3 comments

Hi,
I was wondering whether, in your opinion, it would be appropriate to assemble reads into contigs, extract the putative viral ones with Vibrant (or equivalent software), concatenate them, and then run vamb followed by phamb?
Best
Greg

Hi Greg

My only worry with this approach is that you might miss some contigs (especially smaller ones, < 5000 bp) that belong to a virus but contain only a limited number of viral genes. VIBRANT would not pick these up, so they would be discarded.
Nevertheless, your approach is totally feasible and should work fine. In the paper we ran benchmarks on contig subsets derived from viruses only (this roughly corresponds to your contig input to VAMB) and it really put the binning on "steroids".
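For reference, here is a minimal sketch of the concatenation step, assuming one FASTA of putative viral contigs per sample. The function names, the `min_length` default, and the `C` separator between sample identifier and contig name are illustrative assumptions, not the VAMB or phamb implementation.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the name.
            tokens = line[1:].split()
            name, chunks = (tokens[0] if tokens else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def concatenate_viral_contigs(
    samples: Dict[str, Iterable[str]],
    min_length: int = 1000,  # hypothetical cutoff, adjust to your data
    sep: str = "C",          # assumed separator between sample ID and contig
) -> List[Tuple[str, str]]:
    """Merge per-sample contigs into one list with sample-prefixed names."""
    records = []
    for sample_id, lines in samples.items():
        for name, seq in parse_fasta(lines):
            if len(seq) >= min_length:
                # The prefix keeps contig names unique across samples and
                # records which sample each contig came from.
                records.append((f"{sample_id}{sep}{name}", seq))
    return records
```

Each value in `samples` can be an open file handle or a list of lines. Keeping the sample identifier in every contig name is what makes it possible to split the clusters by sample later.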

Best,
Joachim

Hi Joachim
Thanks a lot for the quick response. FYI, by default Vibrant only considers contigs of at least 1 kb with at least 4 ORFs.
Best
Greg

Hi Greg

OK! Maybe it's not a big issue with the smaller contigs then. We did not exactly optimise the hyperparameters of the VAMB binner for pure viral-contig subsets; instead, they are optimised to handle the simultaneous presence of bacterial contigs and contigs from other entities. So far in our tests, though, it has performed pretty well on pure virus subsets.
Remember to binsplit your VAMB clusters by the sample identifier in your contig names (as also advised in the VAMB repo).
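To illustrate what binsplitting means, here is a rough sketch that splits each cluster into one bin per sample. It assumes contig names have the form `<sample><sep><contig>` and that the clusters are already loaded as a dict from cluster name to contig names. The `binsplit` function, the `C` separator, and the bin naming scheme are hypothetical and not VAMB's own code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def binsplit(
    clusters: Dict[str, Iterable[str]],
    sep: str = "C",  # assumed separator between sample ID and contig name
) -> Dict[str, List[str]]:
    """Split every cluster into per-sample bins based on the contig prefix."""
    split = defaultdict(list)
    for cluster, contigs in clusters.items():
        for contig in contigs:
            # Everything before the first separator is taken as the sample ID,
            # so sample IDs must not themselves contain the separator.
            sample, found, _ = contig.partition(sep)
            if not found or not sample:
                raise ValueError(f"no sample identifier in contig name: {contig!r}")
            # Hypothetical bin name: sample ID, separator, cluster name.
            split[f"{sample}{sep}{cluster}"].append(contig)
    return dict(split)


clusters = {
    "1": ["S1Ccontig1", "S2Ccontig9", "S1Ccontig3"],
    "2": ["S2Ccontig4"],
}
bins = binsplit(clusters)
```

Here cluster `1` ends up as two bins, one holding the two `S1` contigs and one holding the `S2` contig, so each bin contains contigs from a single sample only.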

Wishing you all the best with your viral metagenomic research! :-)

Best,
Joachim