Multiple cctyper runs on the same contig giving different outputs?

Question

Closed this issue 2 years ago · 2 comments

Dear all

I ran cctyper on a large number of contigs, and the cas_operon.tab output listed a subgroup of contigs that had Cas hits.

I then sub-selected this set of Cas-positive contigs and re-ran cctyper. This time, the large majority of the contigs and their Cas operon positions were identical, except for a small subset in which some operons are now chopped up into multiple smaller cas operons. Interestingly, this small group of contigs now appears in cas_operon_putative.tab, meaning that these predictions have become less confident.

I wonder why this is the case, given that the contigs in the second cctyper run were a subset of those in the first but otherwise identical. Thanks

Marcus

Answer 1 · 2022-03-16T10:19:25.000Z

Dear Marcus

I think the problem lies with Prodigal, the open-reading-frame prediction tool that cctyper uses. With default settings, cctyper expects a single genome as input, and Prodigal optimizes its gene-finding algorithm to that genome. If you only include a subset of that genome, the predicted genes could very well be different, which will change the cctyper results, since a gene might be missing.

In conclusion, it sounds like you ran cctyper on a metagenome (or a collection of genomes) with default options, where you should have used the --prodigal meta option. In any case, if you want consistency between subsets of contigs and the full contig set, use the --prodigal meta argument.
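
To check whether two runs agree, something like the following rough Python sketch may help. It builds the command line with --prodigal meta and compares operon coordinates between the full run and the subset run. The positional order (input FASTA, then output directory), the column names Contig, Start and End, and the helper names are assumptions for illustration, so adjust them to match your own files.

```python
import csv
import io
import shlex


def cctyper_command(fasta, outdir, prodigal_mode="meta"):
    """Build the argument list for a cctyper run (nothing is executed here).

    Assumes the positional order is: input FASTA, then output directory.
    """
    return ["cctyper", fasta, outdir, "--prodigal", prodigal_mode]


def operon_spans(handle, contig_col="Contig", start_col="Start", end_col="End"):
    """Read a tab-separated operon table into a set of (contig, start, end).

    The column names are assumptions; change them if your header differs.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        (row[contig_col], int(row[start_col]), int(row[end_col]))
        for row in reader
    }


def compare_runs(full_handle, subset_handle, contigs=None):
    """Return (lost, gained) operon spans between the full and subset runs.

    lost:   spans reported in the full run but not in the subset run
    gained: spans reported in the subset run but not in the full run
    If `contigs` is given, only spans on those contigs are compared.
    """
    full = operon_spans(full_handle)
    subset = operon_spans(subset_handle)
    if contigs is not None:
        full = {span for span in full if span[0] in contigs}
        subset = {span for span in subset if span[0] in contigs}
    return full - subset, subset - full


if __name__ == "__main__":
    # Hypothetical file and directory names, shown for illustration only.
    print(shlex.join(cctyper_command("contigs.fa", "cctyper_meta")))

    # Tiny made-up tables standing in for the operon tables of the two runs.
    full_table = "Contig\tStart\tEnd\nc1\t100\t900\nc2\t5\t50\n"
    subset_table = "Contig\tStart\tEnd\nc1\t100\t400\nc1\t500\t900\n"
    lost, gained = compare_runs(
        io.StringIO(full_table), io.StringIO(subset_table), contigs={"c1"}
    )
    print("only in full run:", sorted(lost))
    print("only in subset run:", sorted(gained))
```

If both runs use --prodigal meta, both sets should come back empty for the contigs shared between the runs.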

Jakob

Answer 2 · 2022-03-16T11:44:37.000Z

Thank you Jakob for the quick response.