gbouras13/hybracter

allow specifying phage in dnaapler (or other dnaapler arguments for the choice of start gene)

Hi George,

I followed your advice and tried hybracter on my bacteriophage ONT data... it rocks!! Congrats on another cool pipeline.

Two minor suggestions follow.

I got the same assembly as I did manually with Flye (which was expected).
I did not look closely at the corrections that the Illumina reads may have introduced through pypolca (I had not polished my earlier assembly, as I did not yet have short-read data).

The two assemblies differ in length by a single base, so probably not much happened there.

My assembly is not bacteria + plasmid but a bacteriophage alone, so dnaapler did not affect it.
Would it be possible to add an option to run 'dnaapler phage' instead of the plain dnaapler?
=> Only if this does not break something else; I can run dnaapler after hybracter without any problem.
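
For reference, the post-hoc reorientation mentioned above could look roughly like the sketch below. It only builds and prints the command rather than running it. It assumes dnaapler is installed and on the PATH; the file and directory names are hypothetical placeholders, and the flags (`-i`, `-o`, `-p`, `-t`) should be checked against `dnaapler phage --help` for the installed version.

```python
import shlex


def dnaapler_phage_cmd(assembly_fasta, out_dir, prefix="phage", threads=4):
    """Build (but do not run) a `dnaapler phage` command line.

    All arguments are illustrative; the paths are assumptions, not
    locations that hybracter is known to write to.
    """
    return [
        "dnaapler", "phage",
        "-i", assembly_fasta,  # assembled phage genome (hypothetical path)
        "-o", out_dir,         # output directory for the reoriented genome
        "-p", prefix,          # prefix for output file names
        "-t", str(threads),    # number of threads
    ]


# Hypothetical file names, for illustration only.
cmd = dnaapler_phage_cmd("my_phage_final.fasta", "dnaapler_phage_out")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once the paths are real.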

Also, would it be possible to copy the best available GFA file from ./processing/assemblies/ to FINAL_OUTPUT?
I know it differs from the final assembly in some cases, but it is nice to see when some unassembled contigs are related to other, bigger contigs.

Another thing I do after assembling a genome is map my reads back to it for visualization in IGV; that could also be an option to add to the pipeline. Loading the BAM + BAI, the assembly FASTA, and the annotations (pharokka in my case) helps detect regions of discordance between reads and assembly, as well as potential assembly errors or SNPs/indels resulting from over-polishing.
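
The mapping step described above can be sketched as follows. This is only one possible way to do it, assuming minimap2 and samtools are installed; the function name and all file names are hypothetical, and the `map-ont` preset assumes ONT long reads. The code builds the shell commands as strings instead of executing them.

```python
import shlex


def igv_mapping_commands(assembly, reads, out_bam, threads=4, preset="map-ont"):
    """Return shell commands that map reads to an assembly and index the BAM.

    The first command pipes minimap2 SAM output into `samtools sort`;
    the second creates the .bai index that IGV needs alongside the BAM.
    """
    align = ["minimap2", "-ax", preset, "-t", str(threads), assembly, reads]
    # "-" tells samtools sort to read the alignments from standard input.
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    index = ["samtools", "index", out_bam]
    return [f"{shlex.join(align)} | {shlex.join(sort)}", shlex.join(index)]


# Hypothetical file names, for illustration only.
for line in igv_mapping_commands("assembly.fasta", "reads.fastq.gz",
                                 "reads_vs_assembly.bam"):
    print(line)
```

The resulting BAM and its index can then be loaded in IGV together with the assembly FASTA and the annotation file.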

Thanks for your great tool, it will be used (and cited)!

Hi @splaisan,

First of all, the Hybracter preprint has just come out if you do want to cite it :) - https://www.biorxiv.org/content/10.1101/2023.12.12.571215v1

This is a really interesting issue. I hadn't actually thought of using Hybracter to assemble phages, but I guess if you ignore the plasmid outputs, and set a -c chromosome value under the genome size of the phage then it should work (with the phage being the 'chromosome')!
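
As a rough sketch of that setup (not a tested recipe), the snippet below builds a `hybracter hybrid-single` command and refuses any `-c` value that is not below the expected phage genome size. The helper name, file names, and the 45 kb / 30 kb figures are hypothetical, and the flags should be checked against `hybracter hybrid-single --help`.

```python
import shlex


def hybracter_phage_cmd(long_reads, r1, r2, sample, phage_size_bp,
                        min_chrom_bp, out_dir, threads=8):
    """Build (but do not run) a hybracter command for a phage-only sample.

    `min_chrom_bp` is passed to -c and must be smaller than the expected
    phage genome size so the phage is treated as the 'chromosome'.
    """
    if min_chrom_bp >= phage_size_bp:
        raise ValueError("-c must be below the expected phage genome size")
    return [
        "hybracter", "hybrid-single",
        "-l", long_reads,         # ONT reads
        "-1", r1, "-2", r2,       # paired Illumina reads
        "-s", sample,             # sample name
        "-c", str(min_chrom_bp),  # minimum 'chromosome' length in bp
        "-o", out_dir,
        "-t", str(threads),
    ]


# Hypothetical example: a ~45 kb phage with -c set to 30 kb.
cmd = hybracter_phage_cmd("ont.fastq.gz", "R1.fastq.gz", "R2.fastq.gz",
                          "my_phage", 45000, 30000, "hybracter_out")
print(shlex.join(cmd))
```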

As for the difference between the assemblies, I wouldn't expect much change because phage genomes should be 'easy' to assemble - it is really cool it worked at all.

Regarding the requests:

  1. The dnaapler step in Hybracter uses dnaapler all, which includes the large terminase subunit phage genes. It would therefore do the reorientation for you automatically - no need to change anything.
  2. I don't want to copy the GFAs into FINAL_OUTPUT, as that would confuse the bacterial users Hybracter is designed for, because of the targeted plasmid assembly. Sorry about that, but at least you know where to find the file for your purposes :)
  3. The read-mapping idea is a good one, as people may find it useful for bacteria too. I will think about it - one thing I am mindful of is that a tool like this has to end at some point (with later steps done separately), or else it becomes hard to use and maintain. But I think a final mapping step is probably a good idea.

Thanks for this issue again,

George