adigenova/wengan

Recommendation for Illumina + PacBio HiFi + ultralong ONT?

Closed this issue · 3 comments

Hi there,

Congratulations on a fantastic, useful tool. I have high coverage ultralong ONT (~70x), Illumina WGS (30x), and PacBio HiFi (currently ~10x). When I run in ccsont mode with just the ONT and HiFi reads, I get pretty good results. Is there a way to utilize all three types of data to potentially improve the assemblies further, perhaps by reducing homopolymer indels a bit through use of the Illumina reads? When I try to pass all three types of data to wengan using the --ccsont preset, it seems like the Illumina reads are ignored.

Thanks!

Hi dhoconno,
Thanks! Glad that wengan has been useful for your genome.
Regarding your question, it might be possible to include the short-read data, because the current version of the --ccsont pipeline uses a multi-k-mer approach with the following k-mer sizes: 41, 81, 121, 161, 201, 251, 301, 351. The short reads can therefore be used in the high-quality read assembly step up to k=121, for instance (or up to k=201 if your short reads are 251 bp long).
At the moment this is not included in the pipeline and, as you said, the short reads are ignored. A quick alternative is to edit the makefile that Wengan generates to control the assembly execution.
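To make the k-mer cutoff concrete, here is a minimal shell sketch that lists which of the pipeline's k-mer sizes are shorter than a given short-read length. The function name `usable_kmers` is hypothetical, and the rule "k must be strictly smaller than the read length" is an assumption that is merely consistent with the k=201 limit for 251 bp reads mentioned above; it is not taken from the Wengan code.

```shell
# Hypothetical helper: print the k-mer sizes from the ccsont list that are
# strictly smaller than the short-read length given as the first argument.
# Assumption: a read can only contribute k-mers when k < read length.
usable_kmers() {
    read_len=$1
    out=""
    for k in 41 81 121 161 201 251 301 351; do
        if [ "$k" -lt "$read_len" ]; then
            # Append k, separated by a single space
            out="${out:+$out }$k"
        fi
    done
    echo "$out"
}

# Example: 251 bp reads
usable_kmers 251
```

For 251 bp reads this prints `41 81 121 161 201`, matching the limit quoted above.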

Let's say that the wengan command is the following:

perl wengan.pl -x ccsont -a M -s lib1.fwd.fastq.gz,lib1.rev.fastq.gz -l ont.fastq.gz -p asm1 -t 20 -g 3000 -n -b hifi.fastq.gz
# The -n flag is important: it only creates the makefile without executing it, like a preview of the commands that wengan will run.

Then it will generate the makefile asm1.mk, which you can edit as follows:

.DELETE_ON_ERROR:
#Wengan automatic generated makefile
asm1.ccs.ec.fa : 
        zcat hifi.fastq.gz  |  /Users/adigenova/Git/wengan/bin/seqtk seq  -l 60 -A -C -  > asm1.ccs.ec.fa

asm1.minia.41.contigs.fa : asm1.ccs.ec.fa
        @echo asm1.ccs.ec.fa  >  asm1.minia_reads.41.txt
#here we add the short-read data
        @echo lib1.fwd.fastq.gz  >>  asm1.minia_reads.41.txt
        @echo lib1.rev.fastq.gz  >>  asm1.minia_reads.41.txt
        /Users/adigenova/Git/wengan/bin/minia -in asm1.minia_reads.41.txt -kmer-size 41 -abundance-min 2 -out asm1.minia.41 -minimizer-size 10 -max-memory 5000 -nb-cores 20 2> asm1.minia.41.err > asm1.minia.41.log
        -rm -f asm1.minia.41.unitigs.fa.glue* asm1.minia.41.h5 asm1.minia.41.unitigs.fa

asm1.minia.81.contigs.fa : asm1.minia.41.contigs.fa
        @echo asm1.ccs.ec.fa  >  asm1.minia_reads.81.txt
#here we add the short-read data
        @echo lib1.fwd.fastq.gz  >>  asm1.minia_reads.81.txt
        @echo lib1.rev.fastq.gz  >>  asm1.minia_reads.81.txt
        @echo asm1.minia.41.contigs.fa  >>  asm1.minia_reads.81.txt
        @echo asm1.minia.41.contigs.fa  >>  asm1.minia_reads.81.txt
        @echo asm1.minia.41.contigs.fa  >>  asm1.minia_reads.81.txt
        /Users/adigenova/Git/wengan/bin/minia -in asm1.minia_reads.81.txt -kmer-size 81 -abundance-min 2 -out asm1.minia.81 -minimizer-size 10 -max-memory 5000 -nb-cores 20 2> asm1.minia.81.err > asm1.minia.81.log
        -rm -f asm1.minia.81.unitigs.fa.glue* asm1.minia.81.h5 asm1.minia.81.unitigs.fa

asm1.minia.121.contigs.fa : asm1.minia.81.contigs.fa
        @echo asm1.ccs.ec.fa  >  asm1.minia_reads.121.txt
#here we add the short-read data
        @echo lib1.fwd.fastq.gz  >>  asm1.minia_reads.121.txt
        @echo lib1.rev.fastq.gz  >>  asm1.minia_reads.121.txt
......

Then you can use the edited makefile to run the whole pipeline with:

make -f asm1.mk all

I have not tested this read-set combination, but the method above is a reasonable way to include the short-read data in your assembly.
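The manual edit above could also be scripted. The following is a minimal, untested-on-real-data sketch that assumes the generated makefile has the layout shown above, i.e. each minia step starts its read list with a line of the form `@echo <prefix>.ccs.ec.fa  >  <prefix>.minia_reads.<k>.txt`. The function name `add_short_reads` and its arguments are hypothetical; it prints the edited makefile to standard output and leaves the original file unchanged.

```shell
# Hypothetical helper: after each line that creates a minia read list
# (single ">" redirection), append the two short-read files to that list,
# but only for steps whose k is at most max_k.
# Usage: add_short_reads MAKEFILE FWD_READS REV_READS MAX_K
add_short_reads() {
    mk=$1; fwd=$2; rev=$3; max_k=$4
    awk -v fwd="$fwd" -v rev="$rev" -v max_k="$max_k" '
    { print }
    # Matches "@echo ... > <prefix>.minia_reads.<k>.txt" but not ">>" lines
    /@echo .* > +[^ ]*minia_reads\.[0-9]+\.txt[ \t]*$/ {
        n = split($NF, parts, ".")
        k = parts[n - 1] + 0          # k is the field just before "txt"
        if (k <= max_k) {
            match($0, /^[ \t]*/)      # reuse the recipe indentation
            indent = substr($0, 1, RLENGTH)
            printf "%s@echo %s  >>  %s\n", indent, fwd, $NF
            printf "%s@echo %s  >>  %s\n", indent, rev, $NF
        }
    }' "$mk"
}

# Example (file names as in the command above):
# add_short_reads asm1.mk lib1.fwd.fastq.gz lib1.rev.fastq.gz 121 > asm1.sr.mk
```

It is worth diffing the result against the original makefile before running it, since the pattern depends on the exact text Wengan writes.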

Best,
Alex

Thanks! I'll give that a whirl!

@dhoconno Please let us know if adding the Illumina reads improved your assembly. Thanks.