isovic/racon

VERY slow performance with fastq.gz files

Closed this issue · 3 comments

I recently ran several different assemblies through Racon (v1.4.3) and saw very long execution times. On reviewing these runs, I noticed that almost all of the time was spent in "loading sequences" (nearly 12 hours in many of my runs). I then decompressed the fastq file with gzip first (which took ~25 minutes) and reran. The "loading sequences" step then took less than 6 minutes.

One example is below, but I have numerous others with comparable results. In this case the read file was 53GB gzipped and 126GB uncompressed (the nodes had 1TB of RAM and nothing else was running):

gzipped fastq

racon -u -t 48 {HiFi.fastq.gz} {minimap2.sam} {genome.fasta} > polished.fasta

[racon::Polisher::initialize] loaded sequences 43007.108622 s

unzipped fastq

racon -u -t 48 {HiFi.fastq} {minimap2.sam} {genome.fasta} > polished.fasta

[racon::Polisher::initialize] loaded sequences 571.343952 s
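
For anyone stuck on 1.4.3, the workaround I used can be sketched roughly as follows. This is only a minimal sketch: the file names (`reads.fastq.gz`, `overlaps.sam`, `genome.fasta`) and the `decompress_reads` helper are placeholders of mine, not anything from Racon itself, and it assumes there is enough disk space for the uncompressed reads.

```shell
#!/bin/sh
# Workaround sketch: decompress the reads once, then give racon the plain FASTQ.
# All file names here are hypothetical placeholders, not paths from my runs.
set -e

# decompress_reads SRC.gz DEST
# Writes a decompressed copy to DEST and leaves the original .gz untouched.
decompress_reads() {
    gzip -dc "$1" > "$2"
}

# Only runs when the placeholder input exists and racon is on PATH.
if [ -f reads.fastq.gz ] && command -v racon >/dev/null 2>&1; then
    decompress_reads reads.fastq.gz reads.fastq
    # Same options as in the runs above, but pointing at the uncompressed reads.
    racon -u -t 48 reads.fastq overlaps.sam genome.fasta > polished.fasta
fi
```

Writing to a separate file with `gzip -dc` keeps the compressed original in place, so the uncompressed copy can simply be deleted after polishing.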

rvaser commented

Please use a newer version (from https://github.com/lbcb-sci/racon or bioconda). The parsing was fixed as of v1.4.4.

Best regards,
Robert

msikic commented

Thanks for the reply and sorry for the confusion on my part.

When I come to the page, the only indication that it has moved seems to be in the "About" section, and it isn't a distinct paragraph, so it's very easy to overlook (if there's some other indication, I am missing it). Repos that I have seen in similar situations (moved, but wanting to retain continuity at the original location) have put a large message at the top of the README and then placed the repo into Archive mode.

In any case, thanks for the work your team has done in maintaining this tool. It is much appreciated!
