mikolmogorov/Flye

Anomaly in resource usage

VendelboNM opened this issue · 7 comments

Hi there!

I am conducting whole-genome assembly of some barley cultivars using ~15x coverage ONT R10.4 sequence data. The first 7 cultivars used 3500-7500 CPU hours and 400-550 GB RAM at peak, but I have two cultivars that have run for more than 10000 CPU hours, and I've had to increase the max RAM to 1200 GB as they ran out of memory at 950 GB. Nothing in the processed data distinguishes these two assemblies. Any idea what could be going on here?

I am using very basic assembly parameters and have set the minimum overlap fairly low, as we have a rather small dataset for whole-genome assembly of such a large genome.

flye --nano-hq {input} --read-error 0.03 --min-overlap 1000 --keep-haplotypes -t {threads} -i 1 -g 5.1g -o {outdir}

Update: Runtime is now 16300 CPU hours. It has been stuck at the 02_repeat stage with nothing new for more than a week.

Could you please share the flye.log file? Sometimes super repetitive genomes may get stuck during the graph construction step.

Hi!

Thank you for taking a look at it. I have attached the log file here as requested:

an_flye.log

It is running out of memory. I recommend using the automatically selected minimum overlap parameter instead of 1000 - this should substantially reduce the complexity of the repeat graph.
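
For reference, a minimal sketch of what the command from the original post might look like with that change applied, i.e. with `--min-overlap 1000` dropped so Flye picks the minimum overlap itself. The input path, thread count, and output directory below are hypothetical placeholders standing in for `{input}`, `{threads}`, and `{outdir}`; all other options are copied unchanged from the original command.

```shell
# Hypothetical placeholder values; substitute your own paths and thread count.
input="reads.fastq.gz"
threads=64
outdir="flye_out"

# Same options as the original command, minus --min-overlap,
# so the minimum overlap is selected automatically.
flye_cmd="flye --nano-hq $input --read-error 0.03 --keep-haplotypes -t $threads -i 1 -g 5.1g -o $outdir"

# Print the command for inspection before launching it.
echo "$flye_cmd"

# To launch (assumes flye is on PATH and the paths contain no spaces):
#   $flye_cmd
```

Printing the command first makes it easy to confirm that no `--min-overlap` option is left over before committing cluster hours to the run.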

SOLVED

Thank you for the suggestion - the assembly ran with no issues, faster than the other samples and with a decent memory usage of 500 GB. Fantastic start to a new week!

Glad it worked!