lbcb-sci/ra

Rala seems stuck

Closed this issue · 23 comments

Hello,
Rala seems to be stuck: it has been "running" for 2 days and the threads are in S (sleeping) state, yet RAM usage is quite high. Is this normal behaviour? Here is my command:

./ra/build/bin/ra -t 48 -x pb allLongReads.noduplicateheader.reformat.fa reads.fq.gz > Ra.assembly.fa
The expected genome size is 120 Mb

And this is the size of the files in the "working directory"

ls -lh
total 323G
-rw-rw-r-- 1 lege lege    0 Aug 19 04:04 layout.fasta
-rw-rw-r-- 1 lege lege 308G Aug 18 21:32 overlaps.paf
-rw-rw-r-- 1 lege lege 810M Aug 19 03:54 preconstruction.fasta
-rw-rw-r-- 1 lege lege  15G Aug 19 04:04 sensitive_overlaps.paf

Thank you

EDIT: a colleague pointed out to me that there were only 250 GB left on the drive. Could Rala be waiting to reserve enough disk space to write everything at once?

Hi,
could you please paste the log here? How large is your read set, and how much RAM do you have?

Best regards,
Robert

The long-read file is 36 GB and the Illumina file is 24 GB. I have 256 GB of RAM.
Unfortunately, the job was killed on my server :/ Is there a way to restart the assembly from these files?

ls
layout.fasta  overlaps.paf  preconstruction.fasta  sensitive_overlaps.paf

minimap2 had already finished.

You probably ran out of disk space: the overlaps file from the first iteration is around 300 GB, and the second iteration creates a new one of similar size. 256 GB of RAM should be enough. Unfortunately, there is no option to resume the assembly yet :/
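A quick free-space check before launching can catch this early. The following is only a sketch: `has_free_gb` is a hypothetical helper (not part of Ra), and the 700 GB threshold is an assumed figure derived from two overlap files of roughly 300 GB each plus some headroom.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the filesystem holding directory $1
# has at least $2 GB available.
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Assumed requirement: two ~300 GB overlap files plus headroom.
if has_free_gb . 700; then
    echo "enough free space"
else
    echo "not enough free space for two ~300 GB overlap files" >&2
fi
```

Run it from the directory where the work directory will be created, since that is the filesystem that fills up.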

Oh ok,
I am going to retry on another disk with 2 TB of free space; that should be enough.
thank you!

Hello, I tried again with enough disk space

[M::worker_pipeline::167.525*20.07] mapped 64287 sequences
[M::worker_pipeline::169.413*20.06] mapped 61621 sequences
[M::main] Version: 2.14-r892-dirty
[M::main] CMD: /mnt/sda1/Alessandro/14-08-2019_Ra/ra/vendor/minimap2/minimap2 -t 20 -x map-pb /mnt/sda1/Alessandro/14-08-2019_Ra/ra_work_directory_1566397451.3950646/iter1.fasta allLongReads.noduplicateheader.reformat.fa
[M::main] Real time: 169.474 sec; CPU: 3398.018 sec; Peak RSS: 1.743 GB
[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
[racon::Polisher::initialize] aligned overlap 4204629/4204629
[racon::Polisher::initialize] transformed data into windows
terminate called after throwing an instance of 'std::bad_alloc'262406
  what():  std::bad_alloc

Here was my command
./ra/build/bin/ra -t 20 -x pb allLongReads.noduplicateheader.reformat.fa reads.fq > Ra.assembly.fa

That is weird. Was the work directory deleted? If not, can you please paste here all file sizes in it?

There is no working directory! I did not remove it, so either it was never created or Ra deleted it itself.

There is, however, a Rala assembly graph:

total 110G
-rw-rw-r-- 1 lege lege  36G Aug 18 18:04 allLongReads.noduplicateheader.reformat.fa
drwxrwxr-x 7 lege lege 4.0K Aug 21 16:22 ra
-rw-rw-r-- 1 lege lege    0 Aug 21 16:24 Ra.assembly.fa
-rw-rw-r-- 1 lege lege 129M Aug 22 08:48 rala_assembly_graph.gfa
-rw-rw-r-- 1 lege lege  75G Aug 21 16:13 reads.fq
-rwxrwxr-x 1 lege lege 421 Aug 18 17:58 rename.py

The work directory is always created and should be deleted afterwards. You can obtain your contigs with awk '$1 ~/S/ {print ">"$2"\n"$3}' rala_assembly_graph.gfa > rala_layout.fasta and then run racon manually. Here are the commands:

minimap2 -t 20 -x map-pb rala_layout.fasta allLongReads.noduplicateheader.reformat.fa > iter1.paf
racon -t 20 allLongReads.noduplicateheader.reformat.fa iter1.paf rala_layout.fasta > iter1.fasta
minimap2 -t 20 -x map-pb iter1.fasta allLongReads.noduplicateheader.reformat.fa > iter2.paf
racon -t 20 allLongReads.noduplicateheader.reformat.fa iter2.paf iter1.fasta > iter2.fasta
minimap2 -t 20 -x sr iter2.fasta reads.fq.gz > iter3.paf
racon -t 20 reads.fq.gz iter3.paf iter2.fasta > iter3.fasta

If the error in racon persists, let me know (or you can clone the newest racon version and try with it from the start).
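To see what the awk extraction step does, here is a minimal sketch on a toy GFA file. The file names, segment names, and sequences are made up for illustration; the only change from the one-liner above is that the record type is matched exactly with `$1 == "S"`.

```shell
#!/bin/sh
# Toy GFA: two segment (S) lines and one link (L) line, tab-separated.
printf 'S\tUtg1\tACGT\nS\tread_7\tGGCC\nL\tUtg1\t+\tread_7\t-\t0M\n' > toy.gfa

# For every segment line, emit a FASTA record: name from column 2,
# sequence from column 3. Link lines are skipped.
awk '$1 == "S" { print ">" $2 "\n" $3 }' toy.gfa > toy_layout.fasta

cat toy_layout.fasta
```

The result contains one FASTA record per S line, so any segment that is not an assembled unitig also ends up in the layout file under its own name.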

thanks
I guess the fourth line should use iter1.fasta and not rala_iter1.fasta.
Also, minimap2 doesn't have an "sr" preset. What is this option?

Indeed, I edited it, thanks!

 racon -t 20 allLongReads.noduplicateheader.reformat.fa iter1.paf rala_layout.fasta > iter1.fasta
[racon::Polisher::initialize] loaded target sequences 1.150181 s
[racon::Polisher::initialize] error: duplicate sequence ch431_read2674_template_pass_FAH31515 with unequal data

What does this error mean? I am sure that sequence is not duplicated.

It seems that you have duplicate headers. Try searching with grep "ch431_read2674_template_pass_FAH31515" allLongReads.noduplicateheader.reformat.fa.

No it's not that

 grep "ch431_read2674_template_pass_FAH31515" allLongReads.noduplicateheader.reformat.fa 
>ch431_read2674_template_pass_FAH31515

Can you do that for rala_layout.fasta as well please?
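Rather than checking one name at a time, the same comparison can be done for every header at once. This is a sketch on made-up toy files; `names` is a hypothetical helper, and for real data the two inputs would be the read file and rala_layout.fasta.

```shell
#!/bin/sh
# Toy inputs (made-up names and sequences).
printf '>r1\nAC\n>r2\nGT\n' > toy_reads.fa
printf '>Utg1\nACGT\n>r2\nGT\n' > toy_layout.fa

# Hypothetical helper: print the sorted, unique sequence names of a FASTA file
# (header text up to the first whitespace, without the leading ">").
names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

names toy_reads.fa > toy_reads.names
names toy_layout.fa > toy_layout.names

# comm -12 keeps only the lines common to both sorted lists.
comm -12 toy_reads.names toy_layout.names
```

Any name printed here occurs in both files, which is the situation in which racon reports a duplicate sequence with unequal data.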

sure

grep "ch431_read2674_template_pass_FAH31515" rala_layout.fasta      
>ch431_read2674_template_pass_FAH31515

Yeah, I thought so. Please try grep -A1 ">Utg" rala_layout.fasta > layout.fasta and then please run wc -l layout.fasta.
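The grep -A1 filter relies on each sequence sitting on a single line, and GNU grep prints a "--" separator between non-adjacent match groups. As an alternative sketch (not what was run in this thread), an awk filter keeps whole records whose header starts with ">Utg", whatever the line wrapping. The toy file below is made up for illustration.

```shell
#!/bin/sh
# Toy layout: two unitigs and one leftover read that kept its read name.
printf '>Utg1\nACGT\nACGT\n>ch1_read2\nGGCC\n>Utg3\nTTAA\n' > toy_rala_layout.fasta

# On each header line, decide whether the record is a unitig;
# every line is printed only while "keep" is set.
awk '/^>/ { keep = ($0 ~ /^>Utg/) } keep' toy_rala_layout.fasta > toy_utg_only.fasta

cat toy_utg_only.fasta
```

Only the Utg records remain, so the names in the filtered layout no longer clash with names in the read file.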

wc -l layout.fasta
744 layout.fasta

Try now the minimap2+racon commands :)

Racon seems to be running fine so far, with no errors. I will let you know whether all the steps complete successfully.
However, here you wrote:
minimap2 -t 20 -x sr iter2.fasta reads.fq.gz > iter3.paf

What is -x sr? minimap2 doesn't have that option.

That option is for short reads. You can run it like -x sr or -ax sr if you want alignments as well.

Yes, sorry, it seems minimap and minimap2 were mixed up in my PATH.
Anyway, everything seems all right now. Thanks
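A mix-up like this can be spotted by checking which executable the shell actually resolves. The sketch below uses a hypothetical helper, `which_tool`; `command -v` is standard POSIX shell, and `minimap2 --version` prints the version of whichever binary is found first.

```shell
#!/bin/sh
# Hypothetical helper: print the resolved path of a tool,
# or report on stderr that it is missing and return a non-zero status.
which_tool() {
    command -v "$1" || { echo "$1 not found in PATH" >&2; return 1; }
}

# Show which minimap2 would be run and, if one is found, its version.
which_tool minimap2 && minimap2 --version
```

If the printed path is not the one expected (for example the copy under ra/vendor/minimap2), either call the intended binary by its full path or adjust PATH.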

The assembly has finished and went well,
thanks for the support.