isovic/racon

error: overlap is not transmuted!

Opened this issue · 87 comments

Hi,
I am trying to polish a PacBio assembly with Illumina reads. After one round of polishing with the PacBio reads, I mapped the Illumina reads to the polished assembly with minimap2 and used the SAM output as overlap information for polishing, using the following commands:

minimap2 -t 64 -ax sr consensus1.fasta ${reads} > illumina.sam
racon -t 64 ${reads} illumina.sam consensus1.fasta > consensus2.fasta

The mapping works quite well, but after a few hours of running I hit the following error:

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

I am not sure what this means or how to fix it. Any suggestions are more than welcome!
Kind regards,

Hello,
which version of racon are you using?

Best regards,
Robert

It is really strange that this particular error occurs. Would you mind sharing your data so I can investigate further?

Best regards,
Robert

How large is the created sam file and how much RAM does your machine have?

Could you please add the following lines of code at https://github.com/isovic/racon/blob/master/src/polisher.cpp#L330:

// Debug aid: report every overlap that is null or was never transmuted.
for (uint64_t i = 0; i < overlaps.size(); ++i) {
    if (overlaps[i] == nullptr) {
        fprintf(stderr, "Null at %lu/%zu\n", i, overlaps.size());
        continue;
    }
    if (!overlaps[i]->is_transmuted()) {
        fprintf(stderr, "Not transmuted at %lu/%zu\n", i, overlaps.size());
    }
}

and the following lines at https://github.com/isovic/racon/blob/master/src/overlap.hpp#L63:

    bool is_transmuted() const {
        return is_transmuted_;
    }

Run make and try running the same racon command as above. Please paste the output here.

Sure, send me the error you are getting when building from source.

Hi Robert,
I recompiled the source code with your suggested changes and the program is running, but the STDERR is very large. Look at the first 25 lines of the log:

racon -t 64 ${DIR}/merged.shuffled.fastq ${DIR}/shrimp.illumina.sam ${DIR}/shrimp_consensus.fasta > ${DIR}/shrimp_consensus2.fasta
[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
Not transmuted at 0/721313620
Not transmuted at 1/721313620
Not transmuted at 2/721313620
Not transmuted at 3/721313620
Not transmuted at 4/721313620
Not transmuted at 5/721313620
Not transmuted at 6/721313620
Not transmuted at 7/721313620
Not transmuted at 8/721313620
Not transmuted at 9/721313620
Not transmuted at 10/721313620
Not transmuted at 11/721313620
Not transmuted at 12/721313620
Not transmuted at 13/721313620
Not transmuted at 14/721313620
Not transmuted at 15/721313620
Not transmuted at 16/721313620

and it goes on and on until the end:

Not transmuted at 721313615/721313620
Not transmuted at 721313616/721313620
Not transmuted at 721313617/721313620
Not transmuted at 721313618/721313620
Not transmuted at 721313619/721313620

I am not entirely sure how to interpret this. What does "Not transmuted" mean for racon? Does this mean it cannot calculate the consensus for some reason? Were the reads incorrectly mapped or are there way too many indels/mutations to estimate a consensus?

The program worked quite well during the first polishing stage using long reads. The only change has been mapping short reads to consensus_1. Any ideas are more than welcome.

Kind regards,

The error means that the read names in the overlaps weren't replaced with internal read IDs, see here https://github.com/isovic/racon/blob/master/src/overlap.cpp#L129-L177. I don't know how this output is even possible: if an overlap isn't transmuted it is deleted, yet your output indicates that not a single one is transmuted and none of them were deleted. You are running racon on a single node, right?

Can you please paste here a couple of non-header lines of your SAM file, and a few sequence headers from Illumina reads and polished contigs?
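
To illustrate what "transmuted" means in the explanation above, here is a minimal Python sketch. It is not racon's actual code (racon is written in C++). It assumes an overlap is just a (query name, target name) pair, and that read_ids and contig_ids are hypothetical name-to-ID dictionaries built from the input files. An overlap whose names cannot be resolved is set aside rather than transmuted.

```python
def transmute(overlaps, read_ids, contig_ids):
    """Replace sequence names with integer IDs (illustrative only).

    overlaps   -- iterable of (query_name, target_name) pairs
    read_ids   -- dict mapping read name -> integer ID (hypothetical)
    contig_ids -- dict mapping contig name -> integer ID (hypothetical)
    """
    transmuted, missing = [], []
    for q_name, t_name in overlaps:
        if q_name not in read_ids or t_name not in contig_ids:
            # A name that is absent from the inputs cannot be resolved.
            missing.append((q_name, t_name))
            continue
        transmuted.append((read_ids[q_name], contig_ids[t_name]))
    return transmuted, missing


reads = {"read_a": 0, "read_b": 1}
contigs = {"contig_1": 0}
ok, bad = transmute(
    [("read_a", "contig_1"), ("read_x", "contig_1")], reads, contigs
)
print(ok)   # overlaps with resolved IDs
print(bad)  # overlaps whose names were not found
```

If every overlap ends up in the second list, the names in the overlap file and the names in the sequence files do not agree at all, which is the situation the log above suggests.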

Hi Robert,

thanks for your quick reply. I am using 64 threads, see command below:

racon -t 64 ${reads} illumina.sam consensus.fasta > consensus2.fasta

please see below the tail of my sam file:

AGRF-33:351:HMHKNBCXX:2:2215:21275:100712       161     contig_72160    13809   1       22M1I12M1I12M1I16M1I13M4D27M5D35M6D34M3D27M12S  contig_20689    2573    0       TATATATATAATATATATATATAATATATATATATAATATATATATATAATATATATATATATATAATATATATATATATATAATATATATATATACATATATATAATATATATATATATATATATATATATATATATATAAATATATTTATATAATATATATATATATATATATTAAATATATAATATATATATATATATAATATATATATAA      DDDCDIIIIIIIIIIIIIIIHIIHHIEHHE1DCGHHHHIIIIIIIIIIHIIIIIIHHIIHHIIIIFEGHHHIEHEHI?FHHH@FHHH?FIIGIHGF1DFHIFHIC1G1HHHI?HHIIIIIIEHEEHHHFFF1<<F1<1<CD1<<1<D<1<DC11111<1DFG1CFFEHF@1<1111111<CC1<<1<<<<C@CG0<<<@C0<00<0<0<09<00      NM:i:27 ms:i:206        AS:i:206    nn:i:0  tp:A:P  cm:i:1  s1:i:43 s2:i:43
AGRF-33:351:HMHKNBCXX:2:2215:21345:100730       89      contig_15432    3165    1       23S30M2I12M10D42M3I31M57S       =       3165    -125    TNAAGAGATAAATAGAGAGAGAGAGAGAGAGAGAGAGATAGAGAGAGAGATAGATAGAGAGATAGATAGAGAGATAGAGAGATAGAAAGAGAGAGAGAGAGAGAGAGAGTTAGAGAGAGAGAGAGAAAGAGAGAGAGAGAGAGTAAGAAAGAGAAAGAAAGAGATAGAGATAGAGAAAAATACAGAGAGAGAGAGAAANA    1#1111<<1<<<1<1EEFC<1E@C1<1F<<1<1<1DC<1FD11IHHHG@C1HHH?F1<1<1<1CCD1FCD1GCD1F<D1F1<1<1<1HGFHFHHHGGCCHHFCD1@1D<<1FHHHEEFEIHF@GFHEHHIIIHE@HHHEHEHEHHHEIHHIHIHHIIIHIHF<HEHIIHHEIIIHHFHHF@HHHCIIIIHEGHFHD<<#D    NM:i:19 ms:i:124        AS:i:124        nn:i:0  tp:A:P  cm:i:2  s1:i:37 s2:i:43
AGRF-33:351:HMHKNBCXX:2:2215:21345:100730       165     contig_15432    3165    0       *       =       3165    125     AGAGATNGANAAAGNGAGAAAGAGAGAGNNAGAGAGAGNGAGAGAAAGAGNGAGAGNGANAGNTNGNGANAGAGNGAGATNNNNAGATAANTAGAGAGANAGAGANAGAGNGAGAGAGAGATAGAGAGNGAGANAGAGAGNGAGNNNGA       <D00<<#<1#<<<D#<<<<CHH1DCD<G##1<<D?CCG#<<DDE11DGEH#<<D<E#<<#1<#<#<#<1#<<1D#1<<D1####<<11<<#1<<<D<E<#1<<DD#<<1<#<<11DE01EC1<FHFC@#0<<<#<000<E#//<###</
AGRF-33:351:HMHKNBCXX:2:2215:21255:100736       73      contig_72999    2205    1       23M1I4M23I39M104S       =       2205    0       GAGAGAGAGAGAGAGAGAGAGAGAGAGAGGGCGGAGAGGGGGAGAGACAATGAGAGAGAGAGAATGAGAGAGAGAGAGAGAGAGAGAGAGCGTAGGAGAGAGAGTGCGCAAGAAAGAGAGAGAAAAAGACAGCGAGAGAAAGAAGGAGAGGGAGAAAGAGAGGGAGAGGGCGAGGAGGGCGCGGCATGCTGATA  DBD@@<0<ECEHHIH?G?=<0<<E=<C<0000<////<<<<//<0<<<111<11<C1<1D10<111<11<<C10<01D1<<D@H@@10<01/<0/011101<1<<11/</</1111111010=111<0<1=C=/<<//01100010000000<0000000=0/000///:/://///.::--//---/;://8/  NM:i:24 ms:i:78 AS:i:78 nn:i:0  tp:A:P  cm:i:3  s1:i:29 s2:i:0
AGRF-33:351:HMHKNBCXX:2:2215:21255:100736       133     contig_72999    2205    0       *       =       2205    0       CTCTCTCTTTCTCTCTCTCTCTCTCTTTTTATCTCTATCTCTNTCTTTCTCCCTCTTTGTCTTCTTCTTCGTCTNTTTCTTTCTCTTTTTTTCTCTCTTTCTCTCTTCTTCTCGTTTTCTGTCTTTCTCTTATTCTTATTCATCTCATTACTTTCATCATTCTTATTATCATAATCCTCCCCTT    <<@0<11<111<<<D1<1<<11111<11111<111<1<<11<#<<11<<<1<1111111<<C111<<<1<110<#1111<111<<<1<1</<1<11<11<D1<1<<1<1<1<10<<1<1C1<11<1<1<<11<<111<1<1111<11<<1<1<1<1<111<1<<11<C<<<<<<<<1<<11<00

I do not see anything unusual with the sam file. It was produced using minimap2 like this:

minimap2 -t 64 -ax sr consensus.fasta ${reads} > illumina.sam

Are there maybe sequences with identical names (paired ends might have equal names up to the first white space)?

There should be an error regarding mismatched read lengths if there are two or more sequences with the same name (meaning each sequence name in a file should be unique up to the first white space). Maybe the error you are getting is due to multithreading and the exit function. Try fixing the Illumina read names and let's see if it resolves.
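
A quick way to test the uniqueness rule above is to compare only the part of each header before the first white space. The Python sketch below does that for a list of FASTQ or FASTA header lines. The function name is made up for illustration, and the " 1" and " 2" suffixes in the example headers are hypothetical.

```python
from collections import Counter


def duplicated_names(headers):
    """Return names that occur more than once, comparing only the part
    of each header before the first white space."""
    names = []
    for header in headers:
        tokens = header.lstrip("@>").split()
        if tokens:
            names.append(tokens[0])
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


headers = [
    "@AGRF-33:351:HMHKNBCXX:2:2215:21275:100712 1",  # hypothetical mate tag
    "@AGRF-33:351:HMHKNBCXX:2:2215:21275:100712 2",  # hypothetical mate tag
    "@AGRF-33:351:HMHKNBCXX:2:2215:21345:100730_1",
]
print(duplicated_names(headers))
```

The first two headers differ only after the space, so they count as the same name.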

Hi Robert,
I just remembered that I changed the names of the Illumina reads and added a /1 and /2 before shuffling the reads. But the /1 and /2 were removed from the read names after mapping. I'll fix the sam file manually and try again. I'll keep you updated.

Cheers,

So just a few updates:

  1. Minimap2.1 removes the trailing /1 and /2 from read names, but still requires them in order to map the reads as paired end.
  2. Minimap2.1 still reports secondary alignments even if -N 0 is set.
  3. Due to these problems I have had to modify the sam file manually to:
    a) remove secondary alignments
    b) add a _1 or _2 to the read name to give them different names.
    Below is a short Perl script to do this on the sam file:
use strict;
use warnings;

my $in = shift or die "Please give the sam file as argument\n";

open(my $sam, "<", $in) or die "Cannot open $in: $!\n";

while (my $r = <$sam>) {
  # Header lines pass through unchanged.
  if ($r =~ /^@/) {
    print $r;
    next;
  }
  chomp $r;
  my @r = split(/\t/, $r);
  my $flag = $r[1];
  # Skip secondary (0x100) and supplementary (0x800) alignments,
  # logging the skipped record to STDERR.
  if ($flag & 0x900) {
    print STDERR $r . "\n";
    next;
  }
  # 0x80 marks the second read of a pair; everything else gets _1.
  my $pair = ($flag & 0x80) ? "_2" : "_1";
  # Append the suffix to the read name (the first column).
  $r =~ s/\t/$pair\t/;
  print $r . "\n";
}
close $sam;

Hope this helps anyone. I am currently modifying the sam file and then will run racon again to see if the previous error was fixed.
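
For readers unfamiliar with SAM flags, the Python sketch below decodes the flag column (the second field of each alignment line) using the bit values from the SAM specification. It is only an illustration of how the mate suffix and the secondary/supplementary filter can be derived from the flag; the function names are made up for this example.

```python
# Bit values and meanings as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM flag."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def pair_suffix(flag):
    """Return "_1" or "_2" for a record to keep, or None for secondary
    and supplementary alignments (which would be dropped)."""
    if flag & (0x100 | 0x800):
        return None
    return "_2" if flag & 0x80 else "_1"


# Flags taken from the sam lines shown earlier in this thread.
for flag in (161, 89, 165, 73, 133):
    print(flag, decode_flag(flag), pair_suffix(flag))
```

Testing individual bits with a bitwise AND avoids having to reason about the length of the binary string.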

I'll see if I can add an easier way to process paired ends.

Hi Robert,
After fixing the sam file it looks like this:

AGRF-33:351:HMHKNBCXX:2:2215:21275:100712_1     81      contig_27617    23272   1       9M1D61M1D14M1D8M1D14M1D13M1D11M1D10M13D12M1I15M1D22M1D25M1D13M1D9M1D13M        contig_100327   16626   0       AATATATATATATATATATAATATATATATATAAAATATATATATATAAAACATATATAAAATATATATAATATATATATAAATTATATATAATATATATATATATTATATATATATATTATATATATATTATATACATAAAATATATATATAATATATATATATATAATATATATAAATATATATATATATATATATATATATATATATATATAATATATATATATAATATATATAATATATATATATA    <00G?HFHCC<0HF?HGHGD0@IHIHGHFGGD<01FEHED1D1HGG<<111<C<F<<1D<1<1HCHFC<C<1HGIHIIHF<<11<1HHGGC<11HIHIIHHGD<<1C?HHH@HIHHED<1@GHF@HIHGC1CGHF<1D1D11IIIHHGDHHFD1?IIIIIHIHHIHGD1@IIIIIHD1G@IIIIIHIIIHIIIHIHIIIIIIIIHIIHIIIIHHIHIIIIHIIIIIHIIIIIIHHIIIHIIHHHHDDDBD     NM:i:33ms:i:216        AS:i:217        nn:i:0  tp:A:P  cm:i:3  s1:i:51 s2:i:45
AGRF-33:351:HMHKNBCXX:2:2215:21275:100712_2     161     contig_100327   16626   2       79S27M108S      contig_27617    23272   0       TATATATATAATATATATATATAATATATATATATAATATATATATATAATATATATATATATATAATATATATATATATATAATATATATATATACATATATATAATATATATATATATATATATATATATATATATATAAATATATTTATATAATATATATATATATATATATTAAATATATAATATATATATATATATAATATATATATAA        DDDCDIIIIIIIIIIIIIIIHIIHHIEHHE1DCGHHHHIIIIIIIIIIHIIIIIIHHIIHHIIIIFEGHHHIEHEHI?FHHH@FHHH?FIIGIHGF1DFHIFHIC1G1HHHI?HHIIIIIIEHEEHHHFFF1<<F1<1<CD1<<1<D<1<DC11111<1DFG1CFFEHF@1<1111111<CC1<<1<<<<C@CG0<<<@C0<00<0<0<09<00NM:i:0  ms:i:54 AS:i:54 nn:i:0  tp:A:P  cm:i:1  s1:i:42 s2:i:0
AGRF-33:351:HMHKNBCXX:2:2215:21345:100730_1     89      contig_22953    9135    10      33S47M120S      =       9135    -47     TNAAGAGATAAATAGAGAGAGAGAGAGAGAGAGAGAGATAGAGAGAGAGATAGATAGAGAGATAGATAGAGAGATAGAGAGATAGAAAGAGAGAGAGAGAGAGAGAGAGTTAGAGAGAGAGAGAGAAAGAGAGAGAGAGAGAGTAAGAAAGAGAAAGAAAGAGATAGAGATAGAGAAAAATACAGAGAGAGAGAGAAANA      1#1111<<1<<<1<1EEFC<1E@C1<1F<<1<1<1DC<1FD11IHHHG@C1HHH?F1<1<1<1CCD1FCD1GCD1F<D1F1<1<1<1HGFHFHHHGGCCHHFCD1@1D<<1FHHHEEFEIHF@GFHEHHIIIHE@HHHEHEHEHHHEIHHIHIHHIIIHIHF<HEHIIHHEIIIHHFHHF@HHHCIIIIHEGHFHD<<#D       NM:i:0  ms:i:94 AS:i:94 nn:i:0tp:A:P   cm:i:5  s1:i:43 s2:i:42 SA:Z:contig_21862,13169,+,58S106M1I35S,10,16;
AGRF-33:351:HMHKNBCXX:2:2215:21345:100730_2     165     contig_22953    9135    0       *       =       9135    47      AGAGATNGANAAAGNGAGAAAGAGAGAGNNAGAGAGAGNGAGAGAAAGAGNGAGAGNGANAGNTNGNGANAGAGNGAGATNNNNAGATAANTAGAGAGANAGAGANAGAGNGAGAGAGAGATAGAGAGNGAGANAGAGAGNGAGNNNGA  <D00<<#<1#<<<D#<<<<CHH1DCD<G##1<<D?CCG#<<DDE11DGEH#<<D<E#<<#1<#<#<#<1#<<1D#1<<D1####<<11<<#1<<<D<E<#1<<DD#<<1<#<<11DE01EC1<FHFC@#0<<<#<000<E#//<###</
AGRF-33:351:HMHKNBCXX:2:2215:21255:100736_1     73      contig_72999    2205    1       23M1I4M23I39M104S       =       2205    0       GAGAGAGAGAGAGAGAGAGAGAGAGAGAGGGCGGAGAGGGGGAGAGACAATGAGAGAGAGAGAATGAGAGAGAGAGAGAGAGAGAGAGAGCGTAGGAGAGAGAGTGCGCAAGAAAGAGAGAGAAAAAGACAGCGAGAGAAAGAAGGAGAGGGAGAAAGAGAGGGAGAGGGCGAGGAGGGCGCGGCATGCTGATA    DBD@@<0<ECEHHIH?G?=<0<<E=<C<0000<////<<<<//<0<<<111<11<C1<1D10<111<11<<C10<01D1<<D@H@@10<01/<0/011101<1<<11/</</1111111010=111<0<1=C=/<<//01100010000000<0000000=0/000///:/://///.::--//---/;://8/     NM:i:24 ms:i:78 AS:i:78 nn:i:0  tp:A:Pcm:i:3   s1:i:29 s2:i:0
AGRF-33:351:HMHKNBCXX:2:2215:21255:100736_2     133     contig_72999    2205    0       *       =       2205    0       CTCTCTCTTTCTCTCTCTCTCTCTCTTTTTATCTCTATCTCTNTCTTTCTCCCTCTTTGTCTTCTTCTTCGTCTNTTTCTTTCTCTTTTTTTCTCTCTTTCTCTCTTCTTCTCGTTTTCTGTCTTTCTCTTATTCTTATTCATCTCATTACTTTCATCATTCTTATTATCATAATCCTCCCCTT      <<@0<11<111<<<D1<1<<11111<11111<111<1<<11<#<<11<<<1<1111111<<C111<<<1<110<#1111<111<<<1<1</<1<11<11<D1<1<<1<1<1<10<<1<1C1<11<1<1<<11<<111<1<1111<11<<1<1<1<1<111<1<<11<C<<<<<<<<1<<11<00

As you can see the read names for pairs are different (_1 and _2) and only the main alignment is kept (no secondary alignments). Unfortunately, the error persists:

racon -t 64 merged.shuffled.fastq illumina.corrected.sam consensus.fasta > consensus2.fasta
[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
Not transmuted at 0/627471727
Not transmuted at 1/627471727
(...)
Not transmuted at 627471722/627471727
Not transmuted at 627471723/627471727
Not transmuted at 627471724/627471727
Not transmuted at 627471725/627471727
Not transmuted at 627471726/627471727
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

and no consensus is written. Should I retry using only one thread instead of 64 as I am currently doing? Will this make the execution much slower?
Any suggestion is more than welcome.
Regards,
Juan Montenegro

Hi Juan,
do the reads in the FASTQ file have the updated names as in the SAM file?

Best regards,
Robert

Hi Robert,
Silly me, I had not modified the fastq file. I did so and finally got the consensus. I am now mapping some Illumina reads again; I should see fewer indels (if any) in the alignments, is that right? Should I run another round of polishing, or do you reckon one PacBio and one Illumina round of polishing should be enough?
Kind regards
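
The missing step here was renaming the reads in the FASTQ file so they match the names in the edited sam file. A minimal Python sketch of that step follows. It assumes the FASTQ names end in /1 and /2 (as described earlier in the thread), that the sam file uses _1 and _2, and that every record spans exactly four lines. The function names are made up for illustration.

```python
import re

# Trailing /1 or /2 on the read name (the part before the first space).
_MATE_SUFFIX = re.compile(r"/([12])$")


def rename_header(header):
    """Turn a trailing /1 or /2 on the read name into _1 or _2."""
    name, sep, rest = header.partition(" ")
    return _MATE_SUFFIX.sub(r"_\1", name) + sep + rest


def rename_fastq(lines):
    """Yield FASTQ lines with renamed headers (assumes 4-line records)."""
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        # Only the first line of each record is a header; quality lines
        # may also start with "@" and must not be touched.
        yield rename_header(line) if index % 4 == 0 else line


record = ["@AGRF-33:351:HMHKNBCXX:2:2215:21275:100712/1\n", "ACGT\n", "+\n", "IIII\n"]
for line in rename_fastq(record):
    print(line)
```

Whatever renaming scheme is used, the point is that the same names must appear in both the reads file and the overlap file.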

I would advise at least two rounds of PacBio polishing and at least one round of Illumina polishing.

Best regards,
Robert

Hello, I have the same issue with ONT long reads...
I overlap with minimap2:
minimap2 -x map-ont -t 44 genome.fa reads.fq > genVont.paf
Then I try Racon 1.3.1 (downloaded with conda):
racon -t 44 -w 750 genome.fa genVont.paf reads.fq

I looked at the read names in the .fq file and found no spaces; the same goes for the genome.fa file.
Then the output looks like:

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

Hi Antoine,
that error is quite intriguing. Are you by any chance running racon on a server with Slurm or something similar? Did you check that all sequences have unique identifiers? You can paste here a couple of headers.

Best regards,
Robert

Hello, I run locally on a desktop machine without slurm.

I could not find duplicated identifiers in the reads or in the contigs; here are example IDs:
Contigs:

>scf7180000001958
>scf7180000001959
>scf7180000001960
>scf7180000001961
>scf7180000001962
>scf7180000001963
>scf7180000001964
>scf7180000001965
>scf7180000001966
>scf7180000001967

Reads:

@ch111_read10_template_pass_FAH43284
@ch108_read10_template_pass_FAH43284
@ch116_read10_template_pass_FAH43284
@ch110_read10_template_pass_FAH43284
@ch103_read10_template_pass_FAH43284
@ch123_read10_template_pass_FAH43284
@ch102_read10_template_pass_FAH43284
@ch1_read10_template_pass_FAH43284
@ch130_read10_template_pass_FAH43284
@ch135_read10_template_fail_FAH43284

Do you mind sending me your data so I can investigate locally (contigs and reads)? I have no idea how this error could occur, probably some undefined behaviour.

I will have to ask permission, this may take time. If I get it, I will comment on this issue, for the time being, I will try to reformat reads and contigs. Thanks for your help I will comment if I make it work.

If you are willing, you can help me debug it without sending me the data (it might even be faster). If so, please follow these instructions: #77 (comment).

Ok, I got this result now, it seems all elements of the list return "not transmuted"

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
Not transmuted at 0/241317
Not transmuted at 1/241317
Not transmuted at 2/241317
Not transmuted at 3/241317
Not transmuted at 4/241317
Not transmuted at 5/241317
Not transmuted at 6/241317
--- a lot of not transmuted lines ---
Not transmuted at 241312/241317
Not transmuted at 241313/241317
Not transmuted at 241314/241317
Not transmuted at 241315/241317
Not transmuted at 241316/241317
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

I don't understand how that is possible. I'll check the code again and report here soon. Thanks for testing!

By the way, I tried reformatting my reads file:
Read headers now look like:

>ch1537_read2552_template_pass_PAC16434

Contigs headers look like:

>scf7180000002512

Command to obtain the sam file with minimap2:
minimap2 -t 44 -c -a -x map-ont genome.fasta ont_25x.fa > scfVont.sam
Command to start racon:
nohup ./racon/build/bin/racon -t 44 -w 750 genome.fasta scfVont.sam ont_25x.fa &

Are there more of [racon::Overlap::find_breaking_points] error: overlap is not transmuted! lines or just 4-5?

Just these 5 lines at the end.

Not transmuted at 241316/241317
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

You are using the latest commit, right? Never mind, I see that you are using conda v1.3, I'll check there.

I just cloned the repository to access files to modify and build on my own. I get the same kind of results with conda. racon --version outputs v1.3.1

I use minimap2 v2.14-r883
The conda build of racon also outputs v1.3.1.

How many lines are in the overlap file (paf/sam)? Does it match with 241317 as reported in the log?

Also please add

fprintf(stderr, "Missing query name %s\n", q_name_.c_str());

before https://github.com/isovic/racon/blob/master/src/overlap.cpp#L144, and

fprintf(stderr, "Missing target name %s\n", t_name_.c_str());

before https://github.com/isovic/racon/blob/master/src/overlap.cpp#L162. Hit make and rerun.

I have 294072 lines in the sam file. (241317 in not transmuted output)
The output looks exactly the same, I find no Missing target or query line.

What is the size of the paf file?

Next please try adding

fprintf(stderr, "Processing overlaps from %u to %zu\n", l, overlaps.size());

at https://github.com/isovic/racon/blob/master/src/polisher.cpp#L278, and

fprintf(stderr, "Overlap %u/%zu is not valid, deleting\n", i, overlaps.size());

before https://github.com/isovic/racon/blob/master/src/polisher.cpp#L284, and

    if (!overlaps[i]->is_valid()) {
        fprintf(stderr, "Not valid at %lu/%zu\n", i, overlaps.size());
    }

in the same for loop you added at line 330 before.

The .paf file contains 291748 lines. I will add it then rerun

I get:

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
Processing overlaps from 1 to 269501
Overlap 0/269501 is not valid, deleting
Overlap 1/269501 is not valid, deleting
Overlap 2/269501 is not valid, deleting
Overlap 3/269501 is not valid, deleting
--
Overlap 269497/269501 is not valid, deleting
Overlap 269498/269501 is not valid, deleting
Overlap 269499/269501 is not valid, deleting
Overlap 269500/269501 is not valid, deleting
Processing overlaps from 1 to 22247
[racon::Polisher::initialize] loaded overlaps
Not transmuted at 0/22247
Not transmuted at 1/22247
Not transmuted at 2/22247
Not transmuted at 3/22247
Not transmuted at 4/22247
--
Not transmuted at 22243/22247
Not transmuted at 22244/22247
Not transmuted at 22245/22247
Not transmuted at 22246/22247
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

I notice that number of overlaps is different when I use .paf and .sam output. Is it normal?

I guess it should be equal. Or maybe some overlaps have bad alignments and are not printed, not sure though.

Btw, the first print function in #77 (comment) has a lowercase L and not a 1. Could you please rerun it? :)

Now I get this:

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
Processing overlaps from 0 to 269501
Overlap 0/269501 is not valid, deleting
Overlap 1/269501 is not valid, deleting
Overlap 2/269501 is not valid, deleting
Overlap 3/269501 is not valid, deleting
--
Overlap 269497/269501 is not valid, deleting
Overlap 269498/269501 is not valid, deleting
Overlap 269499/269501 is not valid, deleting
Overlap 269500/269501 is not valid, deleting
Processing overlaps from 4294697795 to 22247
[racon::Polisher::initialize] loaded overlaps
Not transmuted at 0/22247
Not transmuted at 1/22247
Not transmuted at 2/22247
Not transmuted at 3/22247
Not transmuted at 4/22247
--
Not transmuted at 22241/22247
Not transmuted at 22242/22247
Not transmuted at 22243/22247
Not transmuted at 22244/22247
Not transmuted at 22245/22247
Not transmuted at 22246/22247
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

A quick fix is to replace line https://github.com/isovic/racon/blob/master/src/polisher.cpp#L314 with

l = c < n ? 0 : c - n;

This will lead us to the real error which is ... empty overlap set! Can you please verify that?
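
The huge start index in the log above (4294697795) is what an unsigned 32-bit subtraction produces when the result would be negative: 2^32 - 269501 = 4294697795. The Python sketch below emulates that wrap-around and the guarded form. It assumes l is a 32-bit unsigned integer (it was printed with %u), and the values c = 0 and n = 269501 are chosen only to reproduce the number in the log; what c and n represent inside racon is not shown in this thread.

```python
UINT32_MODULUS = 2 ** 32


def buggy_start(c, n):
    """Emulate `l = c - n` on a 32-bit unsigned integer."""
    return (c - n) % UINT32_MODULUS


def fixed_start(c, n):
    """Emulate the guarded form `l = c < n ? 0 : c - n`."""
    return 0 if c < n else c - n


print(buggy_start(0, 269501))  # 4294697795, as seen in the log
print(fixed_start(0, 269501))  # 0
```

With the wrapped index, the next processing loop starts far beyond the end of the overlap list, so the remaining overlaps are never examined.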

I got:
[racon::Polisher::initialize] error: empty overlap set!
Seems that is it!

Great! Can you please copy here a few lines from the .paf file?

Here are the first 20 lines: ont.head20.paf.txt

If you need more let me know :)

That looks fine. Please add

fprintf(stderr, "Missing query name %s\n", q_name_.c_str());

at https://github.com/isovic/racon/blob/master/src/overlap.cpp#L139 and

fprintf(stderr, "Missing target name %s\n", t_name_.c_str());

at https://github.com/isovic/racon/blob/master/src/overlap.cpp#L157 (I pasted the wrong lines before).

P.S. Racon won't use the CIGAR string stored in PAF format, only in SAM.

Ok, I have all of them missing it seems:

Overlap 22233/22247 is not valid, deleting
Missing query name ch2182_read139453_template_fail_PAC16434
Overlap 22234/22247 is not valid, deleting
Missing query name ch2182_read139453_template_fail_PAC16434
Overlap 22235/22247 is not valid, deleting
Missing query name ch1908_read107080_template_fail_PAC16434
Overlap 22236/22247 is not valid, deleting
Missing query name ch1908_read107136_template_fail_PAC16434
Overlap 22237/22247 is not valid, deleting
Missing query name ch776_read260672_template_fail_PAC16434
Overlap 22238/22247 is not valid, deleting
Missing query name ch1775_read58163_template_fail_PAC16434
Overlap 22239/22247 is not valid, deleting
Missing query name ch1775_read58163_template_fail_PAC16434
Overlap 22240/22247 is not valid, deleting
Missing query name ch2182_read139817_template_fail_PAC16434
Overlap 22241/22247 is not valid, deleting
Missing query name ch2182_read139817_template_fail_PAC16434
Overlap 22242/22247 is not valid, deleting
Missing query name ch1666_read101654_template_fail_PAC16434
Overlap 22243/22247 is not valid, deleting
Missing query name ch1768_read85887_template_fail_PAC16434
Overlap 22244/22247 is not valid, deleting
Missing query name ch1908_read107326_template_fail_PAC16434
Overlap 22245/22247 is not valid, deleting
Missing query name ch1659_read114573_template_fail_PAC16434
Overlap 22246/22247 is not valid, deleting
[racon::Polisher::initialize] error: empty overlap set!

When I grep the unique missing read names, I get 97559 missing queries out of a total of 99426 reads (the remaining reads, I guess, simply do not map anywhere).

I did not fully understand the last few sentences. Do you mean that grep did not find the printed read names in the overlap file?

I just grepped names of reads in the stderr of racon and counted if all of them are missing. So 97559 queries are missing on 99426 reads in my .fa file

Oh, now I get it. Try and grep the names in the paf file as well (just a few of them).

The paf file contains 97559 different reads, the same number as the missing queries.

Do the names match between the paf file and the fasta file? Because the error racon encountered means they do not.

Okay! You passed the read/genome files the other way around, I double checked it. The correct way to call racon is racon <reads> <overlaps> <contigs> :)
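
A swapped argument order like this can be caught before running racon by comparing names. The Python sketch below is a hypothetical pre-flight check, not part of racon. It relies on the PAF layout, where column 1 is the query name and column 6 is the target name, and it assumes the query names should come from the first file (reads) and the target names from the third file (contigs). The numeric fields in the example PAF line are made up.

```python
def check_racon_order(first_names, paf_lines, third_names):
    """Classify the argument order as "ok", "swapped", "mismatch" or "empty".

    first_names -- sequence names from the file passed first (reads)
    paf_lines   -- lines of the PAF overlap file
    third_names -- sequence names from the file passed third (contigs)
    """
    first, third = set(first_names), set(third_names)
    queries, targets = set(), set()
    for line in paf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue  # not a valid PAF record
        queries.add(cols[0])   # PAF column 1: query name
        targets.add(cols[5])   # PAF column 6: target name
    if not queries:
        return "empty"
    if queries <= first and targets <= third:
        return "ok"
    if queries <= third and targets <= first:
        return "swapped"
    return "mismatch"


reads = ["ch111_read10_template_pass_FAH43284"]
contigs = ["scf7180000001958"]
# Coordinates and scores below are invented for the example.
paf = ["ch111_read10_template_pass_FAH43284\t1000\t0\t900\t+\t"
       "scf7180000001958\t5000\t100\t1000\t800\t900\t60"]

print(check_racon_order(reads, paf, contigs))  # reads first, contigs last
print(check_racon_order(contigs, paf, reads))  # the order used by mistake
```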

Seriously :O ! Sorry for the inconvenience it's just plain stupid of me!! Well thank you I don't understand why I did not think of this before!

No problem :) Thanks a lot for helping me out with the pesky not transmuted error!

SDA16 commented

Hi all,
I got the same error (overlap is not transmuted!) with the racon command. I'm working with Nanopore fastq files. I ran Deepbinner for ONT barcode demultiplexing, then ran an assembly with wtdbg2. As I wanted to polish my genome, I obtained the .sam using bwa in a conda environment, to pass to the racon command.
I ran: racon deepbinner.fastq.gz aligned_WTDBG2.sam deepbinner.fastq.gz > output1.fasta, and it worked.
Then I wanted to polish the output, so I ran racon output1.fasta aligned_WTDBG2.sam > output2.fasta but I got the following error:

[racon::Polisher::initialize] loaded target sequences
[racon::Polisher::initialize] loaded sequences
[racon::Polisher::initialize] loaded overlaps
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

I'm completely new to racon and I don't understand how I can fix this error.
Thank you in advance for help

Sara

Hi Sara,
could you please paste all commands you were using? The error is probably wrong ordering of racon input arguments.

Best regards,
Robert

SDA16 commented

Dear rvaser,
Thank you for your reply, here I attach the commands I used, starting from the assembly with wtdbg2:

wtdbg2 -x ont -genomesize  -i BC1.fastq.gz -t 6 -fo wg.sample1
wtpoa-cns -t 6 -i wg.sample1.ctg.lay.gz -fo sample1.ctg.fa
bwa index sample1.ctg.fa
bwa mem sample1.ctg.fa BC1.fastq.gz > aln-sample1.sam
racon -f BC1.fastq.gz aln-sample1.sam sample1.ctg.fa > racon1.sample1.fasta
racon -f racon1.sample1.fasta aln-sample1.sam sample1.ctg.fa > racon2.sample1.fasta

Best regards
Sara

Your second racon command is wrong (also missing a mapping step). Try the following:

bwa index sample1.ctg.fa
bwa mem sample1.ctg.fa BC1.fastq.gz > aln-sample1.sam
racon -f -t 6 BC1.fastq.gz aln-sample1.sam sample1.ctg.fa > racon1.sample1.fasta

bwa index racon1.sample1.fasta
bwa mem racon1.sample1.fasta BC1.fastq.gz > aln-sample2.sam
racon -f -t 6 BC1.fastq.gz aln-sample2.sam racon1.sample1.fasta > racon2.sample1.fasta

Best regards,
Robert

P.S. Racon uses 1 thread by default (enable more with -t as above). Option -f (which you are using) will use all possible overlaps bwa found and racon will be slower than the default version, which only takes the best overlap per read. That is up to you though.
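
The pattern in the corrected commands above is that every round maps the original reads against the output of the previous round before polishing again. The Python sketch below only builds those command lines as strings for a given number of rounds. It uses just the bwa and racon invocations shown in this thread (without the optional -f), and the per-round file names are hypothetical.

```python
def polishing_commands(reads, contigs, rounds, threads=6):
    """Build bwa/racon command lines for iterative polishing.

    The output names (aln-roundN.sam, racon-roundN.fasta) are made up
    for illustration; nothing is executed here.
    """
    commands = []
    target = contigs
    for i in range(1, rounds + 1):
        sam = f"aln-round{i}.sam"
        out = f"racon-round{i}.fasta"
        commands.append(f"bwa index {target}")
        commands.append(f"bwa mem {target} {reads} > {sam}")
        # Argument order: reads, overlaps, then the sequences to polish.
        commands.append(f"racon -t {threads} {reads} {sam} {target} > {out}")
        target = out  # the next round polishes this round's output
    return commands


for cmd in polishing_commands("BC1.fastq.gz", "sample1.ctg.fa", rounds=2):
    print(cmd)
```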

SDA16 commented

Thank you very much for your clear explanation, I will let you know if it will work!
Kind regards,
Sara

SDA16 commented

Dear Robert,
Thank you very much for your help! Now all is working :)
Best regards,
Sara

Hi,
I also got the error (error: overlap is not transmuted!) after running racon on nanopore reads (ONT long reads).
I used SLURM scripts for everything.
For overlapping I used Minimap2 (I tried it the first time with a .paf file, the second time with a .sam file).
racon command:
echo "Input reads: " $1
reads=$1
echo "Input overlaps: " $2
ovlps=$2
echo "Input conitgs: " $3
cntgs=$3
racon -t 48 $reads $ovlps $cntgs > $reads.sam.racon.fastq

racon output:
Input reads: iddm.fastq
Input overlaps: iddm.fastq.paf
Input conitgs: iddm.ctg.fa
[racon::Polisher::initialize] loaded target sequences 5.063910 s
[racon::Polisher::initialize] loaded sequences 76.101899 s
[racon::Polisher::initialize] loaded overlaps 155.985664 s
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

I tried the same script with the .sam file

Input reads: iddm.fastq
Input overlaps: iddm.fastq.dual.sam
Input conitgs: iddm.ctg.fa
[racon::Polisher::initialize] loaded target sequences 4.936717 s
[racon::Polisher::initialize] loaded sequences 64.061906 s
[racon::Polisher::initialize] loaded overlaps 2869.280305 s
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

Here are my Minimap scripts:
for the .paf file
echo "Input file: " $1
fastq=$1
minimap2 -x ava-ont -t 48 -a $fastq $fastq > $fastq.paf

And the result:

Input file: ../../iddm.fastq
[M::mm_idx_gen::101.945*1.81] collected minimizers
[M::mm_idx_gen::110.293*3.57] sorted minimizers
[M::main::110.294*3.57] loaded/built the index for 414600 target sequence(s)
[M::mm_mapopt_update::113.692*3.49] mid_occ = 653
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 414600
[M::mm_idx_stat::115.692*3.45] distinct minimizers: 201503003 (31.48% are singletons); average occurrences: 6.778; average spacing: 2.930
[M::worker_pipeline::158.780*10.41] mapped 39197 sequences
[M::worker_pipeline::180.284*14.62] mapped 39391 sequences
[M::worker_pipeline::209.284*17.78] mapped 41758 sequences
[M::worker_pipeline::232.556*20.24] mapped 41173 sequences
[M::worker_pipeline::258.283*22.49] mapped 38831 sequences
[M::worker_pipeline::284.091*23.95] mapped 31375 sequences
[M::worker_pipeline::307.519*25.45] mapped 34915 sequences
[M::worker_pipeline::325.400*26.67] mapped 147452 sequences
[M::worker_pipeline::341.811*27.70] mapped 163866 sequences
[M::worker_pipeline::359.160*28.65] mapped 160113 sequences
[M::worker_pipeline::374.873*29.47] mapped 160627 sequences
[M::worker_pipeline::390.776*30.24] mapped 159259 sequences
[M::worker_pipeline::406.774*30.95] mapped 158131 sequences
[M::worker_pipeline::422.391*31.58] mapped 157935 sequences
[M::worker_pipeline::438.441*32.19] mapped 158613 sequences
[M::worker_pipeline::454.548*32.74] mapped 158987 sequences
[M::worker_pipeline::470.156*33.24] mapped 158144 sequences
[M::worker_pipeline::485.722*33.72] mapped 161848 sequences
[M::worker_pipeline::500.952*34.15] mapped 166190 sequences
[M::worker_pipeline::514.709*34.31] mapped 159601 sequences
[M::mm_idx_gen::626.977*28.49] collected minimizers
[M::mm_idx_gen::630.154*28.56] sorted minimizers
[M::main::630.154*28.56] loaded/built the index for 1277557 target sequence(s)
[M::mm_mapopt_update::630.154*28.56] mid_occ = 653
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 1277557
[M::mm_idx_stat::632.247*28.47] distinct minimizers: 206534157 (31.14% are singletons); average occurrences: 6.585; average spacing: 2.941
[M::worker_pipeline::684.179*28.27] mapped 39197 sequences
[M::worker_pipeline::707.748*28.83] mapped 39391 sequences
[M::worker_pipeline::736.840*29.25] mapped 41758 sequences
[M::worker_pipeline::764.889*29.64] mapped 41173 sequences
[M::worker_pipeline::791.239*30.07] mapped 38831 sequences
[M::worker_pipeline::819.565*30.40] mapped 31375 sequences
[M::worker_pipeline::842.408*30.79] mapped 34915 sequences
[M::worker_pipeline::865.941*31.25] mapped 147452 sequences
[M::worker_pipeline::886.654*31.64] mapped 163866 sequences
[M::worker_pipeline::908.106*32.02] mapped 160113 sequences
[M::worker_pipeline::928.754*32.38] mapped 160627 sequences
[M::worker_pipeline::949.875*32.73] mapped 159259 sequences
[M::worker_pipeline::969.546*33.05] mapped 158131 sequences
[M::worker_pipeline::989.556*33.35] mapped 157935 sequences
[M::worker_pipeline::1011.773*33.68] mapped 158613 sequences
[M::worker_pipeline::1032.118*33.95] mapped 158987 sequences
[M::worker_pipeline::1052.985*34.23] mapped 158144 sequences
[M::worker_pipeline::1072.663*34.48] mapped 161848 sequences
[M::worker_pipeline::1092.609*34.73] mapped 166190 sequences
[M::worker_pipeline::1109.622*34.83] mapped 159601 sequences
[M::mm_idx_gen::1167.259*33.20] collected minimizers
[M::mm_idx_gen::1168.846*33.21] sorted minimizers
[M::main::1168.846*33.21] loaded/built the index for 645249 target sequence(s)
[M::mm_mapopt_update::1168.846*33.21] mid_occ = 653
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 645249
[M::mm_idx_stat::1170.570*33.16] distinct minimizers: 162491418 (39.95% are singletons); average occurrences: 4.072; average spacing: 2.942
[M::worker_pipeline::1201.724*33.12] mapped 39197 sequences
[M::worker_pipeline::1219.029*33.30] mapped 39391 sequences
[M::worker_pipeline::1242.382*33.32] mapped 41758 sequences
[M::worker_pipeline::1267.736*33.30] mapped 41173 sequences
[M::worker_pipeline::1284.954*33.42] mapped 38831 sequences
[M::worker_pipeline::1307.858*33.48] mapped 31375 sequences
[M::worker_pipeline::1325.489*33.61] mapped 34915 sequences
[M::worker_pipeline::1340.832*33.76] mapped 147452 sequences
[M::worker_pipeline::1354.540*33.90] mapped 163866 sequences
[M::worker_pipeline::1369.116*34.05] mapped 160113 sequences
[M::worker_pipeline::1382.959*34.19] mapped 160627 sequences
[M::worker_pipeline::1396.195*34.33] mapped 159259 sequences
[M::worker_pipeline::1410.708*34.46] mapped 158131 sequences
[M::worker_pipeline::1424.437*34.59] mapped 157935 sequences
[M::worker_pipeline::1438.547*34.71] mapped 158613 sequences
[M::worker_pipeline::1453.877*34.81] mapped 158987 sequences
[M::worker_pipeline::1467.334*34.93] mapped 158144 sequences
[M::worker_pipeline::1481.538*35.06] mapped 161848 sequences
[M::worker_pipeline::1494.527*35.18] mapped 166190 sequences
[M::worker_pipeline::1506.871*35.22] mapped 159601 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -x ava-ont -t 48 iddm.fastq iddm.fastq

The Minimap script for the .sam file
echo "Input file: " $1
fastq=$1
minimap2 -t 48 -x ava-ont --dual=yes -a $fastq $fastq > $fastq.dual.sam

And the result:

Input file: iddm.fastq
[M::mm_idx_gen::101.913*1.89] collected minimizers
[M::mm_idx_gen::110.741*3.71] sorted minimizers
[WARNING] For a multi-part index, no @SQ lines will be outputted. Please use --split-prefix.
[M::main::110.742*3.71] loaded/built the index for 414600 target sequence(s)
[M::mm_mapopt_update::114.441*3.63] mid_occ = 653
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 414600
[M::mm_idx_stat::116.538*3.58] distinct minimizers: 201503003 (31.48% are singletons); average occurrences: 6.778; average spacing: 2.930
[M::worker_pipeline::4249.334*37.26] mapped 39197 sequences
[M::worker_pipeline::8132.258*36.54] mapped 39391 sequences
[M::worker_pipeline::12522.165*34.18] mapped 41758 sequences
[M::worker_pipeline::15639.896*35.54] mapped 41173 sequences
[M::worker_pipeline::19105.065*36.24] mapped 38831 sequences
[M::worker_pipeline::22503.355*36.59] mapped 31375 sequences
[M::worker_pipeline::26454.166*36.32] mapped 34915 sequences
[M::worker_pipeline::28441.047*36.07] mapped 147452 sequences
[M::worker_pipeline::30503.552*35.34] mapped 163866 sequences
[M::worker_pipeline::31750.367*35.54] mapped 160113 sequences
[M::worker_pipeline::33022.420*35.79] mapped 160627 sequences
[M::worker_pipeline::34290.972*35.99] mapped 159259 sequences
[M::worker_pipeline::35530.109*36.12] mapped 158131 sequences
[M::worker_pipeline::36705.069*36.31] mapped 157935 sequences
[M::worker_pipeline::37924.024*36.51] mapped 158613 sequences
[M::worker_pipeline::39391.114*36.46] mapped 158987 sequences
[M::worker_pipeline::40467.808*36.69] mapped 158144 sequences
[M::worker_pipeline::41510.012*36.87] mapped 161848 sequences
[M::worker_pipeline::42643.365*36.96] mapped 166190 sequences
[M::worker_pipeline::43728.497*36.99] mapped 159601 sequences
[M::mm_idx_gen::43843.675*36.89] collected minimizers
[M::mm_idx_gen::43846.830*36.89] sorted minimizers
[M::main::43846.830*36.89] loaded/built the index for 1277557 target sequence(s)
[M::mm_mapopt_update::43846.830*36.89] mid_occ = 653
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 1277557
[M::mm_idx_stat::43848.852*36.89] distinct minimizers: 206534157 (31.14% are singletons); average occurrences: 6.585; average spacing: 2.941
[M::worker_pipeline::45445.589*37.01] mapped 39197 sequences
[M::worker_pipeline::46775.468*37.13] mapped 39391 sequences
[M::worker_pipeline::48483.441*36.98] mapped 41758 sequences
[M::worker_pipeline::49759.823*37.15] mapped 41173 sequences
[M::worker_pipeline::51301.821*37.17] mapped 38831 sequences
[M::worker_pipeline::52551.044*37.33] mapped 31375 sequences
[M::worker_pipeline::54093.706*37.28] mapped 34915 sequences
[M::worker_pipeline::54944.790*37.39] mapped 147452 sequences
[M::worker_pipeline::55681.230*37.52] mapped 163866 sequences
[M::worker_pipeline::56508.209*37.58] mapped 160113 sequences
[M::worker_pipeline::57399.813*37.61] mapped 160627 sequences
[M::worker_pipeline::58213.000*37.67] mapped 159259 sequences
[M::worker_pipeline::59014.912*37.72] mapped 158131 sequences
[M::worker_pipeline::59787.852*37.78] mapped 157935 sequences
[M::worker_pipeline::60531.226*37.87] mapped 158613 sequences
[M::worker_pipeline::61327.121*37.92] mapped 158987 sequences
[M::worker_pipeline::62010.620*38.02] mapped 158144 sequences
[M::worker_pipeline::62687.753*38.09] mapped 161848 sequences
[M::worker_pipeline::63383.418*38.15] mapped 166190 sequences
[M::worker_pipeline::63976.318*38.21] mapped 159601 sequences
[M::mm_idx_gen::64036.482*38.18] collected minimizers
[M::mm_idx_gen::64038.104*38.18] sorted minimizers
[M::main::64038.104*38.18] loaded/built the index for 645249 target sequence(s)
[M::mm_mapopt_update::64038.104*38.18] mid_occ = 653
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 645249
[M::mm_idx_stat::64039.837*38.18] distinct minimizers: 162491418 (39.95% are singletons); average occurrences: 4.072; average spacing: 2.942
[M::worker_pipeline::66075.369*38.23] mapped 39197 sequences
[M::worker_pipeline::67810.327*38.30] mapped 39391 sequences
[M::worker_pipeline::69696.915*38.24] mapped 41758 sequences
[M::worker_pipeline::71261.792*38.38] mapped 41173 sequences
[M::worker_pipeline::72985.904*38.45] mapped 38831 sequences
[M::worker_pipeline::74617.166*38.54] mapped 31375 sequences
[M::worker_pipeline::76365.991*38.61] mapped 34915 sequences
[M::worker_pipeline::77291.264*38.69] mapped 147452 sequences
[M::worker_pipeline::78097.938*38.77] mapped 163866 sequences
[M::worker_pipeline::78874.296*38.85] mapped 160113 sequences
[M::worker_pipeline::79737.234*38.90] mapped 160627 sequences
[M::worker_pipeline::80548.225*38.97] mapped 159259 sequences
[M::worker_pipeline::81334.744*39.02] mapped 158131 sequences
[M::worker_pipeline::82109.853*39.07] mapped 157935 sequences
[M::worker_pipeline::82934.343*39.11] mapped 158613 sequences
[M::worker_pipeline::83724.462*39.16] mapped 158987 sequences
[M::worker_pipeline::84487.132*39.22] mapped 158144 sequences
[M::worker_pipeline::85176.920*39.28] mapped 161848 sequences
[M::worker_pipeline::85901.956*39.32] mapped 166190 sequences
[M::worker_pipeline::86616.706*39.33] mapped 159601 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -t 48 -x ava-ont --dual=yes -a iddm.fastq iddm.fastq
[M::main] Real time: 86618.862 sec; CPU: 3406730.278 sec; Peak RSS: 131.710 GB

I hope you can help me, and thanks a lot in advance.
Best regards
Fabian

Hello Fabian,
if you want to polish your contigs, you have to run minimap2 and racon with the following commands:

minimap2 -x map-ont -t 48 iddm.ctg.fa iddm.fastq > ovl.paf
racon -t 48 iddm.fastq ovl.paf iddm.ctg.fa > polished.ctg.fasta

If you want to error correct reads, run the following:

minimap2 -t 48 -ax ava-ont --dual=yes iddm.fastq iddm.fastq > dual.sam
racon -t 48 iddm.fastq dual.sam iddm.fastq > polished.reads.fasta

Sorry for the late reply!
Best regards,
Robert
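
The two recipes above differ in which file plays the target role, and the overlap file has to name sequences from that same target. A rough way to check this before a long Racon run is sketched below. The function name `paf_targets_not_in_fasta` is hypothetical (it is not part of racon or minimap2), and the sketch assumes a PAF overlap file (target name in column 6) and a FASTA target.

```shell
# Hypothetical helper, not part of racon or minimap2.
# Prints every target name that occurs in the PAF file (column 6)
# but has no matching header in the FASTA target file.
# Usage: paf_targets_not_in_fasta ovl.paf iddm.ctg.fa
paf_targets_not_in_fasta() {
    paf=$1
    fasta=$2
    awk '
        # First file (FASTA): remember each header name up to the first whitespace.
        NR == FNR {
            if (substr($0, 1, 1) == ">") {
                name = substr($1, 2)
                known[name] = 1
            }
            next
        }
        # Second file (PAF): report each unknown target name once.
        !($6 in known) && !($6 in reported) {
            reported[$6] = 1
            print $6
        }
    ' "$fasta" "$paf"
}
```

An empty result means every target in the overlap file was found. If the function prints read names while the target is a contig file, the overlaps were most likely produced by an all-vs-all run rather than by mapping reads to the contigs.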

Thanks a lot,
I only need the second one (read correction) and will try it now. I'll let you know if it works.

Best regards
Fabian

Thanks so much,
everything worked fine.

I seem to have the same error with ONT data

Error
[racon::Polisher::initialize] loaded target sequences 6.005934 s
[racon::Polisher::initialize] loaded sequences 759.764594 s
[racon::Polisher::initialize] loaded overlaps 780.298724 s
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

SAM line

@SQ SN:contig_141_pilon LN:1358375
@SQ SN:contig_142_pilon LN:67648
@SQ SN:contig_143_pilon LN:5159672
@SQ SN:contig_144_pilon LN:35530
@SQ SN:contig_147_pilon LN:40799
@SQ SN:contig_148_pilon LN:2807903
@SQ SN:contig_149_pilon LN:47893
@SQ SN:contig_15_pilon LN:7660417
@SQ SN:contig_150_pilon LN:960
@SQ SN:contig_151_pilon LN:3459730
@SQ SN:contig_153_pilon LN:1283258
@SQ SN:contig_154_pilon LN:9233410
@SQ SN:contig_155_pilon LN:466968
@SQ SN:contig_156_pilon LN:146389
@SQ SN:contig_157_pilon LN:32373
@SQ SN:contig_158_pilon LN:87654
@SQ SN:contig_16_pilon LN:3679621
@SQ SN:contig_160_pilon LN:525355
@SQ SN:contig_165_pilon LN:42875

I ran this command line:
racon -m 8 -x -6 -g -8 -w 500 -t 30 ~20190404.fq ~/aln.sam ~/assembly.fasta > ~/racon1x.fa

Hello,
could you please provide the command you used to get the aln.sam file?

Best regards,
Robert

bwa index fasta
bwa mem bwa mem -t 36 -x path_to_index Path_to_long_read(Nanopore).fq > output.sam

Looks alright. Did you maybe pass the wrong long read file or assembly file? The error you encountered indicates that Racon could not find the contigs of the assembly, the reads, or both. Can you please paste the first line of 20190404.fq, the first line of assembly.fasta and the first line of aln.sam that does not start with @?
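
The name matching described above can also be checked in bulk rather than on the first line only. The following is a minimal sketch, assuming four-line FASTQ records and a tab-separated SAM file; `sam_queries_not_in_fastq` is a hypothetical name, not a racon or bwa command. It holds all read names in memory, so it is better suited to a sample of a large data set.

```shell
# Hypothetical helper, not part of racon or bwa.
# Prints every SAM query name (column 1) that has no matching read name
# in the FASTQ file. Assumes four-line FASTQ records.
# Usage: sam_queries_not_in_fastq aln.sam reads.fq
sam_queries_not_in_fastq() {
    sam=$1
    fastq=$2
    awk '
        # First file (FASTQ): only the first line of each record is a header.
        NR == FNR {
            if (FNR % 4 == 1) {
                name = substr($1, 2)   # drop the leading "@"
                known[name] = 1
            }
            next
        }
        # Second file (SAM): skip header lines, which start with "@".
        substr($0, 1, 1) == "@" { next }
        !($1 in known) && !($1 in reported) {
            reported[$1] = 1
            print $1
        }
    ' "$fastq" "$sam"
}
```

No output means every aligned query name was found among the reads. Any printed name points at a mismatch between the sequence file and the alignment file.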

I tried to polish my ONT miniasm assembly with Illumina reads in 4 rounds, but I get the error "overlap is not transmuted" after Racon has loaded the input files:

reads1="SRR6880005_1_fixed.fastq"
reads2="SRR6880005_2_fixed.fastq"
target="miniasm_coluzzii.fa"
cat ${reads1} ${reads2} > SRR6880005_12_fixed.fastq
total='SRR6880005_12_fixed.fastq'

for (( i=1; i <= 4; i++ ))
do
if [ -s bwa_coluzzii_${i}.sam ]
then
echo 'bwa is already done'
else
bwa index ${target}
bwa mem ${target} ${reads1} ${reads2} > bwa_coluzzii_${i}.sam
fi
align="bwa_coluzzii_${i}.sam"
racon  -f -t 46  ${total} ${align} ${target} >miniasm_coluzzi_polish_${i}.fa

target="miniasm_coluzzi_polish_${i}.fa"
done

Hello,
please paste the output of head -n 1 SRR6880005_*.fastq, head -n 1 miniasm_coluzzii.fa and grep "^[^@]" bwa_coluzzii_1.sam | head -n 1.

Best regards,
Robert

head -n 1 SRR6880005_*.fastq
==> SRR6880005_12_fixed.fastq <==
@SRR6880005.2/1

==> SRR6880005_1_fixed.fastq <==
@SRR6880005.2/1

==> SRR6880005_2_fixed.fastq <==
@SRR6880005.2/2

head -n 1 miniasm_coluzzii.fa

>utg000001l

grep "^[^@]" bwa_coluzzii_1.sam | head -n 1
SRR6880005.2 99 utg000473l 94174 57 23M2D32M2I43M = 94174 100 GGTAAATTGAGTACCATTATCAGACACGAGAACTTCTGGCACTCCGAAAGTTGCGAAAATTTGTTTCAAAATTCTTATTGTTGTTCTCGCAGTTATTGAT AAAAFFJFJJJJJJJJJJJJJJJJJJFJFFJJJJJJJJJFAJAFJFJAFJJFAJJJFJJJ<AJ<JJJJJJJJFFFFA<FJJJJJ7JJJJJJJ<FJJJJJA NM:i:6 MD:Z:0A14G7^AC75 MC:Z:23M2D32M2I43M AS:i:76 XS:i:68 XA:Z:utg000444l,-99101,12S44M2D7M1D37M,4;utg000444l,+88866,34M2I26M3D38M,8;utg000162l,+139394,37M1I4M1I12M1D45M,6;

It seems that BWA removed /1 and /2 from your sequence headers in the SAM file, which prevents Racon from connecting the sequence and alignment files. Try renaming your sequences as SRR6880005.21 and SRR6880005.22. Afterwards, run BWA again.
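
One possible way to do the renaming is sketched below. It assumes four-line FASTQ records; the function name `add_mate_suffix` and the choice of suffix are hypothetical. Passing `1` and `2` gives names in the style suggested above, while `_1` and `_2` is an alternative that keeps the mate number visually separate.

```shell
# Hypothetical helper, not part of racon or bwa.
# Rewrites FASTQ header lines so that a trailing /1 or /2 is replaced by
# the given suffix. Only the first line of each four-line record is touched,
# because quality lines may also start with "@". Any description after the
# first whitespace in the header is dropped.
# Usage: add_mate_suffix reads_1.fastq 1 > reads_1.renamed.fastq
add_mate_suffix() {
    fastq=$1
    suffix=$2
    awk -v suffix="$suffix" '
        FNR % 4 == 1 {
            name = $1
            sub(/\/[12]$/, "", name)   # drop a trailing /1 or /2 if present
            print name suffix
            next
        }
        { print }
    ' "$fastq"
}
```

The renamed files would then be concatenated and mapped again before running Racon. Whether BWA accepts differently named mates in paired-end mode is not covered in this thread, so mapping the concatenated file as single-end input is one option to consider.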

Thanks for the quick reply!
It seems that nothing changed in the SAM file:
SRR6880005.2 99 utg000473l 94174 57 23M2D32M2I43M = 94174 100 GGTAAATTGAGTACCATTATCAGACACGAGAACTTCTGGCACTCCGAAAGTTGCGAAAATTTGTTTCAAAATTCTTATTGTTGTTCTCGCAGTTATTGAT AAAAFFJFJJJJJJJJJJJJJJJJJJFJFFJJJJJJJJJFAJAFJFJAFJJFAJJJFJJJ<AJ<JJJJJJJJFFFFA<FJJJJJ7JJJJJJJ<FJJJJJA NM:i:6 MD:Z:0A14G7^AC75 MC:Z:23M2D32M2I43M AS:i:76 XS:i:68 XA:Z:utg000444l,-99101,12S44M2D7M1D37M,4;utg000444l,+88866,34M2I26M3D38M,8;utg000162l,+139394,37M1I4M1I12M1D45M,6;

Should I replace .2 and .1 in the header names with /2 and /1?
Best regards,
Dmitriy

I tried to use racon to polish my assembly, but I got this error:

[racon::Polisher::initialize] loaded target sequences 9.923656 s
[racon::Polisher::initialize] loaded sequences 2401.086862 s
[racon::Polisher::initialize] loaded overlaps 1721.202367 s
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

The commands I used to get to this point are:

time minimap2 -t 144 -ax map-ont Assembly.fasta bonito_basecall.fastq > bonito.sam
time racon -t 144 bonito_basecall.fastq bonito.sam Assembly.fasta > bonito.racon.fasta

I don't think that I made a mistake in the order of the arguments, as I have used the exact same command for other samples. I only seem to get this error for one of them.

The commands look fine. Which Racon version are you using?

Hi @rvaser, I'm using version 1.4.3. Also, I made an error in my previous post.

If I use .sam in my command time racon -t 144 bonito_basecall.fastq bonito.sam Assembly.fasta > bonito.racon.fasta, my error message is:

[racon::Polisher::initialize] loaded target sequences 9.859341 s
[racon::Polisher::initialize] loaded sequences 2319.713921 s
terminate called after throwing an instance of 'std::invalid_argument'
what(): [bioparser::SamParser] error: invalid file format!

If I use .paf in my command time racon -t 144 bonito_basecall.fastq bonito.paf Assembly.fasta > bonito.racon.fasta, my error message is:

[racon::Polisher::initialize] loaded target sequences 9.923656 s
[racon::Polisher::initialize] loaded sequences 2401.086862 s
[racon::Polisher::initialize] loaded overlaps 1721.202367 s
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!
[racon::Overlap::find_breaking_points] error: overlap is not transmuted!

I did it both ways because I had seen a previous comment suggesting to mv .sam to .paf, so I thought it might work.

Can you please try the latest version (1.4.20)?

Hi @rvaser, I updated racon to version 1.4.22.

With bonito.sam in my command, the error message is:

[racon::Polisher::initialize] loaded target sequences 9.443693 s
[racon::Polisher::initialize] loaded sequences 2362.990930 s
terminate called after throwing an instance of 'std::invalid_argument'
what(): [bioparser::SamParser] error: invalid file format

With bonito.paf in my command, the error message is:

[racon::Polisher::initialize] loaded target sequences 9.365726 s
[racon::Polisher::initialize] loaded sequences 2395.195146 s
[racon::Overlap::transmute] error: unequal lengths in sequence and overlap file for sequence 720cf50f-6b57-42f7-9093-1ca627cf2077!

I did not have any errors when running minimap2, so I am not quite sure why there would be unequal lengths.

Sorry for my late reply. If you generate a .sam file with minimap2 (using option -a), you cannot just rename it with mv .sam .paf; to get PAF output, run minimap2 without -a. Your .sam file may also be truncated. Please generate a proper .paf file and try again. If you can, you can also send me the data so I can investigate locally.
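
A quick content check can show whether a file's extension matches what is inside it and whether a SAM file contains incomplete records. This is a rough sketch under simple assumptions: PAF records have at least 12 tab-separated columns with the strand (+ or -) in column 5, and SAM records have at least 11 columns with numeric FLAG and POS fields. Both function names are hypothetical.

```shell
# Hypothetical helpers, not part of racon or minimap2.

# Guess the format from the first record that is not a SAM header line.
# Prints one of: paf, sam, unknown, empty.
guess_overlap_format() {
    awk -F '\t' '
        substr($0, 1, 1) == "@" { next }   # SAM header lines
        {
            if (NF >= 12 && ($5 == "+" || $5 == "-")) {
                print "paf"
            } else if (NF >= 11 && $2 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/) {
                print "sam"
            } else {
                print "unknown"
            }
            found = 1
            exit
        }
        END { if (!found) print "empty" }
    ' "$1"
}

# Count SAM records with fewer than the 11 mandatory columns,
# which can indicate a truncated file.
count_short_sam_records() {
    awk -F '\t' 'substr($0, 1, 1) != "@" && NF < 11 { n++ } END { print n + 0 }' "$1"
}
```

For example, `guess_overlap_format bonito.paf` printing `sam` would confirm that the file is a renamed SAM file, and a non-zero result from `count_short_sam_records bonito.sam` would be consistent with the "invalid file format" parser error.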