mummer4/mummer

More than one alignment on the same path

rsharris opened this issue · 0 comments

(FWIW, I'm the author of lastz)

I'm new to mummer (running v4.0.0rc1) and tried it out on a small, randomly generated single-chromosome genome. Where I was expecting either one long alignment or a few shorter non-overlapping alignments, I instead got two alignments that have a long path in common. Is this intended?

The two sequences can be found at
https://docs.google.com/document/d/1HUW0ocUHytplvUMb9BabDHFr4AJoELvQbT1bpB1FRKc
One is a random sequence, the other was created by simulating substitutions and indels (no rearrangements or duplications). Identity between these sequences should be ≈ 95%.

I ran

nucmer -p orange1_onto_apple1 apple1.fa orange1.fa
show-coords orange1_onto_apple1.delta > orange1_onto_apple1.coords
show-aligns orange1_onto_apple1.delta APPLE1 ORANGE1 > orange1_onto_apple1.aligns

My expectation was that the entirety of the two sequences would be reported as a single alignment or, failing that, that I'd see it split into a few smaller non-overlapping alignments.

Instead, what I see is two alignments. The first covers the entirety of the two sequences, as I expected. The second begins around 2.7K (following a couple of nearby deletions in ORANGE1) and traverses what appears to be the same alignment path as the first. So the two alignments have about 95 Kbp in common.

That seems like strange behavior for an aligner. Did I do something wrong? Or am I misinterpreting the output? Is 5% divergence too high?

My initial test actually used a genome with 5 chromosome pairs, and I saw a similar problem on 3 of them. On one chromosome the result was split into two non-overlapping alignments (probably reasonable). On the remaining chromosome there were two alignments that appear to overlap by about 40% (though I have not checked whether that holds base-by-base).
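
For anyone wanting to quantify the overlap at the coordinate level, a minimal Python sketch follows. It only compares the reference-side intervals of two alignments, so it is a necessary condition for a shared path rather than a base-by-base check. The function names and the example coordinates are hypothetical and chosen for illustration; they are not taken from the actual show-coords output.

```python
def interval_overlap(a, b):
    """Count positions shared by two closed, 1-based intervals.

    Each interval is a (start, end) pair; the ends may be given in either
    order, as can happen for reverse-strand coordinates.
    """
    (a_start, a_end), (b_start, b_end) = sorted(a), sorted(b)
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def overlap_fraction(a, b):
    """Shared positions as a fraction of the shorter interval's length."""
    shorter = min(abs(a[1] - a[0]), abs(b[1] - b[0])) + 1
    return interval_overlap(a, b) / shorter


if __name__ == "__main__":
    # Hypothetical reference coordinates for two alignments (illustrative only).
    first = (1, 100000)
    second = (2700, 97700)
    print(interval_overlap(first, second))
    print(overlap_fraction(first, second))
```

A fraction near 1.0 says the shorter alignment lies almost entirely inside the longer one on the reference. Confirming that both follow the same path would also require comparing the query coordinates and the indel positions recorded in the delta file.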