marbl/MashMap

Ordering output with reference sequences

aaronphillips7493 opened this issue · 3 comments

Hey, thank you for this wonderful package! It has saved many headaches.

I am wondering how MashMap decides on the ordering of the reference sequences (and, in turn, the order of the query). I am mapping a plant genome assembly (~1956 contigs) to a reference assembly of a closely related species with 12 chromosomes. The chromosomes in the output are not in the order they appear in the reference .fasta file; they are all jumbled up (i.e., the X-axis reads: Chromosome 1 -> 3 -> 2 -> 4 -> 6 -> 5 -> 7 -> 11, etc.).

Do you know of a way to force MashMap to output a plot in which the reference chromosomes appear in order? I have attached my current output to this post.
oryza_hypo_contigs_vs_nipp_200717_8 copy.pdf

I don’t think there should be any inherent order to the contigs between chromosomes (i.e., the order along the Y axis), so I don’t understand how MashMap has decided on this ordering. Does that make sense?

Thank you in advance,
Aaron :)

Hi, are you referring to the ordering in the text output or in the dot plot?
Assuming you are referring to the dot plot: the sequences are arranged to produce a nearly linear plot (logic borrowed from mummerplot). If you prefer a different order, you can try editing the intermediate gnuplot files; they should be relatively easy to understand.
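To make the "nearly linear" idea concrete, here is a simplified sketch of one way to lay out contigs against a fixed reference order: assign each contig to the reference sequence carrying most of its mapped span, then sort contigs by that reference's rank and by where they land on it. This is not the mummerplot or generateDotplot code, just a stand-in to illustrate the idea. `order_contigs` is a hypothetical helper, and it assumes MashMap 2.x style output (ten space-separated fields: query name, length, start, end, strand, reference name, length, start, end, identity).

```python
from collections import defaultdict


def order_contigs(mapping_lines, ref_order):
    """Order query contigs so they follow a fixed reference order.

    Hypothetical helper, not part of MashMap. Assumes MashMap 2.x style
    records with 10 space-separated fields:
    qname qlen qstart qend strand rname rlen rstart rend identity
    """
    rank = {name: i for i, name in enumerate(ref_order)}
    mapped = defaultdict(int)  # (contig, ref) -> summed mapped query span
    leftmost = {}              # (contig, ref) -> smallest ref start seen
    for line in mapping_lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        qname, qstart, qend = fields[0], int(fields[2]), int(fields[3])
        rname, rstart = fields[5], int(fields[7])
        if rname not in rank:
            continue  # ignore references outside the requested order
        key = (qname, rname)
        mapped[key] += qend - qstart
        leftmost[key] = min(leftmost.get(key, rstart), rstart)

    # Assign each contig to the reference carrying most of its mapped span.
    best = {}
    for (qname, rname), span in mapped.items():
        if qname not in best or span > mapped[(qname, best[qname])]:
            best[qname] = rname

    # Sort by reference rank first, then by position on that reference.
    return sorted(best, key=lambda q: (rank[best[q]], leftmost[(q, best[q])]))
```

With `ref_order` set to the chromosome names in reference file order, contigs mapping to Chromosome 1 come first (left to right along it), then those on Chromosome 2, and so on; contigs that only hit sequences outside `ref_order` are dropped.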

I'm referring to the dot plot. Given that I have 1956 contigs, I think rearranging the gnuplot files would be quite a task. I tried the alignment using Assemblytics, and its output is ordered correctly, i.e., by reference chromosome number (attached here). Is there no easy way to emulate that here?
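Getting the reference order that Assemblytics shows comes down to one of two things: the order of the `>` headers in the reference FASTA, or a sort by chromosome number. A minimal sketch of both, assuming the sequence name is the first whitespace-delimited token of each header; `fasta_order` and `chrom_key` are hypothetical helper names, not part of MashMap.

```python
import re


def fasta_order(lines):
    """Return sequence names in the order their '>' headers appear.

    The name is taken as the first whitespace-delimited token after '>'.
    """
    names = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip a bare '>' with no name
                names.append(tokens[0])
    return names


def chrom_key(name):
    """Sort key comparing digit runs numerically, so Chr2 sorts before Chr10."""
    parts = re.split(r"(\d+)", name)
    return [int(p) if p.isdigit() else p.lower() for p in parts]
```

Calling `fasta_order(open("reference.fasta"))` (the path is a placeholder) gives file order; `sorted(names, key=chrom_key)` gives chromosome-number order even when the file is not sorted. A plain string sort would instead put Chr10 and Chr11 before Chr2.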
Screen Shot 2020-07-28 at 11 20 22 am

Assuming you are using the generateDotplot script in MashMap, can you try changing the variable OPT_layout from 1 to 0?

my $OPT_layout = 1;
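Editing that line by hand is enough for a one-off run. If you regenerate plots often, the same change can be applied programmatically to a copy of the script. A minimal sketch, assuming the assignment is written as in the line quoted above; `set_layout` is a hypothetical helper name.

```python
import re

# Matches an assignment like "my $OPT_layout = 1;" (whitespace may vary).
LAYOUT_RE = re.compile(r"(my\s+\$OPT_layout\s*=\s*)\d+(\s*;)")


def set_layout(script_text, value=0):
    """Return script_text with the OPT_layout value replaced by `value`.

    Raises ValueError if no such assignment exists, rather than silently
    returning the script unchanged.
    """
    new_text, count = LAYOUT_RE.subn(rf"\g<1>{value}\g<2>", script_text)
    if count == 0:
        raise ValueError("no 'my $OPT_layout = ...;' line found")
    return new_text
```

Read the script with `pathlib.Path(...).read_text()`, pass the text through `set_layout`, and write the result to a new file so the original generateDotplot stays untouched.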

Also, I may be able to take a closer look if you can share the MashMap output, the reference, and the assembly.