xavierdidelot/ClonalFrameML

Correlate positions between input alignment/graphical output


Hi,

I have generated the graphical output for my ClonalFrameML results using the R script provided, but now I wonder how I can correlate a specific position in the plot with my input alignment. I am using an XMFA file as input (generated with a progressiveMauve alignment of a number of genomes, then applying stripSubsetLCBs 500), in which I have the genomic positions of the regions included. So I know which genes are included, and I would like to mark some of them in the plot. I have checked position_cross_reference.txt, but I do not understand how this file is organized. Is there one row per node, in the same order as in the ML_sequence.fasta file? I have tried to import position_cross_reference.txt into Excel to see whether the format becomes clearer, but I cannot find a delimiter that organizes the data in this way. Also, if I want to find the position of a gene containing non-unique patterns, how do I do that? And finally, I am not that used to working with XMFA files: if I am supposed to look for position 10 in the XMFA file, where is that? Does the numbering simply start in the first block and run through all blocks, or is the numbering per genome?

Many questions, sorry for that, but I would really appreciate some input on this. I got a very nice plot; now I would just like to point out certain gene locations and specific features in it.

/Julia

It is hard to see exactly where the recombination events are on the plot produced by the R script. What the script does, though, is show the positions of the recombination events as recorded in the output file ending with .importation_status.txt. So if you look in this file, you will know exactly where each event starts and ends.
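As a rough illustration of reading those events programmatically, here is a minimal Python sketch. It assumes the file is tab-delimited with a header row naming the columns `Node`, `Beg` and `End`; that layout is an assumption here, so check it against your own file first. The function name `read_importation_events` and the example rows are made up for illustration.

```python
import csv
import io


def read_importation_events(text):
    """Parse importation-status text into (node, begin, end) tuples.

    Assumes tab-delimited text with a header row containing the
    columns Node, Beg and End (an assumption, not confirmed here).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    events = []
    for row in reader:
        events.append((row["Node"], int(row["Beg"]), int(row["End"])))
    return events


# Made-up example content standing in for a real *.importation_status.txt file.
example = "Node\tBeg\tEnd\nNODE_5\t120\t340\nsampleA\t2100\t2250\n"
events = read_importation_events(example)
```

For a real run you would pass the contents of your own file, for instance `read_importation_events(open(path).read())`.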

To make the link between the ClonalFrameML positions and the positions in your input XMFA file, you need to bear in mind that ClonalFrameML inserts buffers of 1000 bp between the different XMFA alignment regions. For example, if your XMFA contains a gene of length A and then a gene of length B, then the ClonalFrameML positions 1 to A correspond to the positions in the first gene, the positions A+1 to A+1000 are the buffer, and the positions A+1001 to A+1000+B correspond to the second gene, etc. See also the previous issues on XMFA.
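The arithmetic above can be sketched in Python as follows. This is only an illustration, assuming you already know the length of each XMFA block in alignment columns and that the buffer is exactly 1000 positions as described. The names `block_ranges` and `locate` are hypothetical helpers, not part of ClonalFrameML.

```python
BUFFER = 1000  # buffer length between XMFA blocks, as described above


def block_ranges(block_lengths, buffer=BUFFER):
    """Return the 1-based (start, end) ClonalFrameML range of each block."""
    ranges = []
    start = 1
    for length in block_lengths:
        end = start + length - 1
        ranges.append((start, end))
        start = end + buffer + 1  # skip the buffer before the next block
    return ranges


def locate(position, block_lengths, buffer=BUFFER):
    """Map a ClonalFrameML position to (block index, offset within block).

    The block index is 0-based and the offset is 1-based. Returns None
    when the position falls in a buffer or beyond the last block.
    """
    for index, (start, end) in enumerate(block_ranges(block_lengths, buffer)):
        if start <= position <= end:
            return index, position - start + 1
    return None


# Two blocks with A = 300 and B = 500 alignment columns (made-up lengths).
ranges = block_ranges([300, 500])
```

With these made-up lengths the second block spans positions 1301 to 1800, matching A+1001 to A+1000+B. The offset returned by `locate` is a column within the block's alignment, not a genome coordinate; to get genome coordinates you would still need the start positions given in the XMFA block headers and an adjustment for gaps.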