schneebergerlab/plotsr

Chromosome ID not present in genome fasta

adriandrr opened this issue · 2 comments

Hey, I am currently trying to run your synteny plot pipeline. I would be really glad if you could help me.
 
I am trying to plot the synteny between 12 de novo assembled bacterial whole genomes.
 
I aligned them pairwise with minimap2 (A+B, B+C, C+D and so on). I used a bash script for that, if you want to have a look:
 
#!/bin/bash
 
fasta_files=(0.fa 1.fa 2.fa 3.fa 4.fa 5.fa 6.fa 7.fa 9.fa 10.fa 11.fa 12.fa 13.fa)
 
for ((i = 0; i < ${#fasta_files[@]} - 1; i++)); do
    current_file="${fasta_files[$i]}"
    next_file="${fasta_files[$i + 1]}"
    current_prefix="${current_file%.*}"
    next_prefix="${next_file%.*}"
    output_bam="${current_prefix}_${next_prefix}.bam"
   
    minimap2 -ax asm5 -t 4 --eqx "$current_file" "$next_file" | samtools sort -O BAM - > "$output_bam"
    samtools index "$output_bam"
done
 
After that I used a bash one-liner for-loop to produce the syri output:
 
for i in $(ls -1v *.bam); do prefix="${i%.*}"; IFS="_" read -r fnum snum <<< "$prefix"; syri -c "$i" -r "$fnum.fa" -q "$snum.fa" -F B --prefix "$prefix"; done
 
I am unsure whether there is a problem with the syri output, since I am not very familiar with it. The first 5 lines of the first syri output file, "0_1syri.out", look like this:
 
Chr0    1       4344       -    -    -       -      -          NOTAL1    -       NOTAL    -
Chr0    4345    5189076    -    -    Chr0    1      5020588    SYN1      -       SYN      -
Chr0    4345    4896       -    -    Chr0    1      553        SYNAL1    SYN1    SYNAL    -
Chr0    4484    4484       G    T    Chr0    140    140        SNP543    SYN1    SNP      -
Chr0    4485    4485       A    T    Chr0    141    141        SNP544    SYN1    SNP      -
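
For anyone who wants to sanity-check such a file against the genome lengths, here is a rough sketch that prints the largest coordinate per chromosome. It assumes the usual syri.out layout shown above (columns 1-3 are reference chromosome/start/end, columns 6-8 the same for the query); `syri_max_coords` is a made-up helper name, not part of syri or plotsr.

```bash
# Hypothetical helper: print the largest reference and query coordinate
# per chromosome found in a syri.out file.
syri_max_coords() {
    awk '
        # Keep the larger of start/end, since inverted regions may list
        # the higher coordinate first.
        function upd(arr, chr, a, b) {
            if (a ~ /^[0-9]+$/ && a + 0 > arr[chr]) arr[chr] = a + 0
            if (b ~ /^[0-9]+$/ && b + 0 > arr[chr]) arr[chr] = b + 0
        }
        $1 != "-" { upd(ref, $1, $2, $3) }   # reference side
        $6 != "-" { upd(qry, $6, $7, $8) }   # query side
        END {
            for (c in ref) print "ref\t" c "\t" ref[c]
            for (c in qry) print "qry\t" c "\t" qry[c]
        }
    ' "$1" | LC_ALL=C sort
}

# Example: syri_max_coords 0_1syri.out
```

Each value should not exceed the length of the matching chromosome in the genome that plotsr pairs with that side of the comparison.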
 
I then tried to run plotsr with this command:
 
plotsr --sr 0_1syri.out --sr 1_2syri.out --sr 2_3syri.out --sr 3_4syri.out --sr 4_5syri.out --sr 5_6syri.out --sr 6_7syri.out --sr 7_9syri.out --sr 9_10syri.out --sr 10_11syri.out --sr 11_12syri.out --sr 12_13syri.out --genomes ../../genomes2.txt -o output_plot.png
 
At first I used the original fasta files as input, and the genomes2.txt file looked like this:
 
#file     name   tags
0.fa      0          lw:1.5
1.fa      1          lw:1.5
10.fa    10        lw:1.5
11.fa    11        lw:1.5
12.fa    12        lw:1.5
13.fa    13        lw:1.5
2.fa      2          lw:1.5
3.fa      3          lw:1.5
4.fa      4          lw:1.5
5.fa      5          lw:1.5
6.fa      6          lw:1.5
7.fa      7          lw:1.5
9.fa      9          lw:1.5
 
and I ran into the error:
ImportError: For chromosome ID: Chr0, length in genome fasta: genomes2.txt is less than the maximum coordinate in the structural annotation file: 1_2syri.out. Exiting.
 
I didn't understand the error. The first fasta file is the reference and therefore the longest, so I don't really see where a maximum coordinate problem would come from. Anyway, I saw that it is possible to use chromosome lengths as input instead. So I calculated the length of each fasta file and produced a chrlen file for each. Of course, I also renamed the input files in genomes2.txt from .fa to .chrlen. The chrlen files look like this:
 
"0.chrlen":
Chr0    5199559
 
"1.chrlen":
Chr0    5020588
 
and so on...
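
In case it helps someone reproduce this step, below is a minimal sketch of how such a chrlen file could be generated with awk. It assumes a two-column, tab-separated "chromosome ID, length" layout like the files above; `fasta_to_chrlen` is a hypothetical helper name, not something shipped with plotsr.

```bash
# Hypothetical helper: print "chromosome ID <TAB> length" for every
# record in a FASTA file.
fasta_to_chrlen() {
    awk '
        /^>/ {
            if (id != "") print id "\t" len   # flush the previous record
            id = substr($1, 2)                # header up to first whitespace, minus the leading marker
            len = 0
            next
        }
        {
            gsub(/[ \t\r]/, "")               # ignore stray whitespace and CRs
            len += length($0)                 # sequences may span many lines
        }
        END { if (id != "") print id "\t" len }
    ' "$1"
}

# Example: fasta_to_chrlen 0.fa > 0.chrlen
```

The chromosome ID is taken as the header text up to the first whitespace, so it should match the IDs that appear in the syri.out files (here Chr0).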
 
With that and the same plotsr command, I ran into the error:
 
ImportError: Chromosome ID: Chr0 in structural annotation file: 0_1syri.out not present in genome fasta: 0. Exiting
 
Could you explain what I am doing wrong?
Thanks!
 
P.S.: Thanks for reading this far. I think I found an error in the example described in the README. The fonts set in the example files markers.bed and tracks.txt are Arial, which I think is not supported anymore (?). I changed it to DejaVu Sans and it worked again. Thought you might want to know :)

Update: I ordered the genomes.txt file numerically, so that it is:

#file name tags
0.fa 0 lw:1.5
1.fa 1 lw:1.5
2.fa 2 lw:1.5
...

With the chrlen files I still run into the same error as before, but with the fasta files it actually worked!

For future reference: genomes.txt must list the genomes in the same order in which they are analysed. Also, to use chrlen files, add the ft:cl tag in the tags column.
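
To illustrate both points, here is a small sketch that writes genomes.txt in the order the files are given on the command line. `make_genomes_txt` is a hypothetical helper, and joining several tags with a semicolon (`lw:1.5;ft:cl`) is my reading of the plotsr README, so please check it against the version you use.

```bash
# Hypothetical helper: first argument is the tags string, the remaining
# arguments are genome files in the order they were compared
# (0, 1, 2, ... rather than the lexicographic 0, 1, 10, ...).
make_genomes_txt() (
    tags="$1"; shift
    printf '#file\tname\ttags\n'
    for f in "$@"; do
        base="${f##*/}"                                   # drop any directory part
        printf '%s\t%s\t%s\n' "$f" "${base%.*}" "$tags"   # name = file name without extension
    done
)

# Fasta input:
#   make_genomes_txt 'lw:1.5' 0.fa 1.fa 2.fa > genomes.txt
# Chromosome-length input (assumed tag syntax):
#   make_genomes_txt 'lw:1.5;ft:cl' 0.chrlen 1.chrlen 2.chrlen > genomes.txt
```

Passing the files explicitly, or through a natural sort such as `ls -1v`, avoids the 0, 1, 10, 11 ordering that caused the original error.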