schneebergerlab/plotsr

A running problem about two genomes with relatively large size differences (~850M vs ~650M)

Wenwen012345 opened this issue · 3 comments

Dear @mnshgl0110
Hello, this is a great tool, well suited to our needs. The species I am studying is Rhododendron. I have now finished sequencing the genome of my study rhododendron species (Rhododendron bailiense, Rb) in preparation for inter-species covariance analysis.

However, I have found that it seems that because the Rhododendron species I am studying (Rb) has a very large transposon insertion (repeat sequences), its genome size appears to be much larger than that of the other reference Rhododendron species in the same genus (and subgenus) (e.g., Rhododendron griersonianum, Rg). Their comparative genome sizes are ~850M (Rb) vs ~650M (Rg).

Of course, this is not necessarily a problem, but when I run the syri-plotsr process, the following error is reported when I get to the plotsr step. It seems that this may be caused by a large difference in chromosome length.

(syri) w@w:~/w2/HIFI1/juicer/3d-dna-review1-3/syri/Rb_Rg$ plotsr --sr  syri.out --genomes  genomes1.txt -H 8 -W 5

2023-06-18 21:03:21,979 - Plotsr - INFO - Starting

Traceback (most recent call last):
  File "/home/w/miniconda3/envs/syri/bin/plotsr", line 6, in <module>
    main(sys.argv[1:])
  File "/home/w/miniconda3/envs/syri/lib/python3.9/site-packages/plotsr/scripts/plotsr.py", line 334, in main
    plotsr(args)
  File "/home/w/miniconda3/envs/syri/lib/python3.9/site-packages/plotsr/scripts/plotsr.py", line 171, in plotsr
    chrlengths, genomes = validalign2fasta(alignments, args.genomes.name)
  File "/home/w/miniconda3/envs/syri/lib/python3.9/site-packages/plotsr/scripts/func.py", line 1050, in validalign2fasta
    raise ImportError(errmess2.format(c, os.path.basename(genf), als[i][0]))
ImportError: For chromosome ID: Chr10, length in genome fasta: genomes1.txt is less than the maximum coordinate in the structural annotation file: syri.out. Exiting.

Here are the relevant commands for my syri, minimap and plotsr etc:

minimap2 -ax asm5 --eqx Rg1_13.mod.fa Rb1_13.fa -t 40  > out1.sam 

syri  -c out1.sam -r Rg1_13.mod1.fa -q Rb1_13.fa -k -F S  --nc 40

plotsr --sr  syri.out --genomes  genomes1.txt -H 8 -W 5

syri_log_file
syri.log

genome.txt_file
genomes1.txt

syri.summary
syri.summary.txt

I am eager to know what is going on and how to fix it? Thank you very much!

In the genomes.txt, the order of genomes is wrong. Reference genome (Rg) should come first and then query genome (Rb).

Hello @mnshgl0110
Thank you for your reply so much. I got that. Are you avaliable now? By the way another question for the software is about the acquisition of reverse complementary sequences. Because I have the relevant hints, but I am afraid that there is a formatting error (I am not sure).

The relevant hint is:

Reading Coords - WARNING - Reference chromosome Chr11 has high fraction of inverted alignments with its homologous chromosome in the query genome (Chr11). Ensure that same chromosome-strands are being compared in the two genomes, as different strand can result in unexpected errors.
Reading Coords - WARNING - Reference chromosome Chr12 has high fraction of inverted alignments with its homologous chromosome in the query genome (Chr12). Ensure that same chromosome-strands are being compared in the two genomes, as different strand can result in unexpected errors.
Reading Coords - WARNING - Reference chromosome Chr4 has high fraction of inverted alignments with its homologous chromosome in the query genome (Chr4). Ensure that same chromosome-strands are being compared in the two genomes, as different strand can result in unexpected errors.
Reading Coords - WARNING - Reference chromosome Chr7 has high fraction of inverted alignments with its homologous chromosome in the query genome (Chr7). Ensure that same chromos

After filling in a text file with the name of the chromosome that should be changed to the another strand; the way I get the reverse complementary sequence is:

 revseq -sequence ./Rb1_13.fa -outseq  Rb1_13.rev.fa 
 seqkit grep -f ID7.txt  Rb1_13.fa -o 7.Rb.fa
 seqkit grep -f id7.rev.txt  Rb1_13.rev.fa  -o 7.Rb.rev.fa
cat 7.Rb.fa  7.Rb.rev.fa > Rb1-13.mod.fa

I'm not quite sure if this is a reasonable procedure?

I tried this process and it should work. Failed twice before because of some negligence....

@mnshgl0110