ogotoh/spaln

Any way to avoid or fix stop codons in spaln output?

xiekunwhy opened this issue · 2 comments

Hi,

I found many stop codons in many genes in the spaln output. Is this normal? If it is abnormal, how can I avoid or fix it?
command line: spaln -t6 -M4 -Q7 -O0 -LS -ya2
The peptide sequences are attached:
spaln.protein.pep.fa.gz

Best,
Kun

Dear Kun,

Your command line does not show which genomic sequence you used (specified by the -d option) or which queries (specified by the argument). Without this information, I cannot help you much. General suggestions are as follows.

  1. Confirm that your genomic sequence is formatted appropriately. In particular, set the -XG option if your sequence contains only a partial genome.
  2. Preferably, set a proper -T option so that optimal species-specific parameter values are used.
  3. Some species use a non-standard genetic code. In such a case, set the proper -C option.
  4. If you want to avoid excessive termination codons, set the -yoN option (e.g. N = 100), where N (default = 30) specifies the penalty for a premature termination codon. However, be careful, because a higher penalty is usually accompanied by excessive gaps, including frameshifts.

By the way, how did you obtain spaln.protein.pep.fa? Does each dot correspond to a termination codon? If you have not done so, try the -O7 option. The output is not genuine FASTA format, but you can use grep -v '^;' to remove the additional lines.
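
For anyone who prefers to do the same clean-up in a script, here is a minimal Python sketch of the grep -v '^;' step. The function name drop_comment_lines and the sample lines are made up for illustration; real -O7 output will look different.

```python
from typing import Iterable, Iterator


def drop_comment_lines(lines: Iterable[str], prefix: str = ";") -> Iterator[str]:
    """Yield only the lines that do not start with `prefix` (like grep -v '^;')."""
    for line in lines:
        if not line.startswith(prefix):
            yield line


# Hypothetical sample; not actual spaln output.
sample = [
    ">query1\n",
    ";C an extra annotation line\n",
    "MCSQVSLLQ\n",
]

cleaned = list(drop_comment_lines(sample))
print("".join(cleaned), end="")
```

To process a real file, pass an open file handle instead of the sample list and write the surviving lines to a new file.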

Osamu,

Hi Osamu,

Here are all the commands I used (Op-f.gf is the assembly, and Opf.homolog.tab.best.faa is a subset of protein sequences from orthodb10). The species is Oplegnathus punctatus.

makeidx.pl -inp Op-f.gf
spaln -t6 -M4 -Q7 -O0 -LS -ya2 -o Op-f.protein.gff3 -d Op-f Opf.homolog.tab.best.faa
gffread Op-f.protein.gff3 -g Op-f.gf -x spaln.protein.cds.fa -y spaln.protein.pep.fa
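
To see how many predicted proteins are affected, the peptide file from the last command can be screened for internal stop characters. Below is a rough Python sketch, assuming (as discussed in this thread) that a dot marks a termination codon. The function name internal_stop_positions and the toy records are hypothetical, not taken from the real file.

```python
from typing import Dict, Iterable, List


def internal_stop_positions(
    lines: Iterable[str], stop_char: str = "."
) -> Dict[str, List[int]]:
    """Return {record ID: 1-based positions of stop characters},
    ignoring a stop character that is the very last residue."""
    records: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record ID.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)

    hits: Dict[str, List[int]] = {}
    for rec_id, chunks in records.items():
        seq = "".join(chunks)
        # seq[:-1] skips the final residue, so a terminal stop is not reported.
        positions = [i + 1 for i, aa in enumerate(seq[:-1]) if aa == stop_char]
        if positions:
            hits[rec_id] = positions
    return hits


# Toy records for illustration only.
toy = [
    ">toy1 gene=example_1",
    "MCSQVSLLQ.RC",
    ">toy2 gene=example_2",
    "MKV.",
]
hits = internal_stop_positions(toy)
print(hits)
```

With a real file, something like `with open("spaln.protein.pep.fa") as fh: hits = internal_stop_positions(fh)` gives the list of records worth inspecting.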

I will try -yoN and -O7.

The dots in the protein file do correspond to termination codons, for example, the 10th codon of mRNA13344:

>mRNA13344 gene=scaffold_1_667
ATGTGCAGCCAGGTGAGCCTGCTGCAGTGACGCTGTCTGTGTTGACTCCACAGATGCAGACGGTCACTCT
GATTCCCGGGGACGGGATTGGACCAGAGATCTCCACTGCTGTCATGAAGATCTTTGAGGCTGCAAAGGTG
AGTGTGATCCGTTTGTTTCTTCATCTTTGTGAGTATCTGTTTGAAAGTGTAGATTTCACCTGCAGGCTCC
GATCAGCTGGGAGGAGAGGAATGTGACGGCCATAAAGGGACCCGGTGGCCGGTGGATGATCCCCCCTGAT
GCTAAAGAGTCCATGGACAAGAGCAAGATCGGACTGAAAGGACCCCTGAAGACCCCCATCGCCGCAGGTC
ACCCCTCCATGAACCTGCTGCTGAGGAAGACCTTTGACCTTTACGCCAACGTGCGACCCTGCGTCTCTAT
CGAGGGCTACAAGACTCCGTACACCGACGTCAACCTGGTCACCATCCGCGAGAACACGGAGGGCGAGTAC
AGCGGCATCGAACACGTGAGTCATTAGAGCCTCGTCCTGCTGCTGGAGCACAAACACCTGGAACGAGTCA
CGTTATCGACCATCAGAAAGTCCAGCAGCTGTTTGTTAGTCCTGTCAGCTAGCGGCTGCAGACAGGACGC
TCTGCTCCTGCTCGTCTTCAGGATCGTCGACGGCGTCGTTCAGAGCATCAAACTGATCACTGAGGACGCC
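
The position of a premature stop in a CDS like this one can be checked with a short script. This is only a sketch: it assumes the standard genetic code stop codons (TAA, TAG, TGA) and reading frame 0, and the function name in_frame_stop_codons is made up. For a species with a non-standard code, the stop set would need to be changed. The test input is the first line of the mRNA13344 sequence above.

```python
from typing import FrozenSet, List

# Stop codons of the standard genetic code (an assumption; adjust if needed).
STANDARD_STOPS: FrozenSet[str] = frozenset({"TAA", "TAG", "TGA"})


def in_frame_stop_codons(
    cds: str, stops: FrozenSet[str] = STANDARD_STOPS
) -> List[int]:
    """Return 1-based codon numbers of stop codons in frame 0.

    Only complete codons are examined; trailing bases are ignored.
    """
    seq = cds.upper().replace("\n", "").replace(" ", "")
    found = []
    for start in range(0, len(seq) - len(seq) % 3, 3):
        if seq[start:start + 3] in stops:
            found.append(start // 3 + 1)
    return found


# First 70 bases of mRNA13344 as posted above.
prefix = "ATGTGCAGCCAGGTGAGCCTGCTGCAGTGACGCTGTCTGTGTTGACTCCACAGATGCAGACGGTCACTCT"
stops_found = in_frame_stop_codons(prefix)
print(stops_found)
```

The first codon number reported for this prefix is 10, the TGA mentioned above. A stop at the very end of a full-length CDS is expected, so only earlier hits indicate a premature termination codon.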

Best,
Kun