agshumate/Liftoff

too many * (stop codons)

Closed this issue · 10 comments

There are too many * (stop codons) in the protein sequences after the GFF file is converted into a protein file.
Target: Tanichthys albonubes (Cyprinidae)
Reference: Danio rerio (Cyprinidae)

Likely the target assembly has an insertion or deletion within the coding part of the transcript, leading to this frameshift. To get a better idea, check out the .sam alignments in the intermediate_files directory. It is also worth confirming that the original reference transcript does not have internal stop codons. While this should not be the case, it does happen occasionally.
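
For the reference check, here is a minimal Python sketch that counts internal stop codons per protein. It assumes the proteins are in a FASTA file where * marks a stop and a single trailing * is the normal terminal stop. The function names read_fasta and internal_stops are made up for illustration and are not part of Liftoff.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def internal_stops(lines: Iterable[str]) -> Dict[str, int]:
    """Map each record ID to its number of internal '*' characters.

    A single trailing '*' is treated as the expected terminal stop and
    is not counted. Records without internal stops are omitted.
    """
    hits = {}
    for name, seq in read_fasta(lines):
        body = seq[:-1] if seq.endswith("*") else seq
        count = body.count("*")
        if count:
            hits[name] = count
    return hits
```

Calling internal_stops on an open file handle for the reference protein file, and again for the lifted-over one, shows whether the affected transcripts already carry internal stops in the reference or only gain them after the lift-over.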

Hi, I have the same problem.
I found that the stop codons do not appear in the original sequence, only in the protein file generated from the GFF file.
How can I fix this problem?

Hi,
Are you certain that the protein in your target genome does not have a frameshift or point mutation leading to a premature stop?

Yes, I am sure. I checked the genome sequencing reads.

Did you use the -polish option?

Yes, I used it.

Can you elaborate on what you mean when you say you checked the genome sequencing reads? Do you mean the reads aligned to your new genome?

Hello,
Sorry for my poor English.

I used Liftoff to annotate my new genome, using the high-quality genome of its closest relative as the reference. But I found that there are a lot of stop codons.

Next, I used BLAST to search for the same coding sequence in my new genome, using the sequence from the closest relative as the query. However, the matching sequence does not have a stop codon.

Finally, I checked the original sequencing data of my genome (150 PE reads, not the assembly). The result is the same as the BLAST result: no stop codon in the genome.

I understand. How divergent is the closest relative you used for the lift-over? Liftoff is only meant to be used between assemblies of the same species or of very close relatives (e.g. human and chimp).

About 20 MYA (million years ago)