agshumate/Liftoff

trans-spliced mRNA in chloroplast genome

Closed this issue · 2 comments

Thank you for developing this wonderful tool. I usually use liftoff to transfer annotations between eukaryotic nuclear genomes and always get good results.
I'd like to report an issue that occurs when lifting over a trans-spliced mRNA in a chloroplast genome.

The following is the structure of the problematic gene in the reference genome.

Chloroplast	feature	gene	1	66800	.	+	.	ID=MpKit2_Cp001;locus_tag=MpKit2_Cp001
Chloroplast	feature	mRNA	1	66800	.	+	.	ID=MpKit2_Cp001.1;Parent=MpKit2_Cp001;codon_start=1;gene=rps12;locus_tag=MpKit2_Cp001;note=30S ribosomal protein S12%3B rps12 CDS%3B trans splicing of rps12 intron 1;product=ribosomal protein S12;protein_id=BBD75102.1;trans_splicing=;transl_table=11;translation=MPTIQQLIRNKRQPIENRTKSPALKGCPQRRGVCTRVYTTTPKKPNSALRKIARVRLTSGFEITAYIPGIGHNLQEHSVVLVRGGRVKDLPGVRYHIIRGTLDAVGVKDRQQGRSKYGVKKSK
Chloroplast	feature	intron	65901	66686	.	-	.	ID=MpKit2_Cp001.1.intron1;Parent=MpKit2_Cp001.1
Chloroplast	feature	CDS	66687	66800	.	-	0	ID=MpKit2_Cp001.1.cds1;Parent=MpKit2_Cp001.1
Chloroplast	feature	intron	1	92	.	+	.	ID=MpKit2_Cp001.1.intron2;Parent=MpKit2_Cp001.1
Chloroplast	feature	CDS	93	324	.	+	0	ID=MpKit2_Cp001.1.cds2;Parent=MpKit2_Cp001.1
Chloroplast	feature	intron	325	828	.	+	.	ID=MpKit2_Cp001.1.intron3;Parent=MpKit2_Cp001.1
Chloroplast	feature	CDS	829	854	.	+	0	ID=MpKit2_Cp001.1.cds3;Parent=MpKit2_Cp001.1

Here, the first exon is encoded on the - strand, while the other two exons are encoded on the + strand.

When this gene was lifted over using liftoff, all the exons were placed on the + strand and ordered by their position in the genome, which differs from the original order in the reference.

CP	Liftoff	gene	1	66802	.	+	.	ID=MpKit2_Cp001;locus_tag=MpKit2_Cp001;coverage=1.0;sequence_ID=1.0;valid_ORFs=0;extra_copy_number=0;copy_num_ID=MpKit2_Cp001_0
CP	Liftoff	mRNA	1	66802	.	+	.	ID=MpKit2_Cp001.1;Parent=MpKit2_Cp001;codon_start=1;gene=rps12;locus_tag=MpKit2_Cp001;note=30S ribosomal protein S12; rps12 CDS; trans splicing of rps12 intron 1;product=ribosomal protein S12;protein_id=BBD75102.1;trans_splicing=;transl_table=11;translation=MPTIQQLIRNKRQPIENRTKSPALKGCPQRRGVCTRVYTTTPKKPNSALRKIARVRLTSGFEITAYIPGIGHNLQEHSVVLVRGGRVKDLPGVRYHIIRGTLDAVGVKDRQQGRSKYGVKKSK;matches_ref_protein=False;valid_ORF=False;missing_start_codon=True;extra_copy_number=0
CP	Liftoff	CDS	93	324	.	+	.	ID=MpKit2_Cp001.1.cds2;Parent=MpKit2_Cp001.1;extra_copy_number=0
CP	Liftoff	CDS	829	854	.	+	.	ID=MpKit2_Cp001.1.cds3;Parent=MpKit2_Cp001.1;extra_copy_number=0
CP	Liftoff	CDS	66689	66802	.	+	.	ID=MpKit2_Cp001.1.cds1;Parent=MpKit2_Cp001.1;extra_copy_number=0
CP	Liftoff	intron	1	92	.	+	.	ID=MpKit2_Cp001.1.intron2;Parent=MpKit2_Cp001.1;extra_copy_number=0
CP	Liftoff	intron	325	828	.	+	.	ID=MpKit2_Cp001.1.intron3;Parent=MpKit2_Cp001.1;extra_copy_number=0
CP	Liftoff	intron	65903	66688	.	+	.	ID=MpKit2_Cp001.1.intron1;Parent=MpKit2_Cp001.1;extra_copy_number=0

Although trans-spliced genes are uncommon in nuclear genomes, they are frequently observed in organellar genomes. Hopefully, liftoff can handle trans-splicing in a future version.

I used liftoff ver. 1.6.1.
To reproduce this, the FASTA and GFF files available from here can be used.
(The sequence name in FASTA needs to be changed to 'Chloroplast')

Hi,
Currently, liftoff is not designed to handle trans-splicing. The issue is that because the 'gene' and 'mRNA' features are annotated on the + strand, liftoff expects the child features to be on the + strand too. One workaround would be to remove the Parent= field from the introns and CDSs and then include 'intron' and 'CDS' in a text file passed with -features. Each of these features will then be lifted over independently, which allows them to map to different strands.
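As a rough illustration of the edit described above, the following Python sketch removes the Parent attribute from CDS and intron lines of a GFF3 file. The function name `strip_parent` and the default feature types are assumptions made for this example; they are not part of liftoff. The feature-type text file is assumed to list one type per line (here `CDS` and `intron`). Note that the follow-up in this thread reports a KeyError in polish.py with version 1.6.1 after removing Parent, so this only illustrates the edit itself.

```python
def strip_parent(gff_line, child_types=("CDS", "intron")):
    """Return gff_line with its Parent attribute removed.

    Only lines whose feature type (column 3) is in child_types are changed;
    comments, blank lines, and other feature types are returned untouched.
    The name and defaults are hypothetical, chosen for this sketch.
    """
    if gff_line.startswith("#") or not gff_line.strip():
        return gff_line
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] not in child_types:
        return gff_line
    # Column 9 holds ';'-separated key=value attributes; drop only Parent.
    attrs = [a for a in cols[8].split(";") if a and not a.startswith("Parent=")]
    cols[8] = ";".join(attrs)
    newline = "\n" if gff_line.endswith("\n") else ""
    return "\t".join(cols) + newline
```

Applying it to every line of the reference GFF (for example `out.writelines(strip_parent(l) for l in inp)`) yields a file in which each CDS and intron is a top-level feature.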

Thank you for the suggestion. I tried the workaround: I removed the Parent field and specified the "-f" option (does "-features" mean "-f"?).
However, it failed with the following error.

 File "path/to/liftoff/polish.py", line 121, in group_cds_by_tran
    groups[cds.attributes["Parent"][0]].append(cds)
KeyError: 'Parent'

So removing the "Parent" field does not seem to work.

Anyway, I was able to transfer the annotation by splitting the gene and mRNA into separate features according to the strand on which their exons are encoded.
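The exact procedure used for the split is not given in the thread, so the following Python sketch is only one possible way to do it. It assumes the records of a single gene are available as 9-column lists, groups the CDS and intron records by strand, and emits one gene and one mRNA per strand whose coordinates span the children on that strand. The function names and the `_plus`/`_minus` ID suffixes are hypothetical.

```python
from collections import defaultdict

def get_attr(attr_field, key):
    """Return the value of key in a GFF3 attribute string, or None."""
    prefix = key + "="
    for part in attr_field.split(";"):
        if part.startswith(prefix):
            return part[len(prefix):]
    return None

def set_attr(attr_field, key, value):
    """Return attr_field with the value of an existing key replaced."""
    prefix = key + "="
    return ";".join(prefix + value if part.startswith(prefix) else part
                    for part in attr_field.split(";"))

def split_gene_by_strand(records, child_types=("CDS", "intron")):
    """Split one gene/mRNA and its children into one model per strand.

    records: list of 9-column lists (strings) for a single gene.
    The ID suffixes are an assumption of this sketch, not a convention.
    """
    gene = next(r for r in records if r[2] == "gene")
    mrna = next(r for r in records if r[2] == "mRNA")
    gene_id = get_attr(gene[8], "ID")
    mrna_id = get_attr(mrna[8], "ID")

    # Group child features by the strand they are annotated on.
    by_strand = defaultdict(list)
    for rec in records:
        if rec[2] in child_types:
            by_strand[rec[6]].append(rec)

    out = []
    for strand, children in sorted(by_strand.items()):
        suffix = "_plus" if strand == "+" else "_minus"
        # Parent coordinates span only the children on this strand.
        start = str(min(int(c[3]) for c in children))
        end = str(max(int(c[4]) for c in children))
        out.append(gene[:3] + [start, end, gene[5], strand, gene[7],
                               set_attr(gene[8], "ID", gene_id + suffix)])
        mrna_attrs = set_attr(mrna[8], "ID", mrna_id + suffix)
        mrna_attrs = set_attr(mrna_attrs, "Parent", gene_id + suffix)
        out.append(mrna[:3] + [start, end, mrna[5], strand, mrna[7], mrna_attrs])
        # Re-point each child at the new per-strand mRNA.
        for c in children:
            out.append(c[:8] + [set_attr(c[8], "Parent", mrna_id + suffix)])
    return out
```

Because every parent then lies on the same strand as its children, each part satisfies the same-strand expectation described earlier in the thread. The per-strand pieces would have to be merged again afterwards if a single trans-spliced model is wanted in the target annotation.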