agshumate/Liftoff

Transcripts from the same gene split across contigs/chromosomes.


Hi,
I have noticed something a bit odd in the Liftoff output and wanted to ask your opinion.
After running Liftoff, some transcripts from the same gene are placed on different contigs/chromosomes.
This was not the behaviour I expected. Do you know why this might happen?

Input GFF entry for the gene:

ptg006428l      AUGUSTUS        mRNA    99594   100692  .       +       .       ID=jg99407.t2;geneID=jg99407
ptg006428l      AUGUSTUS        exon    99594   99909   .       +       .       Parent=jg99407.t2
ptg006428l      AUGUSTUS        exon    100259  100692  .       +       .       Parent=jg99407.t2
ptg006428l      AUGUSTUS        CDS     99594   99909   .       +       0       Parent=jg99407.t2
ptg006428l      AUGUSTUS        CDS     100259  100692  .       +       2       Parent=jg99407.t2
ptg006428l      AUGUSTUS        mRNA    100249  100692  .       +       .       ID=jg99407.t1;geneID=jg99407
ptg006428l      AUGUSTUS        exon    100249  100692  .       +       .       Parent=jg99407.t1
ptg006428l      AUGUSTUS        CDS     100249  100692  .       +       0       Parent=jg99407.t1

Liftoff output (the two transcripts got placed on different contigs):

contig_8718_RagTag      Liftoff mRNA    85979   86422   .       +       .       ID=jg99407.t1;geneID=jg99407;coverage=1.0;sequence_ID=1.0;matches_ref_protein=True;valid_ORF=True;valid_ORFs=1;extra_copy_number=0;copy_num_ID=jg99407.t1_0
contig_8718_RagTag      Liftoff exon    85979   86422   .       +       .       ID=exon_116493;Parent=jg99407.t1;extra_copy_number=0
contig_8718_RagTag      Liftoff CDS     85979   86422   .       +       .       ID=CDS_116451;Parent=jg99407.t1;extra_copy_number=0
ptg020894l_RagTag       Liftoff mRNA    83591   84689   .       +       .       ID=jg99407.t2;geneID=jg99407;coverage=1.0;sequence_ID=1.0;matches_ref_protein=True;valid_ORF=True;valid_ORFs=1;extra_copy_number=0;copy_num_ID=jg99407.t2_0
ptg020894l_RagTag       Liftoff exon    83591   83906   .       +       .       ID=exon_116491;Parent=jg99407.t2;extra_copy_number=0
ptg020894l_RagTag       Liftoff exon    84256   84689   .       +       .       ID=exon_116492;Parent=jg99407.t2;extra_copy_number=0
ptg020894l_RagTag       Liftoff CDS     83591   83906   .       +       .       ID=CDS_116449;Parent=jg99407.t2;extra_copy_number=0
ptg020894l_RagTag       Liftoff CDS     84256   84689   .       +       .       ID=CDS_116450;Parent=jg99407.t2;extra_copy_number=0
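To find this pattern across a whole annotation instead of by eye, one option is to group the lifted mRNA features by gene and list genes whose transcripts landed on more than one sequence. The sketch below is hypothetical helper code, not part of Liftoff; `parse_attributes` and `split_genes` are illustrative names. It assumes a tab-separated GFF3 file and takes the gene from `Parent=` when present, falling back to the `geneID=` attribute used in this annotation.

```python
from collections import defaultdict


def parse_attributes(column: str) -> dict:
    """Parse a GFF3 attribute column (key=value pairs separated by ';')."""
    attrs = {}
    for field in column.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def split_genes(gff_lines, gene_keys=("Parent", "geneID")):
    """Return {gene: sorted seqids} for genes whose mRNAs sit on >1 seqid.

    Assumes tab-separated GFF3; the gene is read from the first attribute
    in gene_keys that the mRNA carries (hypothetical convention).
    """
    seqids_by_gene = defaultdict(set)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # only transcript features are grouped
        attrs = parse_attributes(cols[8])
        gene = next((attrs[k] for k in gene_keys if k in attrs), None)
        if gene is not None:
            seqids_by_gene[gene].add(cols[0])
    return {g: sorted(s) for g, s in seqids_by_gene.items() if len(s) > 1}
```

Run over the two mRNA lines in the output above, it would report `jg99407` on both `contig_8718_RagTag` and `ptg020894l_RagTag`.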

Additional info that might be relevant:

liftoff --version
v1.6.1

Command (the presence of -infer_genes does not seem to make a difference):
liftoff -g Annotatios_v1.0.mrna.gff -o Annotatios_v1.0.liftoff.gff -infer_genes -p 30 -u unmapped_features_contigs.txt -f features.txt ragtag.scaffold.fmt.fasta asm.bp.p_ctg.fa

Genome size: >10Gb, larger than the 4Gb cutoff for minimap2

Thanks for your help and a great tool!

Agnieszka

I just noticed that this might be due to my GFF having geneID= instead of Parent= on the mRNA features.
I have started a new run to check whether replacing geneID= with Parent= helps.
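A quick way to check an annotation for this problem is to list transcript features that have no Parent attribute. In GFF3, Parent is the reserved attribute that links a feature to its parent; attributes starting with a lowercase letter, such as geneID, are free-form, so a GFF3 parser has no reason to treat geneID as a gene link. The function below is a hypothetical sketch (`orphan_transcripts` is an illustrative name) assuming tab-separated GFF3 and mRNA as the transcript feature type.

```python
def orphan_transcripts(gff_lines, feature_type="mRNA"):
    """Return the IDs of transcript features that lack a Parent attribute."""
    orphans = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Attribute column: key=value pairs separated by ';'
        attrs = dict(f.split("=", 1) for f in cols[8].strip().split(";") if "=" in f)
        if "Parent" not in attrs:
            orphans.append(attrs.get("ID", "<no ID>"))
    return orphans
```

On the original input above, both `jg99407.t2` and `jg99407.t1` would be reported, because they only carry `geneID=`.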

Just an update: it does look like it was a GFF format issue (geneID was not recognized).
I reformatted the file as follows, adding a gene entry with proper Parent/child relationships:

ptg006428l      AUGUSTUS        gene    99594   100692  .       +       .       ID=jg99407
ptg006428l      AUGUSTUS        mRNA    99594   100692  .       +       .       ID=jg99407.t2;Parent=jg99407
ptg006428l      AUGUSTUS        exon    99594   99909   .       +       .       Parent=jg99407.t2
ptg006428l      AUGUSTUS        exon    100259  100692  .       +       .       Parent=jg99407.t2
ptg006428l      AUGUSTUS        CDS     99594   99909   .       +       0       Parent=jg99407.t2
ptg006428l      AUGUSTUS        CDS     100259  100692  .       +       2       Parent=jg99407.t2
ptg006428l      AUGUSTUS        mRNA    100249  100692  .       +       .       ID=jg99407.t1;Parent=jg99407
ptg006428l      AUGUSTUS        exon    100249  100692  .       +       .       Parent=jg99407.t1
ptg006428l      AUGUSTUS        CDS     100249  100692  .       +       0       Parent=jg99407.t1
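The reformatting above can be scripted for the whole file. The sketch below is hypothetical (`add_gene_parents` and `_attr_pairs` are illustrative names, not part of Liftoff or AUGUSTUS): it rewrites each mRNA's `geneID=` as `Parent=` and inserts one gene feature per gene, spanning the union of that gene's transcripts and placed just before its first transcript. It assumes tab-separated GFF3, mRNA lines that do not already carry a Parent, and that all transcripts of a gene share one seqid, source and strand, as in the AUGUSTUS input here.

```python
def _attr_pairs(column):
    """Split a GFF3 attribute column into an ordered list of (key, value) pairs."""
    return [tuple(f.split("=", 1)) for f in column.strip().split(";") if "=" in f]


def add_gene_parents(gff_lines):
    """Turn mRNA 'geneID=' into 'Parent=' and insert one gene line per gene."""
    rows = [line.rstrip("\n").split("\t") for line in gff_lines
            if line.strip() and not line.startswith("#")]

    # Pass 1: a gene spans min(start)..max(end) over its transcripts.
    spans = {}
    for cols in rows:
        if len(cols) == 9 and cols[2] == "mRNA":
            gene = dict(_attr_pairs(cols[8])).get("geneID")
            if gene is None:
                continue
            start, end = int(cols[3]), int(cols[4])
            if gene not in spans:
                spans[gene] = [cols[0], cols[1], start, end, cols[6]]
            else:
                spans[gene][2] = min(spans[gene][2], start)
                spans[gene][3] = max(spans[gene][3], end)

    # Pass 2: emit each gene line before its first transcript, relink mRNAs.
    out, emitted = [], set()
    for cols in rows:
        if len(cols) == 9 and cols[2] == "mRNA":
            pairs = _attr_pairs(cols[8])
            gene = dict(pairs).get("geneID")
            if gene is not None:
                if gene not in emitted:
                    seqid, source, start, end, strand = spans[gene]
                    out.append("\t".join([seqid, source, "gene", str(start), str(end),
                                          ".", strand, ".", f"ID={gene}"]))
                    emitted.add(gene)
                pairs = [("Parent", v) if k == "geneID" else (k, v) for k, v in pairs]
                cols = cols[:8] + [";".join(f"{k}={v}" for k, v in pairs)]
        out.append("\t".join(cols))
    return out
```

Applied to the original eight input lines, this should reproduce the nine reformatted lines shown above, with the gene spanning 99594 to 100692 (the union of the two transcripts).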

Reformatting seems to have eliminated the issue of transcripts from the same gene being split across contigs.
New Liftoff output:

ptg018754l_RagTag       Liftoff gene    79322   80420   .       -       .       ID=jg99407;coverage=1.0;sequence_ID=1.0;valid_ORFs=2;extra_copy_number=0;copy_num_ID=jg99407_0
ptg018754l_RagTag       Liftoff mRNA    79322   79765   .       -       .       ID=jg99407.t1;Parent=jg99407;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
ptg018754l_RagTag       Liftoff exon    79322   79765   .       -       .       ID=exon_116493;Parent=jg99407.t1;extra_copy_number=0
ptg018754l_RagTag       Liftoff CDS     79322   79765   .       -       .       ID=CDS_116451;Parent=jg99407.t1;extra_copy_number=0
ptg018754l_RagTag       Liftoff mRNA    79322   80420   .       -       .       ID=jg99407.t2;Parent=jg99407;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
ptg018754l_RagTag       Liftoff exon    79322   79755   .       -       .       ID=exon_116492;Parent=jg99407.t2;extra_copy_number=0
ptg018754l_RagTag       Liftoff exon    80105   80420   .       -       .       ID=exon_116491;Parent=jg99407.t2;extra_copy_number=0
ptg018754l_RagTag       Liftoff CDS     79322   79755   .       -       .       ID=CDS_116450;Parent=jg99407.t2;extra_copy_number=0
ptg018754l_RagTag       Liftoff CDS     80105   80420   .       -       .       ID=CDS_116449;Parent=jg99407.t2;extra_copy_number=0

Liftoff command:

liftoff -g test.gff -o test.liftoff.gff -p 50 -u unmapped_features_contigs.txt -f features.txt ragtag.scaffold.fmt.fasta asm.bp.p_ctg.fa