marbl/CHM13

SF3B3 gene annotation missing in gff3

fbrundu opened this issue · 2 comments

I am interested in the SF3B3 gene, and I was able to find its entry in the hg38 GFF3 file (e.g., GENCODE 43):

chr16	HAVANA	gene	70523791	70577670	.	+	.	ID=ENSG00000189091.13;gene_id=ENSG00000189091.13;gene_type=protein_coding;gene_name=SF3B3;level=1;hgnc_id=HGNC:10770;tag=ncRNA_host;havana_gene=OTTHUMG00000137582.8

However, I cannot find an entry of type "gene" (third column) for this gene in the GFF3 file posted in this repository. Other feature types for this gene, e.g., "transcript", are present:

CDS
exon
intron
start_codon
stop_codon
transcript
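One way to reproduce this check is to count the third-column feature types of every record tagged with the gene's name. The sketch below is a minimal, hypothetical helper (`feature_types_for_gene` and `open_gff` are illustrative names, not part of any tool). It assumes the gene name is stored in a `gene_name` attribute; pass a different `name_key` if the file uses another key.

```python
import gzip
from collections import Counter
from urllib.parse import unquote


def parse_attributes(field):
    """Parse a GFF3 column-9 string (key=value;key=value) into a dict, undoing %-escapes."""
    attrs = {}
    for item in field.strip().split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        attrs[unquote(key)] = unquote(value)
    return attrs


def feature_types_for_gene(lines, gene_name, name_key="gene_name"):
    """Count column-3 feature types of records whose `name_key` attribute equals gene_name.

    Assumption: the gene name lives in the `gene_name` attribute (configurable).
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        if parse_attributes(cols[8]).get(name_key) == gene_name:
            counts[cols[2]] += 1
    return counts


def open_gff(path):
    """Open a plain or gzip-compressed GFF3 file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

For example, `with open_gff("annotation.gff3.gz") as fh: print(feature_types_for_gene(fh, "SF3B3"))` (the file name is a placeholder). A missing `gene` key in the result matches the observation above.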

What could be the reason for this, and is there a workaround?

It is unclear why a gene record was not written for this gene. An issue has been opened for CAT; however, the software is currently between maintainers. You could generate the gene records yourself. Alternatively, Ensembl has CHM13 annotations in its HPRC release, and NCBI has also released CHM13 annotations.

ComparativeGenomicsToolkit/Comparative-Annotation-Toolkit#286
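Generating the gene records yourself could look roughly like the sketch below: group `transcript` records by gene, take the smallest start and largest end as the gene span, and emit one `gene` line per group. This is a hypothetical helper, not a CAT feature. It assumes each transcript carries a `gene_id` attribute identifying its gene (pass `group_key="Parent"` if the file links transcripts to genes that way), that the new gene's `ID` should equal that key so existing references resolve, and that the source column can be copied from the transcripts.

```python
from urllib.parse import unquote

# Characters reserved in GFF3 column 9 that must be percent-encoded on output.
_RESERVED = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C", "\t": "%09", "\n": "%0A"}


def escape(value):
    return "".join(_RESERVED.get(ch, ch) for ch in value)


def parse_attributes(field):
    """Parse a GFF3 column-9 string into a dict, undoing %-escapes."""
    attrs = {}
    for item in field.strip().split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[unquote(key)] = unquote(value)
    return attrs


def synthesize_gene_records(lines, group_key="gene_id", carry_keys=("gene_name", "gene_biotype")):
    """Return one GFF3 'gene' line per group of 'transcript' records lacking a gene record.

    Assumptions: `group_key` identifies the gene; all transcripts of a gene share seqid/strand.
    """
    genes, existing = {}, set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] == "gene":
            existing.add(attrs.get("ID"))  # remember genes that are already present
            continue
        if cols[2] != "transcript" or group_key not in attrs:
            continue
        gid, start, end = attrs[group_key], int(cols[3]), int(cols[4])
        if gid not in genes:
            genes[gid] = {"seqid": cols[0], "source": cols[1], "strand": cols[6],
                          "start": start, "end": end,
                          "extra": {k: attrs[k] for k in carry_keys if k in attrs}}
            continue
        g = genes[gid]
        if (g["seqid"], g["strand"]) != (cols[0], cols[6]):
            raise ValueError(f"transcripts of {gid} disagree on seqid/strand")
        g["start"], g["end"] = min(g["start"], start), max(g["end"], end)
    out = []
    for gid, g in genes.items():
        if gid in existing:
            continue  # do not duplicate a gene record that already exists
        attr = ";".join([f"ID={escape(gid)}"] + [f"{k}={escape(v)}" for k, v in g["extra"].items()])
        out.append("\t".join([g["seqid"], g["source"], "gene", str(g["start"]),
                              str(g["end"]), ".", g["strand"], ".", attr]))
    return out
```

The synthesized lines can then be merged back into the file and sorted with the rest of the records. Deriving the span from transcripts only is a deliberate simplification; if the annotation has features outside any transcript, the span would need widening.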

Hello, just to add to this.

SF3B3 is present in the curated RefSeq/Liftoff annotation as follows:

chr16	Liftoff	gene	76334997	76388857	.	+	.	ID=SF3B3;gene_name=SF3B3;db_xref=MIM:605592;description=splicing factor 3b subunit 3;gbkey=Gene;gene=SF3B3;gene_biotype=protein_coding;gene_synonym=STAF130;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=SF3B3_0