marbl/CHM13

GFF version 2.0 lacking mitochondrial encoded genes

bernardo-heberle opened this issue · 2 comments

Hello,

The GFF version 2.0 seems to be lacking mitochondrial encoded genes. Will you be releasing a new annotation including those?

Alternatively, would it be safe to assume that the coordinates for MT genes have not changed? If so, I could copy them from the GRCh38 reference annotation in the meantime.

Thanks,
Bernardo

I am not sure which analysis FASTA you're referring to. The browser should show mito gene annotations: https://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_3267197_GCA_009914755.4&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=CP068254.1%3A1%2D16569&hgsid=1321956005_Mx5hsmRrYb58UD2FZoRIajG15Y5K and I could also grep the mito chromosome from the gff.gz file.
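For anyone who wants to repeat that check on their own copy of an annotation, here is a minimal sketch that counts feature types on one sequence of a gzipped GFF3. The function name `count_features`, the file name, and the sequence name `chrM` are assumptions for illustration. The sequence ID in column 1 depends on the file (the browser link above uses the accession `CP068254.1`), so adjust it to match.

```python
import gzip
from collections import Counter


def count_features(gff_path, seqid):
    """Count feature types (GFF3 column 3) on one sequence of a gzipped GFF3."""
    counts = Counter()
    with gzip.open(gff_path, "rt") as handle:
        for line in handle:
            # Skip directives, comments, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # Column 1 is the sequence ID, column 3 is the feature type.
            if len(fields) >= 3 and fields[0] == seqid:
                counts[fields[2]] += 1
    return counts


# Example call (hypothetical local file name and sequence ID):
# print(count_features("annotation.gff3.gz", "chrM"))
```

An empty result means the file has no features on that sequence ID, which may mean either that the mito genes are missing or that the sequence is named differently in that file.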

All the analysis sets except https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_maskedY_rCRS.fa.gz use the CHM13 mito, whose coordinates are NOT conserved versus GRCh38, so you cannot use the GRCh38 annotations with them. If you use the rCRS mito version, you should be able to reuse annotations made for that version of the mito, which I believe is the one in GRCh38 as well, but I am not sure.
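One way to remove that uncertainty before copying coordinates is to compare the mito sequence in the two references directly. The sketch below computes the length and MD5 of a single named record in a gzipped FASTA. The function name `sequence_digest`, the file names, and the record name `chrM` are assumptions for illustration, and the record may be named differently in each file. If both references give the same length and digest, the sequences are identical and annotation coordinates carry over. If they differ, the coordinates should not be reused as-is.

```python
import gzip
import hashlib


def sequence_digest(fasta_path, name):
    """Return (length, md5 hex digest) of record `name` in a gzipped FASTA, or None if absent."""
    md5 = hashlib.md5()
    length = 0
    found = False
    in_record = False
    with gzip.open(fasta_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                if in_record:
                    break  # reached the record after the target; stop reading
                header = line[1:].split()
                # The record name is the first whitespace-delimited token of the header.
                in_record = bool(header) and header[0] == name
                found = found or in_record
            elif in_record:
                # Uppercase so soft-masking does not change the digest.
                seq = line.strip().upper()
                md5.update(seq.encode("ascii"))
                length += len(seq)
    return (length, md5.hexdigest()) if found else None


# Example comparison (hypothetical local file names and record names):
# same = sequence_digest("chm13_rCRS.fa.gz", "chrM") == sequence_digest("grch38.fa.gz", "chrM")
```

Line wrapping and soft-masking do not affect the result, since line breaks are stripped and the sequence is uppercased before hashing.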

I am referring to the GFF annotation file [Assembly v2.0] provided by Ensembl: ftp.ebi.ac.uk/pub/databases/ensembl/hprc/y1_freeze/GCA_009914755.4/GCA_009914755.4_genes.gff3.gz

However, as you pointed out, the CAT and Liftoff annotation provided at http://courtyard.gi.ucsc.edu/~mhauknes/T2T/t2t_Y/annotation_set/CHM13.v2.0.gff3 does contain the mitochondrial genes.

I will go ahead and use the CAT and Liftoff annotations. Thank you for the help and the quick reply.