marbl/CHM13

Question regarding 41 protein coding genes identified

singcell opened this issue · 1 comments

In the recently published Nature paper about the Y chromosome, it says "T2T-Y contains an additional 110 genes, among which 41 are predicted to be protein coding. The majority of these protein-coding genes (38 of 41) are additional copies of TSPY, one of the nine ampliconic gene families, filling the corresponding gap in GRCh38-Y (Table [1])".

Are the 41 genes identified in this paper novel, or do they just confirm previous findings?
Where can I find the names and sequences of these 41 protein-coding genes, in the supplementary data or in the UCSC Genome Browser?

Hello,

I called them 'additional copies' in the sense that they are not present in the annotations of GRCh38-Y, because the corresponding sequence was missing there. It is hard to establish a 1:1 relationship.

You can grep the gene names below from the 5.1 curated annotation:

gene_prtn	TSPY4	6
gene_prtn	TSPY2	1
gene_prtn	TSPY3	9
gene_prtn	LOC124903544	2
gene_prtn	TSPY1	1
gene_prtn	TSPY8	2
gene_prtn	TSPY9	4
gene_prtn	RBMY1A1	1
gene_prtn	TSPY10	15

The numbers on the right show the 'additional copies' found.
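
As a quick sanity check, the counts above can be tallied against the figures quoted from the paper (41 protein-coding genes, 38 of them TSPY copies). The following is only a minimal sketch: the function names `parse_counts` and `lines_for_genes` are hypothetical, and the whole-word match (similar to `grep -w`) assumes the gene names appear verbatim in the annotation lines, which may not hold for every annotation format.

```python
import re

# Table copied from the listing above: category, gene name, additional copies.
TABLE = """
gene_prtn TSPY4 6
gene_prtn TSPY2 1
gene_prtn TSPY3 9
gene_prtn LOC124903544 2
gene_prtn TSPY1 1
gene_prtn TSPY8 2
gene_prtn TSPY9 4
gene_prtn RBMY1A1 1
gene_prtn TSPY10 15
"""

def parse_counts(text):
    """Return {gene_name: additional_copies} from whitespace-separated rows."""
    counts = {}
    for line in text.strip().splitlines():
        _category, gene, n = line.split()
        counts[gene] = int(n)
    return counts

def lines_for_genes(annotation_lines, genes):
    """Yield lines that mention any gene name as a whole word (hypothetical helper).

    Whole-word matching keeps TSPY1 from also matching TSPY10.
    """
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, genes)) + r")\b")
    for line in annotation_lines:
        if pattern.search(line):
            yield line

counts = parse_counts(TABLE)
total = sum(counts.values())
tspy = sum(n for gene, n in counts.items() if gene.startswith("TSPY"))
print(total, tspy)  # prints "41 38"
```

To pull the matching records, `lines_for_genes` could be fed the lines of the annotation file, for example `lines_for_genes(open(path), counts)`, where `path` is wherever the 5.1 curated annotation was downloaded.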

Best,
Arang