Starlitnightly/omicverse

More complete Ensembl gene ID -> HGNC symbol pair table

Is your feature request related to a problem? Please describe.
Running data=ov.bulk.Matrix_ID_mapping(data,'ref/genesets/pair_GRCh38.tsv') left more than 20k Ensembl gene IDs unconverted (H. sapiens, GRCh38; over 30% of all genes in the counts). I was trying to build a more complete table.

Describe alternatives you've considered
I selected the approved symbols and Ensembl gene IDs on the HGNC website:
https://www.genenames.org/download/custom/
I removed all rows containing NaN and saved the result as a TSV table. Using that table, I have all gene IDs mapped.
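
For anyone who wants to rebuild the table, the cleanup step could look roughly like the sketch below. It is only an illustration: the input headers ("Approved symbol", "Ensembl gene ID"), the output column names ("gene_id", "symbol"), and the toy rows are assumptions, so check them against your own HGNC export and against the layout of pair_GRCh38.tsv before using it.

```python
import io

import pandas as pd

# Stand-in for the file exported from the HGNC custom download page.
# Headers and rows are illustrative assumptions; the last row mimics
# an entry that has no Ensembl gene ID.
hgnc_export = io.StringIO(
    "Approved symbol\tEnsembl gene ID\n"
    "A1BG\tENSG00000121410\n"
    "A1BG-AS1\tENSG00000268895\n"
    "NO_ENSEMBL_EXAMPLE\t\n"
)


def build_pair_table(source):
    """Return a two-column ID -> symbol table with incomplete rows removed."""
    raw = pd.read_csv(source, sep="\t")
    # Drop rows where either the symbol or the Ensembl ID is missing (NaN).
    pairs = raw.dropna(subset=["Approved symbol", "Ensembl gene ID"])
    # Keep one row per Ensembl ID so the mapping is unambiguous.
    pairs = pairs.drop_duplicates(subset="Ensembl gene ID")
    # Assumed output column names; adjust to whatever the mapper expects.
    pairs = pairs.rename(
        columns={"Ensembl gene ID": "gene_id", "Approved symbol": "symbol"}
    )
    return pairs[["gene_id", "symbol"]].reset_index(drop=True)


pair_table = build_pair_table(hgnc_export)
# Serialise as TSV; pass a file path as the first argument to write to disk.
tsv_text = pair_table.to_csv(sep="\t", index=False)
```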

Additional context
See attached for the gene id mapping table.

pair_hgnc_all.tsv.tar.gz

I just discovered that if I run ov.bulk.Matrix_ID_mapping(data,'ref/genesets/pair_hgnc_all.tsv'),
unmapped genes are dropped from the dataframe,
so the function should perhaps not remove genes that are not in the gene ID pair table.
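
To make the requested behaviour concrete, here is a minimal sketch of a mapping step that renames the IDs it can and leaves the rest untouched. This is not omicverse's implementation. The helper name map_ids_keep_unmapped, the pair-table columns ("gene_id", "symbol"), and the toy counts are all hypothetical.

```python
import pandas as pd


def map_ids_keep_unmapped(counts, pairs):
    """Rename index entries found in `pairs`; keep every other row as-is.

    `counts` is a genes x samples dataframe indexed by Ensembl gene ID.
    `pairs` has the assumed columns "gene_id" and "symbol".
    """
    lookup = dict(zip(pairs["gene_id"], pairs["symbol"]))
    out = counts.copy()
    # Fall back to the original ID when no symbol is known, so no row is lost.
    out.index = [lookup.get(gene_id, gene_id) for gene_id in counts.index]
    return out


# Toy data: the third ID is deliberately absent from the pair table.
counts = pd.DataFrame(
    {"sample_1": [5, 0, 12], "sample_2": [3, 1, 9]},
    index=["ENSG00000121410", "ENSG00000268895", "ENSG_UNMAPPED_EXAMPLE"],
)
pairs = pd.DataFrame(
    {
        "gene_id": ["ENSG00000121410", "ENSG00000268895"],
        "symbol": ["A1BG", "A1BG-AS1"],
    }
)

mapped = map_ids_keep_unmapped(counts, pairs)
```

With this approach the output has the same number of rows as the input, and any IDs that remain in Ensembl form are easy to spot and count afterwards.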