biocommons/uta

Use entrez gene_ids rather than symbols to link genes and transcripts

reece opened this issue · 4 comments

reece commented

The goal is to use Entrez gene_ids rather than HGNC symbols to link genes and transcripts, and eventually to drop the hgnc column from the transcript table.

Phase 1:

  • Update gene table schema and load from gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz.
  • Update txinfo format and relevant scripts to use gene_id rather than the HGNC symbol.
  • Update transcript table to use gene_id as FK to gene table.
  • Consider backfilling the hgnc column in transcript for backward compatibility.
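
A minimal sketch of what Phase 1 might look like, for discussion only. It is not the UTA loader: it uses an in-memory SQLite database instead of UTA's PostgreSQL schema, and the table layout, the `symbol` column name, and the helper names (`read_gene_info`, `build_demo_db`, `backfill_hgnc`) are all assumptions. The sample data keeps only three of the columns found in the real gene_info file.

```python
import csv
import gzip
import io
import sqlite3

# Illustrative sample; the real gene_info file has many more columns.
SAMPLE_GENE_INFO = (
    "#tax_id\tGeneID\tSymbol\n"
    "9606\t672\tBRCA1\n"
    "9606\t7157\tTP53\n"
)


def read_gene_info(fileobj):
    """Yield (gene_id, symbol) pairs from a gene_info text stream."""
    reader = csv.reader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    header[0] = header[0].lstrip("#")  # header line starts with "#tax_id"
    for row in reader:
        rec = dict(zip(header, row))
        yield rec["GeneID"], rec["Symbol"]


def build_demo_db(gene_rows):
    """Create hypothetical gene/transcript tables linked by gene_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(
        """
        CREATE TABLE gene (
            gene_id TEXT PRIMARY KEY,
            symbol  TEXT NOT NULL
        );
        CREATE TABLE transcript (
            ac      TEXT PRIMARY KEY,
            gene_id TEXT NOT NULL REFERENCES gene (gene_id),
            hgnc    TEXT  -- kept only for backward compatibility
        );
        """
    )
    conn.executemany("INSERT INTO gene (gene_id, symbol) VALUES (?, ?)", gene_rows)
    return conn


def backfill_hgnc(conn):
    """Fill transcript.hgnc from gene.symbol where it is still NULL."""
    conn.execute(
        """
        UPDATE transcript
        SET hgnc = (SELECT g.symbol FROM gene g WHERE g.gene_id = transcript.gene_id)
        WHERE hgnc IS NULL
        """
    )


def load_sample():
    """Round-trip the sample through gzip, as the real file is gzipped."""
    gz = io.BytesIO(gzip.compress(SAMPLE_GENE_INFO.encode("utf-8")))
    with gzip.open(gz, "rt", encoding="utf-8", newline="") as fh:
        return list(read_gene_info(fh))
```

With this layout, transcripts are inserted with a gene_id only, and `backfill_hgnc` can be run afterwards if the hgnc column needs to stay populated for older clients. A transcript whose gene_id is absent from the gene table is rejected by the foreign key.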

Phase 2:

  • Drop the hgnc column from the transcript table.

Questions:

  • What to do with Ensembl transcripts?
  • Is it necessary to maintain transcript.hgnc for backward compatibility?

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

This issue was closed because it has been stalled for 7 days with no activity.

This issue was closed by stalebot. It has been reopened to give more time for community review. See biocommons coding guidelines for stale issue and pull request policies. This resurrection is expected to be a one-time event.

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.