compbio-UofT/medsavant

Update phenotype to gene mapping, and recode to accept new format


The mapping between HPO IDs and genes has been updated, and the format has changed.

The new format is here: http://compbio.charite.de/hudson/job/hpo.annotations.monthly/lastStableBuild/artifact/annotation/ALL_SOURCES_ALL_FREQUENCIES_phenotype_to_genes.txt

The old format is here:
http://medsavant.com/serve/ontology/phenotype_to_genes.txt

And the SQL format that we require is:

CREATE TABLE ontology (
    ontology varchar(10) COLLATE latin1_bin NOT NULL,
    id varchar(30) COLLATE latin1_bin NOT NULL,
    name varchar(300) COLLATE latin1_bin NOT NULL,
    def mediumtext COLLATE latin1_bin,
    alt_ids varchar(300) COLLATE latin1_bin DEFAULT NULL,
    parents varchar(120) COLLATE latin1_bin DEFAULT NULL,
    genes mediumtext COLLATE latin1_bin,
    PRIMARY KEY (id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_bin;

Here, genes is a pipe-delimited (|) string of the genes associated with the term identified by id and name. See the ontology table in any MedSavant database for an example.
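
A rough sketch of the recoding step, in case it helps whoever picks this up. It assumes the new file is tab-separated with one row per term/gene pair, in the column order HPO ID, HPO name, gene ID, gene symbol, and that comment lines start with #. That layout is an assumption and should be checked against the actual download. The function name group_genes_by_term is hypothetical, not existing MedSavant code.

```python
def group_genes_by_term(lines):
    """Collapse one-row-per-(term, gene) input into one entry per HPO term.

    Assumed column order (not verified against the real file):
    HPO ID, HPO name, gene ID, gene symbol, separated by tabs.
    Returns {hpo_id: (hpo_name, "GENE1|GENE2|...")}.
    """
    grouped = {}
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # malformed row; ignored in this sketch
        hpo_id, hpo_name, _gene_id, gene_symbol = fields[:4]
        _name, genes = grouped.setdefault(hpo_id, (hpo_name, []))
        # Keep each gene once per term, in first-seen order.
        if gene_symbol not in genes:
            genes.append(gene_symbol)
    # Join into the pipe-delimited form expected by the genes column.
    return {
        hpo_id: (name, "|".join(genes))
        for hpo_id, (name, genes) in grouped.items()
    }
```

With a local copy of the download, this could be called as group_genes_by_term(open(path, encoding="utf-8")), and each resulting value written to the genes column of the row with the matching id.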