Update phenotype to gene mapping, and recode to accept new format
The mapping between HPO IDs and genes has been updated, and the format has changed.
The new format is here: http://compbio.charite.de/hudson/job/hpo.annotations.monthly/lastStableBuild/artifact/annotation/ALL_SOURCES_ALL_FREQUENCIES_phenotype_to_genes.txt
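Recoding for the new format mostly means collapsing one-row-per-pair data into one entry per HPO term. Below is a minimal sketch. It assumes the new file is tab-separated, with lines starting with `#` as comments or headers, and its first four columns are HPO-ID, HPO-Name, Gene-ID and Gene-Name. Check this against the header of the actual download. The function name `group_genes_by_term` is hypothetical and is not part of MedSavant.

```python
from typing import Dict, Iterable, List, Tuple


def group_genes_by_term(lines: Iterable[str]) -> Dict[str, Tuple[str, List[str]]]:
    """Group gene symbols by HPO ID (hypothetical helper, not MedSavant code).

    Assumed input: tab-separated lines whose first four columns are
    HPO-ID, HPO-Name, Gene-ID, Gene-Name; extra columns are ignored.
    Returns {hpo_id: (hpo_name, [gene symbols, first-seen order, no duplicates])}.
    """
    names: Dict[str, str] = {}
    genes: Dict[str, Dict[str, None]] = {}  # dict keys act as an ordered set
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' header/comment lines
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip rows that do not match the assumed layout
        hpo_id, hpo_name, _gene_id, gene_symbol = fields[:4]
        names.setdefault(hpo_id, hpo_name)
        genes.setdefault(hpo_id, {})[gene_symbol] = None
    return {term: (names[term], list(symbols)) for term, symbols in genes.items()}


if __name__ == "__main__":
    # Made-up rows in the assumed layout; not real annotations.
    sample = [
        "#Format: HPO-ID<tab>HPO-Name<tab>Gene-ID<tab>Gene-Name\n",
        "HP:0000001\tAll\t1\tGENEA\n",
        "HP:0000001\tAll\t2\tGENEB\n",
        "HP:0000001\tAll\t1\tGENEA\n",
    ]
    print(group_genes_by_term(sample))
```

To read the downloaded file, pass an open text handle, for example `open(path, encoding="utf-8")`, in place of the sample list.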
The old format is here: http://medsavant.com/serve/ontology/phenotype_to_genes.txt
The SQL table format that we require is:
CREATE TABLE `ontology` (
  `ontology` varchar(10) COLLATE latin1_bin NOT NULL,
  `id` varchar(30) COLLATE latin1_bin NOT NULL,
  `name` varchar(300) COLLATE latin1_bin NOT NULL,
  `def` mediumtext COLLATE latin1_bin,
  `alt_ids` varchar(300) COLLATE latin1_bin DEFAULT NULL,
  `parents` varchar(120) COLLATE latin1_bin DEFAULT NULL,
  `genes` mediumtext COLLATE latin1_bin,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_bin;
Here, genes is a pipe (|)-delimited string of the genes annotated to the term identified by id and name. See the ontology table in any MedSavant database for an example.
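The `genes` value described above can be built by joining each term's gene symbols with `|`. Below is a minimal sketch. It assumes the gene symbols have already been grouped per HPO ID. It also assumes the other columns (`ontology`, `name`, `def`, `alt_ids`, `parents`) are already loaded, so only `genes` needs refreshing. The helper names and the `UPDATE` statement are hypothetical and are not MedSavant code.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical statement: refreshes only the genes column of existing rows.
# %s is the placeholder style used by common Python MySQL drivers.
UPDATE_GENES_SQL = "UPDATE ontology SET genes = %s WHERE id = %s"


def genes_column_value(genes: Iterable[str]) -> str:
    """Join gene symbols into the pipe-delimited string stored in ontology.genes."""
    cleaned: List[str] = []
    for gene in genes:
        gene = gene.strip()
        if not gene:
            continue  # drop empty symbols
        if "|" in gene:
            # A '|' inside a symbol would corrupt the delimited list.
            raise ValueError(f"gene symbol contains the delimiter: {gene!r}")
        if gene not in cleaned:
            cleaned.append(gene)  # keep first-seen order, drop duplicates
    return "|".join(cleaned)


def update_params(genes_by_term: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Build (genes, id) parameter tuples for UPDATE_GENES_SQL, one per HPO term."""
    return [(genes_column_value(genes), term_id)
            for term_id, genes in genes_by_term.items()]


if __name__ == "__main__":
    # Made-up example data, not real annotations.
    params = update_params({"HP:0000001": ["GENEA", "GENEB", "GENEA"]})
    print(params)  # [('GENEA|GENEB', 'HP:0000001')]
```

With a DB-API cursor from a MySQL driver, the tuples could be passed to `cursor.executemany(UPDATE_GENES_SQL, params)`. Terms that no longer appear in the new file would keep their old `genes` value unless they are cleared first.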