The phecodeX repository contains data files relevant to the updated phecode nomenclature. A brief description of each file is given below. These data components are designed to work with the original PheWAS R package; instructions for using these data files with the package are given in phecodeX_PheWAS_example_R_script.txt.
Example script for running the new PhecodeX data files with the original PheWAS R package.
This file includes information related to each phecode, including the phecode string, category, and columns indicating sex-specificity and ICD-10 only status. The columns are as follows:
phenotype The phecode label (two letters, “_”, and numeric phecode)
description A descriptive label for the phecode
icd10_only A Boolean value: 1 if the phecode is defined only by ICD-10 codes; 0 if the phecode is defined by both ICD-9 and ICD-10 codes
groupnum A numeric value corresponding to the phecode category
group A string indicating the phecode category
color A string value indicating the color to use in plots for each group
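As a small illustration of the column descriptions above, the sketch below splits a phecode label into its two-letter prefix and numeric part and interprets the icd10_only flag. It is written in Python rather than R and is not part of the PheWAS package; the helper names `split_phecode_label` and `is_icd10_only` are hypothetical, and treating the two letters as uppercase is an assumption based on the BI_160.11 example.

```python
import re

# Assumed label shape, taken from the column description: two letters, "_",
# then a numeric phecode with an optional decimal part (e.g. BI_160.11).
# Requiring uppercase letters is an assumption, not something the file states.
LABEL_RE = re.compile(r"^([A-Z]{2})_(\d+(?:\.\d+)?)$")


def split_phecode_label(label):
    """Return (two_letter_prefix, numeric_part) or raise ValueError."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a phecodeX-style label: {label!r}")
    return match.group(1), match.group(2)


def is_icd10_only(value):
    """Interpret the icd10_only column, documented as 1 or 0."""
    return str(value).strip() == "1"


prefix, number = split_phecode_label("BI_160.11")
print(prefix, number)
```

The numeric part is kept as a string so that a trailing zero in a decimal phecode would not be lost to floating-point conversion.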
This file includes the ICD-9 and ICD-10 codes that define each phecode. This mapping is "flat" in that the phecodes are not "unrolled." The columns are as follows:
code The code included in the phecode grouping (current supported code types are ICD-9-CM and ICD-10-CM)
vocabulary_id A string indicating the code type (ICD9CM or ICD10CM)
phecode The phecode label
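The flat mapping can be loaded into a simple lookup table, sketched below in Python with only the standard library. This is an illustration, not the PheWAS package workflow: the function name `load_flat_map` is hypothetical, the sample rows are placeholders that only mimic the three documented columns, and a comma-separated layout with a header row is assumed.

```python
import csv
import io

# Placeholder rows: these codes and phecode labels are hypothetical and exist
# only to mimic the documented columns (code, vocabulary_id, phecode).
SAMPLE_MAP = """code,vocabulary_id,phecode
000.0,ICD9CM,XX_000
X00.0,ICD10CM,XX_000
X00.1,ICD10CM,XX_000.1
"""


def load_flat_map(handle):
    """Build {(vocabulary_id, code): set of phecode labels} from the flat map."""
    mapping = {}
    for row in csv.DictReader(handle):
        # Key on vocabulary and code together, since the same code string
        # can occur in both ICD-9-CM and ICD-10-CM.
        key = (row["vocabulary_id"], row["code"])
        mapping.setdefault(key, set()).add(row["phecode"])
    return mapping


flat_map = load_flat_map(io.StringIO(SAMPLE_MAP))
print(sorted(flat_map[("ICD10CM", "X00.1")]))  # prints ['XX_000.1']
```

The csv module reads every field as a string, which keeps leading zeros in codes intact; a reader that infers numeric types would need the code column forced to text.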
This file defines the phecode structure. Phecodes with decimals are "rolled up" to parent codes, such that every individual with code BI_160.11 also has BI_160.1 and BI_160. The columns are as follows:
code Primary phecode label
phecode_unrolled A phecode that is implied by the primary phecode label
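The roll-up can be applied by expanding each primary phecode into the set of phecodes it implies. The Python sketch below uses the BI_160.11 example from the description; the function names `load_rollup` and `unroll` are hypothetical, and the comma-separated layout with a header row is an assumption. Because it is not stated whether the file lists each phecode as implying itself, the sketch always keeps the primary codes.

```python
import csv
import io

# Rows taken from the BI_160.11 example in the text; the CSV layout with a
# header row is an assumption.
SAMPLE_ROLLUP = """code,phecode_unrolled
BI_160.11,BI_160.1
BI_160.11,BI_160
"""


def load_rollup(handle):
    """Build {primary phecode: set of implied phecodes}."""
    rollup = {}
    for row in csv.DictReader(handle):
        rollup.setdefault(row["code"], set()).add(row["phecode_unrolled"])
    return rollup


def unroll(phecodes, rollup):
    """Return the given phecodes plus every phecode they imply."""
    expanded = set(phecodes)  # keep the primary codes themselves
    for phecode in phecodes:
        expanded |= rollup.get(phecode, set())
    return expanded


rollup = load_rollup(io.StringIO(SAMPLE_ROLLUP))
print(sorted(unroll({"BI_160.11"}, rollup)))
# prints ['BI_160', 'BI_160.1', 'BI_160.11']
```

Looking parents up in the file, rather than deriving them by trimming decimal digits, keeps the result tied to the structure the file actually defines.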
A descriptive file indicating whether a phecode is sex-specific, defined as more than 90% of the code's use being associated with EHR-reported female sex only or male sex only. The columns are as follows:
phecode The phecode label
male_only A true/false indicator of whether the specific code is used more than 90% of the time with EHR-reported male sex
female_only A true/false indicator of whether the specific code is used more than 90% of the time with EHR-reported female sex
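One possible use of these flags is to drop phecodes that are specific to the opposite sex before analysis. The Python sketch below illustrates that idea only; it is not the PheWAS package's own logic. The function name `phecodes_to_exclude` is hypothetical, the sample rows are placeholders, and both the comma-separated layout and the TRUE/FALSE spelling of the indicators are assumptions.

```python
import csv
import io

# Placeholder rows: the phecode labels are hypothetical. How true/false is
# spelled in the real file is an assumption (TRUE/FALSE here).
SAMPLE_SEX = """phecode,male_only,female_only
XX_001,TRUE,FALSE
XX_002,FALSE,TRUE
XX_003,FALSE,FALSE
"""

TRUE_VALUES = {"true", "t", "1"}


def as_bool(value):
    """Accept a few common spellings of a true value."""
    return value.strip().lower() in TRUE_VALUES


def phecodes_to_exclude(handle, sex):
    """Return phecodes flagged as specific to the other sex.

    sex must be 'male' or 'female'.
    """
    if sex not in ("male", "female"):
        raise ValueError(f"unsupported sex value: {sex!r}")
    opposite_column = "female_only" if sex == "male" else "male_only"
    return {
        row["phecode"]
        for row in csv.DictReader(handle)
        if as_bool(row[opposite_column])
    }


print(sorted(phecodes_to_exclude(io.StringIO(SAMPLE_SEX), "male")))
# prints ['XX_002']
```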