Integrate UMLS Disease Mapping info into MyDisease.info
kevinxin90 opened this issue · 0 comments
Step 1: Extract all UMLS IDs belonging to the Disease category
- source file: UMLS MRSTY file
- file location:
- This file maps a UMLS ID to its semantic type
- The first column of the file corresponds to the UMLS ID
- The fourth column of the file corresponds to the semantic type
- Extract all UMLS IDs belonging to these semantic types:
  - Disease or Syndrome
  - Mental or Behavioral Dysfunction
  - Neoplastic Process
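Step 1 could be sketched roughly as follows. This is only an illustration: it assumes the MRSTY file is pipe-delimited (the usual UMLS RRF layout), the function name `extract_disease_cuis` is hypothetical, and the sample rows are made up to show the column positions.

```python
# Semantic types treated as "disease" in Step 1.
DISEASE_SEMANTIC_TYPES = {
    "Disease or Syndrome",
    "Mental or Behavioral Dysfunction",
    "Neoplastic Process",
}


def extract_disease_cuis(lines):
    """Return the set of UMLS IDs whose semantic type is a disease type.

    `lines` is any iterable of pipe-delimited MRSTY rows, e.g. an open file.
    """
    cuis = set()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        umls_id, semantic_type = fields[0], fields[3]  # 1st and 4th columns
        if semantic_type in DISEASE_SEMANTIC_TYPES:
            cuis.add(umls_id)
    return cuis


# Made-up rows in the MRSTY column layout, for illustration only.
sample_rows = [
    "C0006826|T191|A1.2.2.2|Neoplastic Process|AT00000001||\n",
    "C0000005|T116|A1.4.1.2.1.7|Amino Acid, Peptide, or Protein|AT00000002||\n",
]
disease_cuis = extract_disease_cuis(sample_rows)
```

With a real file, the same function can be fed an open file handle, e.g. `with open(path, encoding="utf-8") as f: extract_disease_cuis(f)`, where `path` is wherever the MRSTY file lives.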
Step 2: Extract UMLS Mapping
- source file: MRCONSO.RRF
- This file maps a UMLS ID to corresponding identifiers in other sources
- The first column represents UMLS ID
- The 2nd column represents the language; we only need to parse rows where the language equals "ENG"
- The 3rd column represents the term status, which is either P (preferred) or S (non-preferred)
- group the data by the value of the 3rd column
- The 12th column represents the data source; we only need to parse rows whose data source is one of: NCI, MSH, SNOMEDCT_US, ICD10CM, ICD10, ICD10AM, ICD9CM.
- rename the values of the 12th column:
  - NCI: nci
  - MSH: mesh
  - SNOMEDCT_US: snomed
  - ICD10CM: icd10cm
  - ICD10: icd10
  - ICD10AM: icd10am
  - ICD9CM: icd9cm
- The 14th column represents the value, i.e. the concept's code in that data source.
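A minimal sketch of the Step 2 parsing, under a few assumptions: MRCONSO.RRF is pipe-delimited, only UMLS IDs collected in Step 1 are kept, and rows with a term status other than P or S are skipped. The names `extract_umls_mapping` and `make_row` are hypothetical, and the sample rows are synthetic (only the columns used here are filled in).

```python
# Rename table for the 12th column (data source), as listed in Step 2.
SOURCE_RENAMES = {
    "NCI": "nci",
    "MSH": "mesh",
    "SNOMEDCT_US": "snomed",
    "ICD10CM": "icd10cm",
    "ICD10": "icd10",
    "ICD10AM": "icd10am",
    "ICD9CM": "icd9cm",
}
# Labels for the 3rd column (term status).
TERM_STATUS = {"P": "preferred", "S": "non-preferred"}


def extract_umls_mapping(lines, disease_cuis):
    """Group source codes as {umls_id: {source: {status: set_of_codes}}}."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 14:
            continue  # skip blank or malformed rows
        umls_id, language, status = fields[0], fields[1], fields[2]
        source, code = fields[11], fields[13]  # 12th and 14th columns
        if umls_id not in disease_cuis or language != "ENG":
            continue
        if source not in SOURCE_RENAMES or status not in TERM_STATUS:
            continue
        by_source = mapping.setdefault(umls_id, {})
        by_status = by_source.setdefault(SOURCE_RENAMES[source], {})
        by_status.setdefault(TERM_STATUS[status], set()).add(code)
    return mapping


def make_row(umls_id, language, status, source, code):
    """Build a synthetic 18-column row with only the relevant columns set."""
    fields = [""] * 18
    fields[0], fields[1], fields[2] = umls_id, language, status
    fields[11], fields[13] = source, code
    return "|".join(fields) + "|\n"


disease_cuis = {"C0006826"}  # stand-in for the Step 1 result
sample_rows = [
    make_row("C0006826", "ENG", "P", "MSH", "D009369"),
    make_row("C0006826", "ENG", "S", "SNOMEDCT_US", "38807002"),
    make_row("C0006826", "SPA", "P", "MSH", "D009369"),   # non-English: skipped
    make_row("C0006826", "ENG", "P", "MTH", "NOCODE"),    # other source: skipped
    make_row("C0000005", "ENG", "P", "MSH", "D012711"),   # not a disease ID: skipped
]
umls_mapping = extract_umls_mapping(sample_rows, disease_cuis)
```

Sets are used so that a code repeated across several rows is stored once; they can be converted to a single string or a sorted list when the final records are written.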
Step 3: Query MyDisease.info for the corresponding primary id.
- use the batch query feature supported by MyDisease.info
- query 1000 UMLS ids at a time
- query syntax
requests.post('http://mydisease.info/v1/query', data={'q': 'C0006826, C0008780', 'scopes':'disgenet.xrefs.umls, mondo.xrefs.umls'}, params={'fields': '_id'})
replace the value of 'q' with the 1000 UMLS ids, separated by commas
- If the primary id "_id" is not found, use the UMLS id as the primary id (prefixed with "UMLS:"), e.g. UMLS:C0006826.
- If more than one primary id "_id" is found, create one record for each of them.
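The batching and fallback rules of Step 3 might look like the sketch below. It assumes the batch query returns a JSON list in which each hit carries the original `query` term plus either an `_id` or a `notfound` flag; that response layout is an assumption to verify against the live API. The helper names (`chunked`, `query_batch`, `resolve_primary_ids`) are hypothetical, and the sample hits are made up.

```python
QUERY_URL = "http://mydisease.info/v1/query"  # URL as given in the issue
SCOPES = "disgenet.xrefs.umls,mondo.xrefs.umls"
BATCH_SIZE = 1000


def chunked(items, size=BATCH_SIZE):
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def query_batch(umls_ids):
    """POST one batch of UMLS ids and return the decoded JSON response."""
    import requests  # third-party; imported here so the other helpers run without it

    response = requests.post(
        QUERY_URL,
        data={"q": ",".join(umls_ids), "scopes": SCOPES},
        params={"fields": "_id"},
    )
    response.raise_for_status()
    return response.json()


def resolve_primary_ids(umls_ids, hits):
    """Map each UMLS id to the list of primary ids its records should use."""
    found = {}
    for hit in hits:
        # Assumed hit layout: {"query": ..., "_id": ...} or {"query": ..., "notfound": True}
        if hit.get("notfound") or "_id" not in hit:
            continue
        ids = found.setdefault(hit["query"], [])
        if hit["_id"] not in ids:
            ids.append(hit["_id"])
    # Fall back to the UMLS id itself, prefixed with "UMLS:", when nothing matched.
    return {uid: found.get(uid) or ["UMLS:" + uid] for uid in umls_ids}


# Made-up hits illustrating the three cases: one match, no match, several matches.
sample_ids = ["C0006826", "C0008780", "C0000001"]
sample_hits = [
    {"query": "C0006826", "_id": "MONDO:0004992"},
    {"query": "C0008780", "notfound": True},
    {"query": "C0000001", "_id": "MONDO:0000001"},
    {"query": "C0000001", "_id": "MONDO:0000002"},
]
primary_ids = resolve_primary_ids(sample_ids, sample_hits)
```

In a full run, each batch from `chunked(all_umls_ids)` would be passed to `query_batch` and the result to `resolve_primary_ids`; every primary id in the returned lists then gets its own record.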
Example Output:
{
  "_id": "MONDO:0004992",
  "umls": {
    "mesh": {
      "preferred": "D009369",
      "non-preferred": "D009369"
    },
    "snomed": {
      "preferred": ["269513004", "154433003", ...],
      "non-preferred": ["38807002", ...]
    },
    "icd10": {
      ...
    },
    "icd9cm": {
      ...
    }
  }
}