biothings/mydisease.info

Integrate UMLS Disease Mapping info into MyDisease.info

kevinxin90 opened this issue · 0 comments

Step 1: Extract all UMLS IDs belonging to the disease category

  • source file: UMLS MRSTY file
  • file location:
  • This file maps a UMLS ID to its semantic type
    • The first column of the file corresponds to the UMLS ID
    • The fourth column of the file corresponds to the semantic type
  • Extract all UMLS IDs belonging to these types
    • Disease or Syndrome
    • Mental or Behavioral Dysfunction
    • Neoplastic Process
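Step 1 could be sketched roughly as follows. This is only a minimal illustration: it assumes the MRSTY file is pipe-delimited (the usual RRF layout), and the function name `extract_disease_cuis` is made up for this example.

```python
# Semantic types listed in Step 1.
DISEASE_SEMANTIC_TYPES = {
    "Disease or Syndrome",
    "Mental or Behavioral Dysfunction",
    "Neoplastic Process",
}


def extract_disease_cuis(lines):
    """Return the set of UMLS IDs whose semantic type is a disease type.

    `lines` is any iterable of MRSTY rows (for example an open file).
    Assumption: columns are separated by "|".
    """
    cuis = set()
    for line in lines:
        cols = line.rstrip("\n").split("|")
        if len(cols) < 4:
            continue  # skip blank or malformed rows
        umls_id, semantic_type = cols[0], cols[3]  # 1st and 4th columns
        if semantic_type in DISEASE_SEMANTIC_TYPES:
            cuis.add(umls_id)
    return cuis
```

Passing an open file handle for the MRSTY file as `lines` should work, since the function only iterates over rows.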

Step 2: Extract UMLS Mapping

  • source file: MRCONSO.RRF
  • This file maps a UMLS ID to corresponding identifiers in other sources
    • The first column represents the UMLS ID
    • The 2nd column represents the language; only rows with language equal to "ENG" need to be parsed
    • The 3rd column represents the term status, either P (preferred) or S (non-preferred)
    • Group the data by the value of the 3rd column
    • The 12th column represents the data source; only rows whose data source is one of the following need to be parsed: NCI, MSH, SNOMEDCT_US, ICD10CM, ICD10, ICD10AM, ICD9CM
    • Rename the value of the 12th column:
      • NCI: nci
      • MSH: mesh
      • SNOMEDCT_US: snomed
      • ICD10CM: icd10cm
      • ICD10: icd10
      • ICD10AM: icd10am
      • ICD9CM: icd9cm
    • The 14th column represents the value, i.e. the identifier of the concept in that data source
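A possible sketch of the Step 2 parsing is shown below. It assumes MRCONSO.RRF is pipe-delimited and that only the UMLS IDs collected in Step 1 should be kept (passed in as `disease_cuis`); `extract_umls_mapping` and the two lookup tables are illustrative names, not existing code.

```python
# Rename table for the 12th column, as listed above.
SOURCE_RENAME = {
    "NCI": "nci",
    "MSH": "mesh",
    "SNOMEDCT_US": "snomed",
    "ICD10CM": "icd10cm",
    "ICD10": "icd10",
    "ICD10AM": "icd10am",
    "ICD9CM": "icd9cm",
}
# Labels for the 3rd column (term status).
STATUS_LABEL = {"P": "preferred", "S": "non-preferred"}


def extract_umls_mapping(lines, disease_cuis):
    """Build {umls_id: {source: {"preferred"/"non-preferred": set(codes)}}}.

    Assumption: rows are "|"-separated and `disease_cuis` is the set
    of UMLS IDs produced by Step 1.
    """
    mapping = {}
    for line in lines:
        cols = line.rstrip("\n").split("|")
        if len(cols) < 14:
            continue  # skip blank or malformed rows
        umls_id, language, status = cols[0], cols[1], cols[2]
        source, code = cols[11], cols[13]  # 12th and 14th columns
        if umls_id not in disease_cuis or language != "ENG":
            continue
        if source not in SOURCE_RENAME or status not in STATUS_LABEL:
            continue
        by_source = mapping.setdefault(umls_id, {})
        by_status = by_source.setdefault(SOURCE_RENAME[source], {})
        by_status.setdefault(STATUS_LABEL[status], set()).add(code)
    return mapping
```

Collecting codes into sets removes the duplicates that arise when several rows of the same source share one code.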

Step 3: Query MyDisease.info for the primary id used in MyDisease.info.

  • Use the batch query feature supported by MyDisease.info
  • Query 1000 UMLS ids at a time
  • query syntax
    requests.post('http://mydisease.info/v1/query', data={'q': 'C0006826, C0008780', 'scopes':'disgenet.xrefs.umls, mondo.xrefs.umls'}, params={'fields': '_id'})
    Replace the value of 'q' with up to 1000 UMLS ids separated by commas.
  • If the primary id "_id" is not found, use the UMLS id as the primary id (prefixed with "UMLS:"), e.g. UMLS:C0006826.
  • If more than one primary id "_id" is found, create one record per primary id.
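The batching and fallback rules above might look something like the sketch below. The helper names (`chunked`, `resolve_primary_ids`, `query_primary_ids`) are hypothetical, and the code assumes the batch response is a JSON list of hits, each carrying the queried id under "query" and either an "_id" or a "notfound" flag; that shape should be checked against the actual API response before relying on it.

```python
QUERY_URL = "http://mydisease.info/v1/query"
SCOPES = "disgenet.xrefs.umls,mondo.xrefs.umls"


def chunked(items, size=1000):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def resolve_primary_ids(hits, umls_ids):
    """Map each UMLS id to its list of primary ids.

    Assumed hit shape: {"query": <umls id>, "_id": ...} or
    {"query": <umls id>, "notfound": True}.
    """
    found = {umls_id: [] for umls_id in umls_ids}
    for hit in hits:
        umls_id = hit.get("query")
        if hit.get("notfound") or umls_id not in found or "_id" not in hit:
            continue
        if hit["_id"] not in found[umls_id]:
            found[umls_id].append(hit["_id"])
    # Fall back to the UMLS id itself when nothing was found.
    return {u: ids or ["UMLS:" + u] for u, ids in found.items()}


def query_primary_ids(umls_ids):
    """Query MyDisease.info in batches of 1000 UMLS ids."""
    import requests  # third-party; imported lazily so the helpers above work without it

    result = {}
    for batch in chunked(list(umls_ids)):
        response = requests.post(
            QUERY_URL,
            data={"q": ",".join(batch), "scopes": SCOPES},
            params={"fields": "_id"},
        )
        response.raise_for_status()
        result.update(resolve_primary_ids(response.json(), batch))
    return result
```

Keeping the response handling in `resolve_primary_ids` separate from the HTTP call makes the fallback and multi-hit rules easy to check without network access.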

Example Output:

{
  "_id": "MONDO:0004992",
  "umls": {
    "mesh": {
      "preferred": "D009369",
      "non-preferred": "D009369"
    },
    "snomed": {
      "preferred": ["269513004", "154433003", ...],
      "non-preferred": ["38807002", ...]
    },
    "icd10": {
      ...
    },
    "icd9cm": {
      ...
    }
  }
}
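For illustration, records of this shape could be assembled along the following lines. The inputs are assumptions: `mapping` is taken to be a dict of the form {umls_id: {source: {status: set of codes}}} and `primary_ids` a dict of the form {umls_id: [primary ids]}. The function names are hypothetical, and the rule of emitting a single code as a plain string and several codes as a list is inferred from the example output, not specified anywhere.

```python
import copy


def collapse(codes):
    """Return a lone code as a string, several codes as a sorted list."""
    ordered = sorted(codes)
    return ordered[0] if len(ordered) == 1 else ordered


def build_records(mapping, primary_ids):
    """Create one output record per (UMLS id, primary id) pair."""
    records = []
    for umls_id, sources in mapping.items():
        umls_doc = {
            source: {status: collapse(codes) for status, codes in by_status.items()}
            for source, by_status in sources.items()
        }
        # Fall back to the prefixed UMLS id when no primary id is known.
        for primary_id in primary_ids.get(umls_id, ["UMLS:" + umls_id]):
            records.append({"_id": primary_id, "umls": copy.deepcopy(umls_doc)})
    return records
```

This sketch does not merge records when several UMLS ids resolve to the same primary id; whether and how to merge them is left open here.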