NLP_EyeDisease


Natural Language Processing for Extraction of Phenotypes for Inherited Retinal Disease from Electronic Health Records

Structure Diagram of this study:

Concept map

Dataset:

Moorfields Eye Hospital (MEH) unstructured free-text EHR data
MIMIC-III unstructured free-text EHR data

Natural Language Processing (NLP) pipeline:

CogStack-SemEHR is used to identify eye disease phenotypes.
https://github.com/CogStack/CogStack-SemEHR

Objective:

Use a BERT model as a binary classifier, with internal and external validation, to determine whether each mention identified by CogStack-SemEHR is a true mention or not.
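
A minimal sketch of how one mention could be turned into a binary-classification example: take a window of text around the mention and attach a 0/1 label. The function names, the dictionary keys, and the `width` parameter are hypothetical and chosen for illustration only; the notebooks in this repository may build their inputs differently.

```python
def mention_context(note, start, end, width=100):
    """Return the mention plus up to `width` characters on each side.

    `start` and `end` are character offsets of the mention in `note`
    (assumed here; offsets are a hypothetical input format).
    """
    left = max(0, start - width)
    right = min(len(note), end + width)
    return note[left:right]


def build_example(note, start, end, is_true_mention, width=100):
    """Pair the context window with a binary label (1 = true mention)."""
    return {
        "text": mention_context(note, start, end, width),
        "label": 1 if is_true_mention else 0,
    }


# Usage: a negated mention would be labelled 0 (not a true mention).
note = "No evidence of retinitis pigmentosa in either eye."
start = note.index("retinitis")
end = start + len("retinitis pigmentosa")
example = build_example(note, start, end, is_true_mention=False, width=3)
```

The resulting `text` field is what a sequence classifier such as BERT would receive after tokenization, with `label` as the training target.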

Process:

  1. Preprocessing of the datasets
    MIMIC-III: https://github.com/pontikos-lab/NLP_EyeDisease/blob/main/MIMIC-III_preprocessing.ipynb
    MEH: https://github.com/pontikos-lab/NLP_EyeDisease/blob/main/MEH_preprocessing.ipynb
  2. Internal validation of BERT
    MIMIC-III: https://github.com/pontikos-lab/NLP_EyeDisease/blob/main/MIMIC-III_BERT.ipynb
    MEH: https://github.com/pontikos-lab/NLP_EyeDisease/blob/main/MEH_BERT.ipynb
  3. External validation of BERT
    MIMIC-III BERT model, validated on MEH: https://github.com/pontikos-lab/NLP_EyeDisease/blob/main/MIMIC-III_bert_MEH_validation.ipynb
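
The difference between the two validation steps can be sketched as follows: internal validation holds out part of one dataset for testing, while external validation scores a model trained on one dataset against labelled examples from the other. The helper names, the 80/20 split, and the choice of metrics below are assumptions for illustration, not taken from the notebooks.

```python
import random


def split_internal(examples, test_fraction=0.2, seed=0):
    """Shuffle one dataset and hold out a fraction for internal validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy for 0/1 labels (1 = true mention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Internal validation: train and test come from the same dataset.
train, test = split_internal(range(10), test_fraction=0.2, seed=1)

# External validation: the same metric function is applied to predictions
# made on the other hospital's labelled mentions (toy labels shown here).
metrics = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Reporting the same metrics for both settings makes it easier to see how much performance drops when the model is applied to notes from a different institution.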