- Use the setid values in the CSV file to download the XML labeling files.
- Use MedDRA terminology instead of MeSH terms, and use exact matching to extract MedDRA terms from the labeling sections (e.g., boxed warnings, warnings and precautions, and adverse reactions).
- Train a classifier using phase-2/output/all-description-bio-schema.tsv as input to spaCy/BERT.
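The exact-match extraction step can be sketched as below. This is a minimal illustration, not the project's code: `find_meddra_terms` is a hypothetical helper, the term list is assumed to be already loaded from a MedDRA release, and matching is whole-word and case-insensitive by default, which is an assumption about what "exact match" means here. Terms that begin or end with punctuation will not match under the `\b` word-boundary rule.

```python
from __future__ import annotations

import re
from typing import Iterable


def find_meddra_terms(
    text: str, terms: Iterable[str], ignore_case: bool = True
) -> list[tuple[int, int, str]]:
    """Return non-overlapping (start, end, matched_text) hits for whole-word term matches.

    Hypothetical helper: `terms` is assumed to be a list of MedDRA term strings.
    """
    # Regex alternation takes the first alternative that matches at a position,
    # so sorting longest-first makes the longest listed term win there.
    unique_terms = sorted({t.strip() for t in terms if t.strip()}, key=len, reverse=True)
    if not unique_terms:
        return []  # an empty alternation would match the empty string everywhere
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in unique_terms) + r")\b",
        re.IGNORECASE if ignore_case else 0,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]
```

For example, with the terms "renal failure" and "acute renal failure", the phrase "acute renal failure" is returned as one hit rather than as the shorter "renal failure".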
Download the XML labeling files whose sections (e.g., boxed warnings, warnings and precautions, and adverse reactions) are searched for MedDRA terms.
- Run phase-1/download_labeling.py to download the XML file for each setid in the CSV file.
- The output is saved in json_data.json.
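phase-1/download_labeling.py is not reproduced here; the sketch below only illustrates the idea under several assumptions: the CSV has a column named `setid`, the labels are HL7 SPL documents (the usual format for FDA drug labeling), `URL_TEMPLATE` is a placeholder for the real download source, and json_data.json is laid out as `{setid: {section: [text, ...]}}`, which may differ from the script's actual output. Sections are selected by their LOINC codes (34066-1 boxed warning, 43685-7 warnings and precautions, 34084-4 adverse reactions).

```python
from __future__ import annotations

import csv
import json
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

# Placeholder (hypothetical): replace with the source used by download_labeling.py.
URL_TEMPLATE = "https://example.org/spl/{setid}.xml"
HL7 = "urn:hl7-org:v3"  # default namespace of SPL documents
# LOINC codes of the labeling sections of interest.
SECTION_CODES = {
    "34066-1": "boxed_warning",
    "43685-7": "warnings_and_precautions",
    "34084-4": "adverse_reactions",
}


def read_setids(csv_path: str | Path, column: str = "setid") -> list[str]:
    """Read set IDs from a CSV column, dropping blanks and duplicates (order kept)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        ids = [(row.get(column) or "").strip() for row in csv.DictReader(f)]
    return list(dict.fromkeys(i for i in ids if i))


def fetch_xml(setid: str, url_template: str = URL_TEMPLATE) -> bytes:
    """Download one XML document (network call; the URL template is hypothetical)."""
    with urllib.request.urlopen(url_template.format(setid=setid), timeout=30) as resp:
        return resp.read()


def extract_sections(xml_bytes: bytes) -> dict[str, list[str]]:
    """Collect whitespace-normalized text of sections whose <code> is in SECTION_CODES.

    Simplification: if a matching section nests another matching section,
    the inner text is collected twice.
    """
    root = ET.fromstring(xml_bytes)
    found: dict[str, list[str]] = {}
    for section in root.iter(f"{{{HL7}}}section"):
        code = section.find(f"{{{HL7}}}code")
        name = SECTION_CODES.get(code.get("code", "")) if code is not None else None
        if name:
            text = " ".join(" ".join(section.itertext()).split())
            found.setdefault(name, []).append(text)
    return found


def build_json(csv_path: str | Path, out_path: str | Path = "json_data.json") -> dict:
    """Download every set ID and write {setid: {section: [text, ...]}} as JSON."""
    data = {sid: extract_sections(fetch_xml(sid)) for sid in read_setids(csv_path)}
    Path(out_path).write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data
```

Keeping parsing (`extract_sections`) separate from downloading (`fetch_xml`) lets the parsing logic be checked on a local XML file without network access.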
Convert the content of json_data.json into a single long text and write it to phase-2/output/all-description.txt.
- Run phase-2/convert_text_to_bio_schema.py to convert phase-2/output/all-description.txt into BIO schema format.
- The output is saved in
- The four output files from the previous step are manually combined and saved as phase-2/output/all-description-bio-schema.tsv.
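phase-2/convert_text_to_bio_schema.py is not shown here; the following is a minimal sketch of the flatten-then-tag idea, assuming json_data.json maps each setid to `{section: [text, ...]}`, a simple regex tokenizer, and a single hypothetical entity label `ADR`. The real script may tokenize, label, and split its output (the four files mentioned above) differently.

```python
from __future__ import annotations

import re

# Words (allowing internal hyphens/apostrophes) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def json_to_text(data: dict) -> str:
    """Flatten {setid: {section: [text, ...]}} into one long string."""
    return " ".join(
        text
        for sections in data.values()
        for texts in sections.values()
        for text in texts
    )


def to_bio(text: str, terms: list[str], label: str = "ADR") -> list[tuple[str, str]]:
    """Tag each token B-/I-/O by case-insensitive exact match of term token sequences."""
    tokens = TOKEN_RE.findall(text)
    lowered = [t.lower() for t in tokens]
    # Try longer terms first so a multi-word term is not cut short by a shorter one.
    term_seqs = sorted({tuple(TOKEN_RE.findall(t.lower())) for t in terms}, key=len, reverse=True)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for seq in term_seqs:
            n = len(seq)
            if n and tuple(lowered[i:i + n]) == seq:
                tags[i:i + n] = [f"B-{label}"] + [f"I-{label}"] * (n - 1)
                i += n
                break
        else:  # no term starts at this token
            i += 1
    return list(zip(tokens, tags))


def write_bio_tsv(pairs: list[tuple[str, str]], path: str) -> None:
    """Write one token<TAB>tag pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for token, tag in pairs:
            f.write(f"{token}\t{tag}\n")
```

Matching on token sequences rather than raw characters keeps entity boundaries aligned with token boundaries, which the BIO format requires.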
Train an NER classifier using spaCy/BERT.
- Manually split phase-2/output/all-description-bio-schema.tsv into 60% training, 20% validation, and 20% test data.
- Use the training and validation (devel) data to train the spaCy/BERT model.
- Use the test data to evaluate the trained model.
- Train and test the model:
  - Option 1
    - Run phase-3/train_custom_ner_with_spacy.ipynb to train the model.
    - Run phase-3/test_custom_ner_with_spacy.ipynb to test the model.
  - Option 2
    - Run phase-3/spacy_ner.py to train and test the model.
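The split is described as manual, so the sketch below only shows one way it could be automated and how the BIO data could be prepared and scored. It assumes sentences can be recovered by splitting after ".", "!" or "?" tokens, shuffles whole sentences so no sentence is shared between splits, and converts each sentence to `(text, {"entities": [(start_char, end_char, label)]})`, the training-data shape used in spaCy v2 examples (in spaCy v3 the same dict can be passed to `Example.from_dict`). All function names are hypothetical, and the notebooks and phase-3/spacy_ner.py may prepare and score data differently.

```python
from __future__ import annotations

import random


def group_sentences(pairs, enders=(".", "!", "?")):
    """Group (token, tag) pairs into sentences ending at sentence-final punctuation."""
    sentences, current = [], []
    for token, tag in pairs:
        current.append((token, tag))
        if token in enders:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def split_60_20_20(sentences, seed=13):
    """Shuffle whole sentences (fixed seed) and split them 60/20/20 into train/dev/test."""
    items = list(sentences)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 60 // 100
    n_dev = len(items) * 20 // 100
    return items[:n_train], items[n_train:n_train + n_dev], items[n_train + n_dev:]


def bio_to_spans(tags):
    """Return {(start, end, label)} token spans (end exclusive); a stray I- opens a new span."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        if tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # entity continues
        if start is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag != "O":
            start, label = i, tag[2:]
    return spans


def to_spacy_example(sentence):
    """Join tokens with single spaces and map token spans to character offsets."""
    tokens = [tok for tok, _ in sentence]
    starts, pos = [], 0
    for tok in tokens:
        starts.append(pos)
        pos += len(tok) + 1  # +1 for the joining space
    ents = sorted(
        (starts[s], starts[e - 1] + len(tokens[e - 1]), lab)
        for s, e, lab in bio_to_spans(tag for _, tag in sentence)
    )
    return " ".join(tokens), {"entities": ents}


def entity_prf(gold_tags, pred_tags):
    """Exact-span entity precision, recall and F1."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Exact-span scoring counts a prediction as correct only when both boundaries and the label match, so a partially tagged "acute renal failure" counts as both a false positive and a false negative.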