This repository contains the data and algorithms I created for my Master's thesis, "Identifying Key Phrases in Real World Patient Notes".
The research uses two datasets, the NBME / USMLE dataset and the VAERS dataset. Both are uploaded in the repository.
The files can be read and executed sequentially:
- KeyBERT with xxxx, where xxxx is the custom embedding model used
- Evaluation file, which describes which threshold and model are selected
- Check of the algorithm on the VAERS dataset
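
To give a rough idea of what selecting a threshold in the evaluation step can look like, here is a minimal sketch. It is not the code from the evaluation file: the function names, the use of F1 as the selection metric, the candidate thresholds, and the toy phrases and scores are all assumptions made for illustration.

```python
# Hypothetical sketch: pick the score threshold that maximises F1 on a
# labelled example. Names, metric, and data are illustrative assumptions,
# not taken from the repository.

def f1_at_threshold(scored_phrases, gold, threshold):
    """F1 of the phrases whose score is at least `threshold`."""
    predicted = {phrase for phrase, score in scored_phrases if score >= threshold}
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def select_threshold(scored_phrases, gold, candidates):
    """Return the candidate with the highest F1 (first one wins on ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(scored_phrases, gold, t))


# Toy input: (phrase, similarity score) pairs such as a key phrase
# extractor might return, plus a made-up set of annotated phrases.
scored = [
    ("chest pain", 0.82),
    ("shortness of breath", 0.74),
    ("last week", 0.41),
    ("patient reports", 0.35),
]
gold = {"chest pain", "shortness of breath"}
candidates = [0.3, 0.5, 0.7, 0.9]

best = select_threshold(scored, gold, candidates)
print(best, f1_at_threshold(scored, gold, best))
```

The same loop can be repeated per embedding model to compare models at their best threshold.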