This work was published at IJCAI 2024!
Publicly available data can be found in the GitHub releases. You can extract it into the data folder.
- load BioBERT and fine-tune it on the raw sentence dataset
- call the GPT-3 API to generate diverse paraphrases of the raw sentences as augmentations
- enhance numerical values by adapting the tokenizer and embedding layer of BioBERT (dmis-lab/biobert-base-cased-v1.2)
- continue masked language modeling (MLM) training of BioBERT on the augmented data
- build the fact checker dataset with the GPT-3 API
- fine-tune BioBERT on the augmented data with fact checker filtering
- explore extending the raw sentences with background knowledge texts, e.g., for each input drug, add an extended description of it
- extend to trial outcome prediction on three datasets: phase I, II, and III
- consider transfer learning across databases:
  - EHR (40K+ patients) -> clinical trial patient data (~1K per dataset)
  - clinicaltrials.gov (400K+ trials) -> trial outcome prediction (~5K per dataset)
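
The paraphrase augmentation and fact checker filtering steps in the list above could be wired together roughly as follows. This is a minimal sketch under assumptions, not the project's implementation: `paraphrase` is a hypothetical callable standing in for the GPT-3 API call, and `numbers_preserved` is a simple illustrative heuristic (every number in the raw sentence must reappear in the paraphrase) standing in for the learned fact checker.

```python
import re
from typing import Callable, Iterable, List

# Matches integers and decimals such as "12", "3.5", "0.25".
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text: str) -> List[str]:
    """Return the numeric literals in `text`, sorted so word order is ignored."""
    return sorted(NUMBER_RE.findall(text))


def numbers_preserved(source: str, candidate: str) -> bool:
    """Illustrative stand-in for a fact checker (assumption, not the real one):
    accept a paraphrase only if it contains exactly the same numbers."""
    return extract_numbers(source) == extract_numbers(candidate)


def augment(
    sentences: Iterable[str],
    paraphrase: Callable[[str], List[str]],  # hypothetical wrapper around an LLM API
    checker: Callable[[str, str], bool] = numbers_preserved,
) -> List[str]:
    """Keep each raw sentence plus the paraphrases that pass the checker."""
    augmented: List[str] = []
    for sentence in sentences:
        augmented.append(sentence)
        for candidate in paraphrase(sentence):
            # Skip verbatim copies and paraphrases rejected by the checker.
            if candidate != sentence and checker(sentence, candidate):
                augmented.append(candidate)
    return augmented
```

Passing the checker as a parameter keeps the filtering step swappable, so the heuristic here could later be replaced by a model trained on the fact checker dataset without touching the augmentation loop.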