Knowledge Graph Embeddings in the Biomedical Domain: Are They Useful? A Look at Link Prediction, Rule Learning, and Downstream Polypharmacy Tasks
This repository contains the code for training and fine-tuning the models evaluated in "Knowledge Graph Embeddings in the Biomedical Domain: Are They Useful? A Look at Link Prediction, Rule Learning, and Downstream Polypharmacy Tasks".
Authors: Aryo Pradipta Gema, Dominik Grabarczyk, Wolf De Wulf, Piyush Borole, Dr. Javier Alfaro, Dr. Pasquale Minervini, Dr. Antonio Vergari, Dr. Ajitha Rajan
Create an anaconda environment using the environment.yml file:
conda env create -f environment.yml
Activate the environment:
conda activate kge
Clone and install libkge:
git clone git@github.com:uma-pi1/kge.git
cd kge
pip install -e .
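If the installation succeeded, the kge command-line interface should now be available in the active environment; a quick sanity check:
kge --help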
To deactivate the environment:
conda deactivate
The knowledge graphs used are those from the v1.0.0 release of BIOKG:
- BIOKG
- BIOKG benchmarks:
  - ddi_efficacy
  - ddi_minerals
  - dpi_fda
  - dep_fda_exp
Download them using the download.py script:
python scripts/data/download.py --help
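For illustration only, assuming the script accepts the dataset name as an argument (this is an assumption; --help lists the actual options), downloading BIOKG might look like:
python scripts/data/download.py biokg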
The fixed seed ensures that the downloaded datasets are identical to the ones used in our evaluations. We can also provide them upon request.
The libkge dataset format is used.
Once downloaded, the dataset folders need to be moved to kge/data.
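For example, assuming the datasets were downloaded into the current directory and the folders are named after the datasets listed above (the folder names are assumptions and may differ on your machine):
mv biokg ddi_efficacy ddi_minerals dpi_fda dep_fda_exp kge/data/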
All configuration files for the link prediction evaluations mentioned in the article can be found in the configs/link_prediction folder.
Please read through the libkge documentation to find out how to use them.
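As a minimal sketch, a link prediction run can be launched with the libkge CLI; the config filename below is a placeholder, and the exact working-directory and path handling is described in the libkge documentation:
kge start configs/link_prediction/<config>.yaml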
To run the evaluations in which models are initialised with pretrained embeddings, make sure to download the models folder from the supplementary material.
Warning: The HPO runs can take up to a week to finish, and some of the generated configurations may require a high-end GPU to run at all. During the research, these HPO runs were run on HPC clusters.
All configuration files for the relation classification evaluations mentioned in the article can be found in the configs/relation_classification folder.
To reproduce our results, use the relation_classification.py script in combination with one of the config files:
python scripts/benchmarks/relation_classification.py --help
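For illustration only, assuming the script takes the config file path as its argument (an assumption; --help shows the actual interface), a run might look like:
python scripts/benchmarks/relation_classification.py configs/relation_classification/<config>.yaml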
Feel free to contact any of the authors via email if you have questions.