
Primary language: Python. License: MIT.

kNN-BioEL

The code and dataset for the paper "Improving Biomedical Entity Linking with Retrieval-enhanced Learning", published in the Proceedings of ICASSP 2024.

Env

conda create -n bioel python==3.10.10
conda activate bioel
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117
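To confirm that the environment was set up as intended, a quick check like the one below may help. It is a minimal sketch and not part of the repository: the function name `check_env` is hypothetical, and it only assumes that the requirements install PyTorch, which the cu117 index URL suggests.

```python
import importlib.util
import sys


def check_env(expected_python: tuple = (3, 10)) -> dict:
    """Hypothetical helper: report Python, PyTorch and CUDA status for the bioel env."""
    report = {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] == tuple(expected_python),
    }
    # PyTorch may not be installed yet; record that instead of raising ImportError.
    if importlib.util.find_spec("torch") is None:
        report["torch"] = None
        return report
    import torch

    report["torch"] = torch.__version__
    # CUDA version the wheel was built against (e.g. "11.7" for cu117 wheels); None for CPU-only builds.
    report["torch_cuda"] = torch.version.cuda
    # Whether a GPU is usable, and how many CUDA devices PyTorch can see.
    report["cuda_available"] = torch.cuda.is_available()
    report["gpu_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in check_env().items():
        print(f"{key}: {value}")
```

If `gpu_count` is 2, for example, the GPU IDs PyTorch can see are presumably 0 and 1.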

Base Model

Before training, download the base model SapBERT, rename its folder to SapBERT-from-PubMedBERT-fulltext, and place it in the models folder.
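One way to fetch SapBERT is from the Hugging Face Hub. The sketch below makes three assumptions: the public checkpoint `cambridgeltl/SapBERT-from-PubMedBERT-fulltext` is the one the scripts expect, the models folder sits at the repository root, and `huggingface_hub` is installed (it may not be listed in requirements.txt). The helper name `download_sapbert` is hypothetical. Because the Hub repository already carries the required name, no renaming is needed.

```python
from pathlib import Path

# Assumption: the public SapBERT checkpoint on the Hugging Face Hub is the intended base model.
SAPBERT_REPO_ID = "cambridgeltl/SapBERT-from-PubMedBERT-fulltext"
# Folder name required by the README, inside the models folder (assumed to be at the repo root).
MODEL_DIR = Path("models") / "SapBERT-from-PubMedBERT-fulltext"


def download_sapbert(dest: Path = MODEL_DIR) -> Path:
    """Hypothetical helper: download SapBERT into models/SapBERT-from-PubMedBERT-fulltext."""
    # Imported lazily so this module loads even when huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    dest.mkdir(parents=True, exist_ok=True)
    # local_dir makes the model files appear under dest rather than only in the Hub cache.
    snapshot_download(repo_id=SAPBERT_REPO_ID, local_dir=str(dest))
    return dest
```

Calling `download_sapbert()` once from the repository root should leave the weights where the README expects them.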

Training

Run the following commands to train a model on each dataset, where 0 is the ID of the GPU device to use.

bash scripts/train_cometa.sh 0
bash scripts/train_aap.sh 0
bash scripts/train_ncbi.sh 0
bash scripts/train_bc5cdr.sh 0
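The four commands differ only in the dataset name, so they can also be run back to back from Python. This is a hypothetical convenience wrapper, not part of the repository. It assumes it is run from the repository root and simply calls the provided scripts with the GPU ID as their only argument.

```python
import subprocess

# The four datasets that have training scripts under scripts/.
DATASETS = ("cometa", "aap", "ncbi", "bc5cdr")


def train_command(dataset: str, gpu: int = 0) -> list:
    """Build the command that runs scripts/train_<dataset>.sh on the given GPU ID."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    return ["bash", f"scripts/train_{dataset}.sh", str(gpu)]


def train_all(gpu: int = 0) -> None:
    """Hypothetical helper: train on every dataset in turn, stopping at the first failure."""
    for dataset in DATASETS:
        # check=True raises CalledProcessError if a script exits with a non-zero status.
        subprocess.run(train_command(dataset, gpu), check=True)
```

For example, `train_all(gpu=1)` trains the four models one after another on GPU 1. Running them sequentially avoids several jobs competing for the memory of a single device.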

Alternatively, you can skip training and download our pre-trained weights directly from here (password 793f) or here. Place the downloaded weights for the four datasets in the save directory, organized as follows:

save
|--cometa
|--aap
|--ncbi
|--bc5cdr
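Before evaluating, it may be worth confirming that the weights landed in the expected folders. The helper below is a hypothetical sketch. It only checks that each dataset folder exists and is non-empty, because the README does not say which files a checkpoint contains.

```python
from pathlib import Path

# Dataset subfolders expected under the save directory.
DATASETS = ("cometa", "aap", "ncbi", "bc5cdr")


def missing_checkpoints(save_dir: Path = Path("save")) -> list:
    """Hypothetical helper: list datasets whose folder under save_dir is missing or empty."""
    root = Path(save_dir)  # accepts a str or a Path
    missing = []
    for dataset in DATASETS:
        folder = root / dataset
        # An absent or empty folder suggests the weights were not placed there.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(dataset)
    return missing
```

An empty list means every dataset folder contains at least one file.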

Evaluation

After training the models or downloading the pre-trained weights, run the following commands to evaluate kNN-BioEL on each dataset (again, 0 is the GPU ID).

bash scripts/eval_knn_cometa.sh 0
bash scripts/eval_knn_aap.sh 0
bash scripts/eval_knn_ncbi.sh 0
bash scripts/eval_knn_bc5cdr.sh 0
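When weights are available for only some datasets, a small wrapper can run just those evaluations. This is a hypothetical sketch: it assumes the evaluation scripts read weights from save/<dataset>, as the directory layout above suggests, and that it is run from the repository root.

```python
import subprocess
from pathlib import Path

# Datasets that have evaluation scripts under scripts/.
DATASETS = ("cometa", "aap", "ncbi", "bc5cdr")


def eval_command(dataset: str, gpu: int = 0) -> list:
    """Build the command that runs scripts/eval_knn_<dataset>.sh on the given GPU ID."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {DATASETS}")
    return ["bash", f"scripts/eval_knn_{dataset}.sh", str(gpu)]


def evaluate_available(gpu: int = 0, save_dir: Path = Path("save")) -> list:
    """Hypothetical helper: evaluate only the datasets that have a folder under save_dir."""
    evaluated = []
    for dataset in DATASETS:
        folder = Path(save_dir) / dataset
        if not folder.is_dir():
            print(f"skipping {dataset}: {folder} not found")
            continue
        # check=True raises CalledProcessError if the evaluation script fails.
        subprocess.run(eval_command(dataset, gpu), check=True)
        evaluated.append(dataset)
    return evaluated
```

The return value lists the datasets that were actually evaluated, which makes skipped ones easy to spot.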

Citation

If you use this model or code, please cite it as follows:

@article{lin2023improving,
  title={Improving Biomedical Entity Linking with Retrieval-enhanced Learning},
  author={Lin, Zhenxi and Zhang, Ziheng and Wu, Xian and Zheng, Yefeng},
  journal={arXiv preprint arXiv:2312.09806},
  year={2023}
}