FusionDTI

FusionDTI utilises a Token-level Fusion module to effectively learn fine-grained information for Drug-Target Interaction Prediction.

Framework

[Figure: FusionDTI framework]

Installation Guide

Clone this GitHub repo and set up a new conda environment.

# create a new conda environment
$ conda create --name FusionDTI python=3.8
$ conda activate FusionDTI

# install required python dependencies
$ conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
$ pip install --upgrade transformers
$ pip install wandb

# clone the source code of FusionDTI
$ git clone https://github.com/ZhaohanM/FusionDTI.git
$ cd FusionDTI

Datasets

All data used in FusionDTI come from public resources: BindingDB [1], BioSNAP [2] and Human [3]. The datasets can be downloaded from here.

Train

To run the FusionDTI experiments, use the following command. The dataset can be BindingDB, Biosnap, or Human.

$ python main_token.py --dataset BindingDB
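
To train on all three datasets in turn, the command above can be wrapped in a small driver. This is only a sketch: `build_command` and `train_all` are hypothetical helpers, not part of the repository, and the dataset spellings are copied from this README, so the exact values accepted by `main_token.py` are an assumption.

```python
import subprocess
import sys

# Dataset names as listed in this README; the exact spelling expected by
# main_token.py is an assumption.
DATASETS = ["BindingDB", "Biosnap", "Human"]


def build_command(dataset, script="main_token.py"):
    """Return the argv list for one training run."""
    return [sys.executable, script, "--dataset", dataset]


def train_all(datasets=DATASETS, dry_run=True):
    """Print one training command per dataset; run it unless dry_run is set."""
    for name in datasets:
        cmd = build_command(name)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop if a run exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the commands; pass dry_run=False to launch training.
    train_all(dry_run=True)
```

With `dry_run=True` nothing is launched, which makes it easy to check the commands before committing GPU time.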

Inference

After training the FusionDTI model, the best saved model is used to run inference on a single drug-target pair. The notebook visualize_attention.ipynb lets you enter protein and drug sequences and visualise the attention weights.

$ python attention.py --dataset BindingDB

How to obtain the structure-aware sequence of a protein?

The structure-aware sequence of a protein is derived with Foldseek from a 3D structure file (.cif) taken from the AlphaFold DB database. SaProt provides a function that converts a protein structure into a structure-aware sequence. The function calls the foldseek binary to encode the structure. You can download the binary file from here and place it in the utils folder.

The process has three steps:

First, if you do not have UniProt IDs, obtain them from the UniProt website using existing amino acid sequences, protein names, etc. Then save them as a comma-delimited text file.

Second, run the following script to download the protein structure file corresponding to each UniProt ID.

$ python get_alphafold.py
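
As a rough illustration of what the second step involves, the sketch below reads comma-delimited UniProt IDs and maps each one to an AlphaFold DB structure file. It is not the code in get_alphafold.py: the helper names are hypothetical, and the URL pattern (including the `v4` model version suffix) is an assumption that should be checked against the current AlphaFold DB release.

```python
import urllib.request
from pathlib import Path

# Assumed AlphaFold DB file pattern; the model version suffix ("v4") changes
# between database releases, so verify it before use.
URL_TEMPLATE = "https://alphafold.ebi.ac.uk/files/AF-{uid}-F1-model_v4.cif"


def parse_uniprot_ids(text):
    """Split comma-delimited text into an ordered list of unique UniProt IDs."""
    ids = []
    for token in text.replace("\n", ",").split(","):
        token = token.strip()
        if token and token not in ids:
            ids.append(token)  # keep the first occurrence, preserve order
    return ids


def structure_url(uid):
    """Build the (assumed) download URL for one UniProt ID."""
    return URL_TEMPLATE.format(uid=uid)


def download_structures(ids, out_dir="structures"):
    """Download one .cif file per ID into out_dir, skipping existing files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for uid in ids:
        target = out / f"{uid}.cif"
        if not target.exists():
            urllib.request.urlretrieve(structure_url(uid), str(target))
    return out
```

A typical call would be `download_structures(parse_uniprot_ids(Path("ids.txt").read_text()))`, where `ids.txt` is a placeholder name for the comma-delimited file from the first step.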

Finally, you can run the following code to retrieve the structure-aware sequence of the protein.

$ python generate_stru_seq.py
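
To give a sense of what the output looks like: as far as we understand the SaProt format, a structure-aware sequence pairs each amino acid (uppercase) with the Foldseek 3Di structure token at the same position (lowercase). The sketch below shows only that interleaving step, under that assumption. `combine_structure_aware` is a hypothetical helper, and the 3Di string in the usage line is made up for illustration rather than produced by Foldseek.

```python
def combine_structure_aware(aa_seq, struct_seq):
    """Interleave residues with per-residue structure tokens.

    aa_seq:     amino acid sequence, one letter per residue
    struct_seq: structure tokens (e.g. Foldseek 3Di), one per residue
    """
    if len(aa_seq) != len(struct_seq):
        raise ValueError("sequences must have the same length")
    # Uppercase residue followed by lowercase structure token, per position.
    return "".join(a.upper() + s.lower() for a, s in zip(aa_seq, struct_seq))


if __name__ == "__main__":
    # Toy input: the structure string here is illustrative, not real output.
    print(combine_structure_aware("MEV", "DVP"))
```

In practice the structure tokens come from running the foldseek binary on the .cif file, which is what generate_stru_seq.py takes care of.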

How to obtain the SELFIES of a drug?

Install the Python packages used to convert drug SMILES strings into SELFIES strings.

$ pip install selfies 
$ pip install pandarallel

Run the following script to generate SELFIES from your SMILES strings.

$ python generate_selfies.py
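
The conversion itself can be sketched with the `selfies` package, whose `encoder` function turns a SMILES string into a SELFIES string. This is a minimal sequential version, not the repository's generate_selfies.py: `smiles_to_selfies` is a hypothetical helper, and the real script may parallelise the work with pandarallel.

```python
def smiles_to_selfies(smiles_list, encoder=None):
    """Convert SMILES strings to SELFIES; failed conversions become None.

    encoder: callable mapping one SMILES string to one SELFIES string.
             Defaults to selfies.encoder, imported lazily so the helper can
             also be exercised with a stand-in encoder.
    """
    if encoder is None:
        import selfies as sf  # requires: pip install selfies
        encoder = sf.encoder
    results = []
    for smiles in smiles_list:
        try:
            results.append(encoder(smiles))
        except Exception:
            # Invalid or unsupported SMILES: keep the row, mark it as missing.
            results.append(None)
    return results
```

Returning `None` for molecules that fail to encode keeps the output aligned with the input rows, so such entries can be filtered out afterwards instead of stopping the whole run.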

Citation

Please cite our paper if you find our work useful in your own research.

@inproceedings{meng2024fusiondti,
title={Fusion{DTI}: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction},
author={Zhaohan Meng and Zaiqiao Meng and Iadh Ounis},
booktitle={ICML 2024 AI for Science Workshop},
year={2024},
url={https://openreview.net/forum?id=SRdvBPDdXB}
}