This repository contains the code and data to train the epiTCR-KDA model.

Requirements:
- Python >= 3.6.8
- Keras 2.6.0
- TensorFlow 2.6.0
Installation:

```bash
git clone https://github.com/ddiem-ri-4D/epiTCR-KDA
cd epiTCR-KDA/
conda env create -f environment.yml
source activate kda  # on newer conda releases: conda activate kda
```
Data preparation:

- Download the training and testing data from the `datasets` folder.
- Download the 3D structures and dihedral angles of the TCRs and peptides from the `3DS_PDBFiles` and `DA_TSVFiles` folders.
- Prepare a list of the unique TCR (CDR3b) and peptide sequences in the training/testing data.
- Check whether each unique TCR/peptide is already present in the `DA_TSVFiles` folders by running:
```bash
cd utils
python3 check3DSDA.py
```
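The listing, checking and FASTA-preparation steps can be sketched in plain Python as below. This is a minimal illustration, not the repository's `check3DSDA.py`: the helper names are hypothetical, and the one-file-per-sequence naming `<sequence>.tsv` inside the DA folder is an assumption that should be adapted to the layout actually used in `DA_TSVFiles`.

```python
import os
from typing import Dict, Iterable, List


def unique_sequences(rows: Iterable[Dict]) -> Dict[str, List[str]]:
    """Collect the unique CDR3b and epitope sequences, keeping first-seen order."""
    rows = list(rows)
    return {
        "tcr": list(dict.fromkeys(row["CDR3b"] for row in rows)),
        "peptide": list(dict.fromkeys(row["epitope"] for row in rows)),
    }


def missing_da_files(sequences: Iterable[str], da_dir: str) -> List[str]:
    """Return the sequences that have no '<sequence>.tsv' file in da_dir.

    The file-naming scheme is a hypothetical assumption for illustration.
    """
    return [s for s in sequences
            if not os.path.isfile(os.path.join(da_dir, s + ".tsv"))]


def write_fasta(sequences: Iterable[str], fasta_path: str) -> None:
    """Write one FASTA record per sequence, using the sequence itself as the ID."""
    with open(fasta_path, "w") as handle:
        for seq in sequences:
            handle.write(">{0}\n{0}\n".format(seq))
```

With pandas installed, the rows can come from `pd.read_parquet("./datasets/DATA_4MODEL/TRAIN-TEST/train.parquet").to_dict("records")`; the sequences returned by `missing_da_files` can then be passed to `write_fasta` to produce the OmegaFold input (OmegaFold's own README shows the call as `omegafold INPUT_FILE.fasta OUTPUT_DIRECTORY`).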
- If all of them are present, skip the structure-generation and dihedral-angle steps below.
- If not, predict the missing 3D structures with OmegaFold and add them to the PDB folders, following the steps below:
  - Prepare a FASTA file containing the TCR/peptide sequences to be folded by OmegaFold (see an example here).
  - Run OmegaFold following the instructions here, and place the output into the PDB files directory.
  - Double-check for any TCR/peptides that still lack a structure. Once all structures are present, continue with the dihedral-angle step below.
- After obtaining the 3D structures, run Biopython to compute the dihedral angles, which produces an output `*.tsv` file (see an example here).
- Place the output `*.tsv` files containing the dihedral angles into the DA folders. The conversion script is run with:
```bash
cd utils
python3 PDB2DA.py
```
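Biopython can compute backbone dihedral angles directly, for example with `Polypeptide.get_phi_psi_list()` (angles in radians) on peptides built by `Bio.PDB.PPBuilder`. To make the underlying computation and its sign convention explicit, the sketch below computes the torsion angle from four atom coordinates in pure Python and derives per-residue phi/psi from backbone N, CA and C positions. It is an illustration, not the repository's `PDB2DA.py`; the helper names are hypothetical, and which angles the repository's `*.tsv` files actually store is not assumed here.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Sequence[float]
Residue = Dict[str, Vec]  # backbone atom name ("N", "CA", "C") -> (x, y, z)


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(a: Vec, k: float) -> Tuple[float, float, float]:
    return (a[0] * k, a[1] * k, a[2] * k)


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees, in [-180, 180], about the p1-p2 bond (IUPAC sign)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone: List[Residue]) -> List[Tuple[Optional[float], Optional[float]]]:
    """Per-residue (phi, psi); None where undefined at the chain ends."""
    angles = []
    for i, res in enumerate(backbone):
        # phi(i) = C(i-1) - N(i) - CA(i) - C(i)
        phi = (dihedral(backbone[i - 1]["C"], res["N"], res["CA"], res["C"])
               if i > 0 else None)
        # psi(i) = N(i) - CA(i) - C(i) - N(i+1)
        psi = (dihedral(res["N"], res["CA"], res["C"], backbone[i + 1]["N"])
               if i + 1 < len(backbone) else None)
        angles.append((phi, psi))
    return angles
```

The first residue has no phi and the last has no psi, which mirrors how Biopython reports `None` at the chain termini.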
Input format: `train.parquet`/`test.parquet` are Parquet files with three columns, `CDR3b`, `epitope`, and `binder` (the last one only needed for training), holding the TCR beta-chain CDR3 sequence, the peptide sequence, and whether that CDR3b and peptide bind (1) or not (0). For example:
CDR3b | epitope | binder |
---|---|---|
AASSYGQNFV | QIKVRVDMV | 1 |
AIRAGGDEQ | HSKKKCDEL | 1 |
AISETDKLG | LPPIVAKEI | 1 |
SARDRVRTDTQY | FVSKLYYFE | 0 |
SARDRVRTDTQY | KLSHQPVLL | 0 |
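Before training, it can help to sanity-check rows in this format. The sketch below is a hypothetical helper, not part of the repository; restricting sequences to the 20 standard one-letter amino-acid codes and requiring `binder` to be 0 or 1 are assumptions drawn from the column description and the example table.

```python
from typing import Dict, Iterable, List

# Assumption: sequences use only the 20 standard one-letter amino-acid codes.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def validate_rows(rows: Iterable[Dict], training: bool = True) -> List[str]:
    """Return human-readable problems; an empty list means every row looks usable."""
    required = ("CDR3b", "epitope", "binder") if training else ("CDR3b", "epitope")
    problems = []
    for i, row in enumerate(rows):
        for col in required:
            if col not in row:
                problems.append("row {}: missing column {!r}".format(i, col))
        for col in ("CDR3b", "epitope"):
            seq = row.get(col)
            # Flag empty, non-string, or non-standard-residue sequences.
            if col in row and not (isinstance(seq, str) and seq and set(seq) <= STANDARD_AA):
                problems.append("row {}: invalid {} sequence {!r}".format(i, col, seq))
        if training and "binder" in row and row["binder"] not in (0, 1):
            problems.append("row {}: binder must be 0 or 1, got {!r}".format(i, row["binder"]))
    return problems
```

With pandas, rows can be loaded via `pd.read_parquet(path).to_dict("records")`, and a cleaned list of rows can be written back with `pd.DataFrame(rows).to_parquet("train.parquet", index=False)` (this requires `pyarrow` or `fastparquet`).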
- Example commands for training and testing:

```bash
python3 train.py \
  --trainfile ./datasets/DATA_4MODEL/TRAIN-TEST/train.parquet \
  --testfile ./datasets/DATA_4MODEL/TRAIN-TEST/test.parquet \
  --savemodel ./models/KDA_model.h5 \
  --outfile ./datasets/DATA_4PRED/test_prediction.parquet
```

```bash
python3 test.py \
  --testfile ./datasets/DATA_4MODEL/TRAIN-TEST/test.parquet \
  --savedmodel ./models/KDA_model.h5 \
  --outfile ./datasets/DATA_4PRED/test_prediction.parquet
```
For questions or feedback, please open an issue.
Please cite this paper if it helps your research:
```bibtex
@article{Pham2024.05.18.594806,
  author = {Pham, My-Diem Nguyen and Su, Chinh Tran-To and Nguyen, Thanh-Nhan and Nguyen, Hoai-Nghia and Nguyen, Dinh Duy An and Giang, Hoa and Nguyen, Dinh-Thuc and Phan, Minh-Duy and Nguyen, Vy},
  title = {epiTCR-KDA: Knowledge Distillation model on Dihedral Angles for TCR-peptide prediction},
  elocation-id = {2024.05.18.594806},
  year = {2024},
  doi = {10.1101/2024.05.18.594806},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2024/05/21/2024.05.18.594806},
  eprint = {https://www.biorxiv.org/content/early/2024/05/21/2024.05.18.594806.full.pdf},
  journal = {bioRxiv}
}
```
My-Diem Nguyen Pham, Thanh-Nhan Nguyen, Le Son Tran, Que-Tran Bui Nguyen, Thien-Phuc Hoang Nguyen, Thi Mong Quynh Pham, Hoai-Nghia Nguyen, Hoa Giang, Minh-Duy Phan, Vy Nguyen, epiTCR: a highly sensitive predictor for TCR–peptide binding, Bioinformatics, Volume 39, Issue 5, May 2023, btad284, https://doi.org/10.1093/bioinformatics/btad284