This repository contains the code for TEIM (TCR-Epitope Interaction Modeling). TEIM is a deep learning-based model for predicting TCR-epitope interactions. It includes two submodels: TEIM-Res (TEIM at the residue level) and TEIM-Seq (TEIM at the sequence level).
Both models take only the primary sequences of CDR3βs and epitopes as input. TEIM-Res predicts the distances and contact probabilities between all residue pairs of a CDR3β and an epitope. TEIM-Seq predicts whether a CDR3β and an epitope bind to each other.
- Install Python>=3.8
- Install the basic packages: `pip install -r requirements.txt`
- Install ANARCI for CDR3 numbering in the new environment: `conda install -c bioconda anarci`
We also provide a Dockerfile to simplify setting up the environment. You can build the Docker image by running:
`docker build -t teim:v1 .`
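
After either installation route, a quick sanity check can confirm that the requirements listed above are met. The snippet below is only a sketch: the function name `check_environment` is hypothetical, and it assumes that ANARCI is importable as the `anarci` module and/or exposes an `ANARCI` executable on the `PATH`, which may not match how this repository calls it.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Report whether the interpreter and ANARCI look usable (hypothetical helper)."""
    return {
        # The installation notes ask for Python >= 3.8.
        "python_ok": sys.version_info >= (3, 8),
        # Assumption: ANARCI can be imported as the `anarci` module.
        "anarci_module": importlib.util.find_spec("anarci") is not None,
        # Assumption: ANARCI also installs an `ANARCI` executable on PATH.
        "anarci_cli": shutil.which("ANARCI") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A `missing` entry for one of the two ANARCI checks is not necessarily a problem, since only one of the two access paths may be needed.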
- For residue-level prediction, put your input TCR-epitope sequence pairs in the `inputs/inputs.csv` file. Each TCR is represented by its CDR3β sequence and each epitope by its sequence, in the following format:

  | cdr3 | epitope |
  | --- | --- |
  | CASAPGLAGGRPEQYF | LLFGYPVYV |
  | CASRGAAGGRPQYF | MLWGYLQYV |
  | CASRPGLAGGRAEQYF | FTDSSVWA |

- Run `python scripts/inference_res.py`.
- The predicted distance matrices and contact site matrices are written to the `outputs` directory:
  - The predicted distance matrix and contact matrix are stored in files named `dist_<cdr3>_<epitope>.csv` and `site_<cdr3>_<epitope>.csv`, respectively.
  - The rows and columns of the matrices correspond to the residues of the CDR3β and the epitope, respectively.
  - The values in the distance matrix are the predicted distances between residue pairs (in angstroms), and the values in the contact matrix are the predicted contact scores (probabilities, ranging from 0 to 1) of residue pairs.
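
The residue-level outputs described above can be inspected with pandas. The following is a minimal sketch, not part of the repository: `load_matrix` and `top_contacts` are hypothetical helpers, the 0.5 cutoff is an arbitrary illustration, and the code assumes that each CSV stores row labels in its first column and column labels in its first row. Adjust `index_col` if the actual files are laid out differently.

```python
from pathlib import Path

import pandas as pd


def load_matrix(path):
    """Load one TEIM-Res output matrix (hypothetical helper).

    Assumption: the first CSV column holds row labels and the first row
    holds column labels.
    """
    return pd.read_csv(path, index_col=0)


def top_contacts(site, dist, threshold=0.5):
    """List residue pairs whose contact score is at least `threshold`.

    Rows are CDR3β positions and columns are epitope positions. Pairs are
    addressed by 0-based position so repeated residue letters stay distinct.
    """
    columns = ["cdr3_pos", "epitope_pos", "contact_score", "distance"]
    rows = []
    for i in range(site.shape[0]):
        for j in range(site.shape[1]):
            score = float(site.iat[i, j])
            if score >= threshold:
                rows.append([i, j, score, float(dist.iat[i, j])])
    result = pd.DataFrame(rows, columns=columns)
    # Highest-scoring contacts first.
    return result.sort_values("contact_score", ascending=False).reset_index(drop=True)


if __name__ == "__main__":
    cdr3, epitope = "CASAPGLAGGRPEQYF", "LLFGYPVYV"
    dist_path = Path("outputs") / f"dist_{cdr3}_{epitope}.csv"
    site_path = Path("outputs") / f"site_{cdr3}_{epitope}.csv"
    if dist_path.exists() and site_path.exists():
        print(top_contacts(load_matrix(site_path), load_matrix(dist_path)))
    else:
        print("Run scripts/inference_res.py first to create the output files.")
```

Indexing by position rather than by residue letter avoids ambiguity when the same amino acid occurs more than once in a sequence.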
- For sequence-level prediction, put your input TCR-epitope sequence pairs in the `inputs/inputs_bd.csv` file. The format is the same as `inputs/inputs.csv` (the residue-level input file).
- Run `python scripts/inference_seq.py`.
- The predicted sequence-level binding scores are written to `outputs/sequence_level_binding.csv`. The `binding` column in the file contains the predicted sequence-level binding score (probability) of each TCR-epitope pair.
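
The sequence-level steps above can be scripted around the inference call. This is a rough sketch under stated assumptions: `pairs_to_frame` and `predicted_binders` are hypothetical helpers, the input file is assumed to be comma-separated with a `cdr3,epitope` header row, only the 20 standard amino acid letters are assumed to be valid, and the 0.5 cutoff is an arbitrary illustration. Only the `binding` column name comes from the description above.

```python
from pathlib import Path

import pandas as pd

# Assumption: only the 20 standard amino acid letters are valid input.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def pairs_to_frame(pairs):
    """Build an input table with `cdr3` and `epitope` columns (hypothetical helper)."""
    pairs = list(pairs)
    for cdr3, epitope in pairs:
        for seq in (cdr3, epitope):
            if not seq or set(seq) - STANDARD_AA:
                raise ValueError(f"not a standard amino acid sequence: {seq!r}")
    return pd.DataFrame(pairs, columns=["cdr3", "epitope"])


def predicted_binders(scores, threshold=0.5):
    """Keep rows whose `binding` score is at least `threshold`, highest first."""
    if "binding" not in scores.columns:
        raise KeyError("expected a 'binding' column")
    hits = scores[scores["binding"] >= threshold]
    return hits.sort_values("binding", ascending=False).reset_index(drop=True)


if __name__ == "__main__":
    result_path = Path("outputs/sequence_level_binding.csv")
    if result_path.exists():
        print(predicted_binders(pd.read_csv(result_path)))
    else:
        print("Run scripts/inference_seq.py first to create the output file.")
```

To prepare the input, a table built with `pairs_to_frame([("CASAPGLAGGRPEQYF", "LLFGYPVYV")])` can be saved with `.to_csv("inputs/inputs_bd.csv", index=False)` before running the inference script.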