
Set of tools to manage epitope prediction results from linear and structural origin and to integrate a pipeline of prioritization filters.


BREWPITOPES: a pipeline to refine B-cell epitope predictions during public health emergencies

Abstract

The application of B-cell epitope identification to the development of therapeutic antibodies is well established but costly in time and resources. For this reason, the immunoinformatics community has developed several computational prediction tools in the last few years.

While relatively successful, most of these tools use only a few properties of the candidate region to determine its likelihood of being a true B-cell epitope. However, this likelihood is influenced by a wide variety of protein features, including the presence of glycosylated residues in the neighbourhood of the candidate epitope, the subcellular location of the protein region and three-dimensional information about its surface accessibility in the parental protein.

In this study we created Brewpitopes, an integrative pipeline to curate computational predictions of B-cell epitopes by accounting for all the aforementioned features. To this end, we implemented a set of rational filters that mimic the conditions for in vivo antibody recognition and enrich the B-cell epitope predictions in actionable candidates. To validate Brewpitopes, we analyzed the SARS-CoV-2 proteome. In the S protein, Brewpitopes produced a 5-fold enrichment of the initial predictions in epitopes with neutralizing potential (p-value < 2e-4). Other than the S protein, 4 out of 16 proteins in the proteome contain curated B-cell epitopes and are hence also of potential interest for viral neutralization, since mutational escape mainly affects the S protein. Our results demonstrate that Brewpitopes is a powerful pipeline for the rapid prediction of refined B-cell epitopes during public health emergencies.

Available as preprint at: https://doi.org/10.1101/2022.11.28.518301

INSTALLATION (DOCKER IMAGE)

To build the image you need to have Docker installed. Then use the following commands:

  1. Create the Docker image from the Dockerfile (may take a while). PATH/TO/Dockerfile is the directory that contains the Dockerfile:
      sudo docker build -t brewpitopes PATH/TO/Dockerfile
  2. Start a container with a folder shared between the Brewpitopes Docker image and your local machine:
      sudo docker run -it --volume /your/machine/directory:/home/Projects brewpitopes

PIPELINE

  1. Use directories.R to create the folder environment.

    Rscript directories.R --path /your/desired/folder
    
  2. Download the FASTA file of the target protein from UniProt.

    Save at /Z_fasta

  3. Use the FASTA to predict linear epitopes using the [Bepipred 2.0](https://services.healthtech.dtu.dk/service.php?BepiPred-2.0) server and export the results as csv (default parameters).

    Save at /A_linear_predictions/bepipred/bepipred_results.csv

  4. Extract epitopes from Bepipred results using epixtractor_linear_bebipred.py.

    python3 epixtractor_linear_bebipred.py
    Add path to bepipred results: your/path/to/A_linear_predictions/bepipred/bepipred_results.csv
    Add path to output folder: your/path/to/C_epixtractor
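
The extraction idea in step 4 can be illustrated with a minimal Python sketch. It is not the code of epixtractor_linear_bebipred.py: it assumes that the Bepipred export provides one score per residue and that an epitope is a contiguous run of residues scoring at or above a threshold. The function name `extract_epitopes` and the 0.5 cutoff are illustrative assumptions.

```python
from typing import List, Tuple


def extract_epitopes(
    sequence: str, scores: List[float], threshold: float = 0.5
) -> List[Tuple[int, int, str]]:
    """Return (start, end, peptide) for each run of residues scoring >= threshold.

    Positions are 1-based and inclusive. The threshold is an assumed value.
    """
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")

    epitopes = []
    run_start = None  # 0-based index where the current run began
    for i, score in enumerate(scores):
        if score >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The run ended at the previous residue (index i - 1, position i).
            epitopes.append((run_start + 1, i, sequence[run_start:i]))
            run_start = None
    if run_start is not None:
        # A run that reaches the end of the sequence.
        epitopes.append((run_start + 1, len(sequence), sequence[run_start:]))
    return epitopes
```

Calling `extract_epitopes(sequence, scores)` on a toy sequence returns one tuple per predicted region, which could then be written to the C_epixtractor folder.
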
  5. Use the FASTA to predict linear epitopes using the ABCpred server.

    Predict using all the epitope windows (10, 12, 14, 16, 18, 20) with the overlapping filter ON.
    Copy the results for each window from the webpage to its own .csv file.
    Save at: path/to/brewpitopes/A_linear_predictions/abcpred/abcpred_10mers.csv (and likewise abcpred_12mers.csv to abcpred_20mers.csv)

  6. Extract epitopes from ABCpred results using epixtractor_linear_abcpred.R

    Rscript epixtractor_linear_abcpred.R \
      --outpath your/path/to/brewpitopes/C_epixtractor \
      --input_10mers your/path/to/brewpitopes/A_linear_predictions/abcpred/abcpred_10mers.csv \
      --input_12mers your/path/to/brewpitopes/A_linear_predictions/abcpred/abcpred_12mers.csv \
      --input_14mers your/path/to/brewpitopes/A_linear_predictions/abcpred/abcpred_14mers.csv \
      --input_16mers your/path/to/brewpitopes/A_linear_predictions/abcpred/abcpred_16mers.csv \
      --input_18mers your/path/to/brewpitopes/A_linear_predictions/abcpred/abcpred_18mers.csv \
      --input_20mers your/path/to/brewpitopes/A_linear_predictions/abcpred/abcpred_20mers.csv

  7. Download the PDB file of the target protein from the PDB database.
    Save at /brewpitopes/B_structural_predictions/pdb

  8. Use the PDBrenum server to renumber the PDB residues according to the corresponding FASTA file in UniProt.
    Download results as .pdb
    Save at /brewpitopes/B_structural_predictions/pdbrenum

  9. Use the renumbered PDB to predict structural epitopes using the Discotope 2.0 server and export the results as csv.
    Default threshold.
    Select chain A by default.
    Save at /brewpitopes/B_structural_predictions/discotope

  10. Extract epitopes from Discotope results using epixtract_structural.py

    python3 epixtract_structural.py
    Add path to discotope results: brewpitopes/B_structural_predictions/discotope/discotope_results.csv
    Add path to output folder: brewpitopes/C_epixtractor

  11. Merge the epitopes extracted from Bepipred, ABCpred and Discotope results using epimerger.R

    Rscript epimerger.R \
      --abcpred your/path/to/brewpitopes/C_epixtractor/abcpred_results_extracted.csv \
      --bepipred your/path/to/brewpitopes/C_epixtractor/bepipred_results_extracted.csv \
      --discotope your/path/to/brewpitopes/C_epixtractor/discotope_results_extracted.csv \
      --outdir your/path/to/brewpitopes/D_epimerger
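
As a rough illustration of the merging step, the sketch below pools the epitopes of several predictors into one table and records which tool produced each entry. It is an assumption about what a merge could look like, not a description of epimerger.R; the function name `merge_predictions` and the row keys (`tool`, `start`, `end`, `sequence`) are hypothetical.

```python
from typing import Dict, List, Tuple

Epitope = Tuple[int, int, str]  # (start, end, peptide), 1-based inclusive


def merge_predictions(by_tool: Dict[str, List[Epitope]]) -> List[dict]:
    """Pool epitopes from several tools into one list of rows sorted by position."""
    merged = []
    for tool, epitopes in by_tool.items():
        for start, end, peptide in epitopes:
            merged.append(
                {"tool": tool, "start": start, "end": end, "sequence": peptide}
            )
    # Sort by position so overlapping predictions from different tools sit together.
    merged.sort(key=lambda row: (row["start"], row["end"], row["tool"]))
    return merged
```

Keeping the tool name on every row preserves the origin (linear or structural) of each prediction for the later filters.
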
  12. Predict the protein topology using the CCTOP server.
    Download results as .xml.
    Save at your/path/to/brewpitopes/E_epitopology/CCTOP/cctop.xml

  13. Extract the topological domains using xml_cctop_parser.R

    Rscript xml_cctop_parser.R --xml path/to/brewpitopes/E_epitopology/CCTOP/cctop.xml --outdir path/to/brewpitopes/E_epitopology/CCTOP

  14. Label the epitopes based on their topology (intracellular, membrane or extracellular) using epitopology.

    Using CCTOP predictions --> use epitopology_cctop.R

    Rscript epitopology_cctop.R --input_CCTOP path/to/brewpitopes/E_epitopology/CCTOP/cctop_domains.csv --input_epitopes path/to/brewpitopes/D_epimerger/merged.csv --outdir path/to/brewpitopes/E_epitopology

    Using manual annotation --> use epitopology_manual.R

    Rscript epitopology_manual.R --start_pos 1,12,22 --end_pos 8,18,28 --input_epitopes path/to/brewpitopes/D_epimerger/merged.csv --outdir path/to/brewpitopes/E_epitopology
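
The topology labelling can be pictured as an interval-overlap check. The following sketch assumes that domains are given as (start, end, label) triples and that an epitope spanning more than one kind of domain is flagged separately. The function name `label_topology` and the labels "mixed" and "unknown" are illustrative assumptions, not the behaviour of the epitopology scripts.

```python
from typing import List, Tuple

Domain = Tuple[int, int, str]  # (start, end, label), 1-based inclusive


def label_topology(start: int, end: int, domains: List[Domain]) -> str:
    """Return the topology label of an epitope spanning residues start..end."""
    # Two closed intervals overlap when each one starts before the other ends.
    labels = {
        label
        for d_start, d_end, label in domains
        if start <= d_end and d_start <= end
    }
    if not labels:
        return "unknown"  # no annotated domain covers the epitope
    if len(labels) == 1:
        return labels.pop()
    return "mixed"  # the epitope crosses a domain boundary
```

For example, with domains `[(1, 20, "extracellular"), (21, 40, "membrane")]`, an epitope at 18..25 overlaps both domains and would be flagged as crossing a boundary.
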
  15. Predict the glycosylation profile of the protein using the FASTA file.
    N-glycosylations: use the NetNGlyc 1.0 server.
    Manually copy the table headed: SeqName Position Potential Jury_agreement NGlyc_result Prediction
    Save as CSV at brewpitopes/F_epiglycan/netnglyc

    O-glycosylations: use the NetOGlyc 4.0 server.
    Manually copy the table headed: seqName source feature start end score strand frame comment
    Save as CSV at brewpitopes/F_epiglycan/netoglyc

  16. Extract the glycosylated positions from both N-glyc and O-glyc outputs using epiglycan_extractor.R

    Rscript epiglycan_extractor.R --oglyc /your/path/to/brewpitopes/F_epiglycan/netoglyc/oglyc.csv --nglyc /your/path/to/brewpitopes/F_epiglycan/netnglyc/nglyc.csv --outdir /your/path/to/brewpitopes/F_epiglycan/

  17. Use epiglycan.py to label the glycosylated epitopes.

    python3 epiglycan.py
    Add path to input epitopes: brewpitopes/E_epitopology/topology_extracted.csv
    Add path to output folder: brewpitopes/F_epiglycan
    Add path to extracted glycosilated positions: brewpitopes/F_epiglycan/glycosilated_positions.csv
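
A minimal sketch of the glycosylation labelling is shown below. It assumes that an epitope counts as glycosylated when at least one predicted glycosylated position falls inside its start..end range; whether epiglycan.py also considers neighbouring residues is not stated here, so that is left out. The function name `label_glycosylation` and the keys `glyco_hits` and `glycosylated` are hypothetical.

```python
from typing import Dict, Iterable, List


def label_glycosylation(
    epitopes: List[Dict], glyco_positions: Iterable[int]
) -> List[Dict]:
    """Annotate each epitope with the glycosylated positions it contains."""
    positions = sorted(set(glyco_positions))
    labelled = []
    for epitope in epitopes:
        hits = [p for p in positions if epitope["start"] <= p <= epitope["end"]]
        # Copy the row so the input list is left untouched.
        labelled.append({**epitope, "glyco_hits": hits, "glycosylated": bool(hits)})
    return labelled
```

The same containment check would apply to the buried positions used in the accessibility step, with the buried positions list in place of the glycosylated one.
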
  18. Use ICM_browser (MOLSOFT) to extract the RSA values for accessibility calculation.
    Download ICM_browser from http://www.molsoft.com/icm_browser.html
    Open the renumbered PDB file of the corresponding protein (obtained from PDBrenum).
    Run the code in Compute_ASA.icm in the command line of the program.
    Save results at /G_episurf

  19. Extract the buried positions using icm_extractor.R

    Rscript icm_extractor.R --icm /your/path/to/brewpitopes/G_episurf/icm/rsa.csv --outdir your/path/to/brewpitopes/G_episurf/

  20. Label the epitopes based on their buried positions using episurf.py

    python3 episurf.py
    Add path to input epitopes: brewpitopes/F_epiglycan/glycan_extracted.csv
    Add path to output folder: brewpitopes/G_episurf
    Add path to extracted buried positions: brewpitopes/G_episurf/buried_positions_list.csv

  21. Select the epitopes that are extraviral, non-glycosylated, exposed and of length >= 5 using epifilter.R

    Rscript epifilter.R --data /your/path/to/brewpitopes/G_episurf/access_extracted.csv --outdir /your/path/to/brewpitopes/I_final_candidates
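
The four selection criteria can be combined as in the sketch below. It is an illustration under assumptions, not the logic of epifilter.R: the row keys (`topology`, `glycosylated`, `buried`, `start`, `end`) are hypothetical, and "extraviral" is assumed to correspond to the "extracellular" topology label.

```python
from typing import Dict, List

MIN_LENGTH = 5  # minimum epitope length stated in the filtering step


def is_candidate(epitope: Dict) -> bool:
    """True when the epitope passes all four filters."""
    length = epitope["end"] - epitope["start"] + 1
    return (
        epitope["topology"] == "extracellular"  # assumed label for extraviral
        and not epitope["glycosylated"]
        and not epitope["buried"]
        and length >= MIN_LENGTH
    )


def filter_candidates(epitopes: List[Dict]) -> List[Dict]:
    """Keep only the epitopes that pass every filter."""
    return [e for e in epitopes if is_candidate(e)]
```

Because the filters are combined with a logical AND, an epitope that fails any single criterion is discarded.
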
  22. Use epicontig.ipynb (Jupyter Notebook) to extract the epitopic regions / contigs.
    Upload the candidates_df.csv generated by epifilter.R in the previous step.
    Follow the instructions in the Notebook.

  23. Use yield_plot.R to plot the results of the pipeline.
    Follow the instructions in the R file.
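
One way to picture the extraction of epitopic regions is as a merge of overlapping intervals. The sketch below assumes that candidate epitopes sharing at least one residue are fused into a single contig. This is an assumption made for illustration and may differ from what epicontig.ipynb does; the function name `build_contigs` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def build_contigs(epitopes: List[Interval]) -> List[Interval]:
    """Fuse epitopes that share at least one residue into contiguous regions."""
    contigs: List[Interval] = []
    for start, end in sorted(epitopes):
        if contigs and start <= contigs[-1][1]:
            # Overlaps the current contig: extend its end if needed.
            last_start, last_end = contigs[-1]
            contigs[-1] = (last_start, max(last_end, end))
        else:
            contigs.append((start, end))
    return contigs
```

Sorting first guarantees that each epitope only needs to be compared with the most recent contig.
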

APPENDIX FOR VARIANTS OF CONCERN

  1. Generate the variant FASTA using fasta_mutator.R
    Download the reference FASTA of the Spike protein from UniProtKB.
    Upload it where indicated in the script instructions.
    Upload the mutations of the corresponding VOC, found as attached files in this GitHub repository (e.g. Gamma = 20211203_spike_gamma_vocs.csv).
    Execute the script and save the VOC FASTA file.
    Once saved, remove the double quotes ("") from the file to obtain a properly formatted FASTA.
    Start the pipeline above with the mutated FASTA file.
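
For illustration, applying substitutions to a reference sequence can be sketched as follows. This is not the code of fasta_mutator.R: it assumes that mutations are written as reference residue, 1-based position and new residue (for example "F2L" on a toy sequence), and it handles substitutions only, not insertions or deletions. The function names `apply_substitutions` and `to_fasta` are hypothetical.

```python
import re
from typing import List


def apply_substitutions(sequence: str, mutations: List[str]) -> str:
    """Apply substitutions such as 'F2L' (ref, 1-based position, alt)."""
    residues = list(sequence)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"unsupported mutation format: {mutation}")
        ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position out of range: {mutation}")
        if residues[pos - 1] != ref:
            # Guards against applying a mutation list to the wrong reference.
            raise ValueError(f"reference residue mismatch: {mutation}")
        residues[pos - 1] = alt
    return "".join(residues)


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as FASTA text without any quoting."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

Writing the FASTA text directly, as `to_fasta` does, avoids the stray quotes that have to be removed by hand in the workflow above.
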