General graph-based framework to design epitope vaccines.
The dependencies required to run this framework can be installed via the following command:
conda env create --name <env name> --file environment.yml
We make heavy use of epytope and one of the persistent solvers supported by Pyomo (currently either Gurobi, which is the default we use, or CPLEX).
For your convenience, we provide a makefile that invokes the necessary commands in the right order and with the right arguments, using the sample files in the resources
folder:
make
This will process the data, produce the vaccine using all the supported methods, and save their evaluation in the dev
folder. The vaccines are files named made-<method>-vaccine.csv
and their evaluations are named made-<method>-evaluation.csv.
For a little more control, the following command allows you to specify the required input files and the output folder:
make all alleles=resources/alleles-small.csv proteins=resources/hiv1-bc-env-small.fasta BASE_DIR=dev
You are encouraged to read the next sections to learn the details, or jump straight to the Makefile.
Because of the long processing time required to design a vaccine, this process is split several stages, with intermediate results saved in CSV files. In the following sections we give concrete examples that work out of the box on how to run the commands; of course, you are free to modify the parameters to suit your needs.
By convention, the last argument of each command is the output file, and usage help can be printed by using the --help
option.
The inputs required are the proteins to target, in fasta format, e.g.
>B.JP.2004.04JPDR6075B.AB221125
MRVTGIRKNCQLLWKWGTMLLGMLMICSALEQLWVTVYYGVPVWKDATTT
LFCASDAKAYDTEMHNVWATHACVPTDPNPQEVVLVNVTEEFNMWKNNMV
EQMHEDIINLWDQSLKPCVKLTPLCVTLHCTDLKNVTANSTASNSSTNWK
HGIRPVVSTQLLLNGSLAEEEVVLRSENFTNNAKTIIVQLNKTVVINCTR
PNNNTRKGIHIGAGRAIYATGAIIGDIRQAHCNLSETDWKNTLNKTVRKL
...
and the HLA alleles of interest, along with their percent frequency in the population and the binding threshold:
allele | frequency | threshold |
---|---|---|
HLA-A*02:01 | 10.693… | 0.405 |
HLA-A*24:02 | 12.906… | 0.405 |
HLA-B*40:01 | 5.310… | 0.405 |
HLA-B*51:01 | 3.245… | 0.405 |
The resources folder contains small samples that you can play with.
Following are the commands required to prepare the data:
-
(Only if using the Makefile) start by copying the input files in the work directory:
make init \ alleles=resources/alleles-small.csv \ proteins=resources/hiv1-bc-env-small.fasta
By default, the work directory is
./dev/
, but it can be customized by using the argumentBASE_DIR=<dir>
. Note that if you do this now, you must include this argument for all subsequentmake
usages, too. -
Extract the peptides and their coverage from the proteins you want to target:
make coverage \ proteins=resources/hiv1-bc-env-small.fasta # customize options with COVERAGE_OPTS="..."
Or:
python data_preparation.py -v extract-peptides \ resources/hiv1-bc-env-small.fasta \ dev/hiv1-bc-env-small-coverage.csv
Sample output:
peptide proteins IKQACPKVT 8;13 RNLCLFGYH 15 KEYALFYTL 19 KTLEQIAEK 9 RSSLRGLQR 0;8;3;13;16 ... ... -
Compute the binding affinities between these peptides and the HLA alleles of interest. These affinities are ic50 values scaled as follows:
affinity = 1 - log(ic50) / log(50000)
, so that 50, 500, 5000 and 50000 nM are scaled to 0.638, 0.426, 0.213 and 0.000 respectively.make affinities \ alleles=resources/alleles-small.csv # customize options with AFFINITIES_OPTS="..."
Or:
python data_preparation.py -v compute-affinities \ resources/alleles-small.csv \ dev/hiv1-bc-env-small-coverage.csv \ dev/hiv1-bc-env-small-affinities.csv
Sample output:
Seq Method HLA-A*02:01 HLA-A*24:02 HLA-B*40:01 ... AADKLWVTV netmhcpan 0.256… 0.070… 0.138… ... AAELLGRSS netmhcpan 0.016… 0.004… 0.028… ... AAEQLWVTV netmhcpan 0.152… 0.064… 0.127… ... AAGSTMGAA netmhcpan 0.093… 0.011… 0.037… ... AAHCNISEG netmhcpan 0.024… 0.016… 0.043… ... ... ... ... ... ... ... -
Extract the epitopes, their immunogenicity and their protein and HLA coverage:
make epitopes # customize options with EPITOPES_OPTS="..."
Note that make will automatically use the intermediate files produced previously.
Manual invocation via:
python data_preparation.py -v extract-epitopes \ resources/alleles-small.csv \ dev/hiv1-bc-env-small-coverage.csv \ dev/hiv1-bc-env-small-affinities.csv \ dev/hiv1-bc-env-small-epitopes.csv
Sample output:
immunogen alleles proteins epitope 0.124… HLA-A*02:01 9;18 QMQEDIISL 0.142… HLA-A*24:02 8;13 SWFSITNWL 0.200… HLA-A*24:02;HLA-A*02:01 8;13 YQRWWIWSI 0.154… HLA-C*07:02;HLA-A*24:02 19;14 MYAPPIEGL 0.057… HLA-A*02:01 6;15 LLALDSWAS -
Compute the cleavage scores between all pairs of epitopes (needed only for the string-of-beads design)
make cleavages # customize options with CLEAVAGE_OPTS="..."
Or:
python data_preparation.py -v compute-cleavages \ dev/hiv1-bc-env-small-epitopes.csv \ dev/hiv1-bc-env-small-cleavages.csv
Sample output:
from to score FLGAAGSTM FLGAAGSTM -0.001… FLGAAGSTM NVWATHACV 0.798… FLGAAGSTM VTVYYGVPV 0.673… FLGAAGSTM FIMIVGGLI 1.222… FLGAAGSTM KLTPLCVTL 1.310… ... ... ... -
Compute the overlaps between all pairs of epitopes (needed only for the mosaic design)
make overlaps # customize options with OVERLAPS_OPTS="..."
Or:
python data_preparation.py -v compute-overlaps \ dev/hiv1-bc-env-small-epitopes.csv \ dev/hiv1-bc-env-small-overlaps.csv
Sample output:
from | to | cost |
---|---|---|
YAPPISGYI | TYNNTYSTY | 8 |
SILGFWMLI | CLSNITGLL | 9 |
APGVGAASQ | ASQDLAKHG | 6 |
TTAAEGVGA | GAITISNTA | 7 |
SITHWLWYI | AYFYRSDVV | 9 |
... | ... | ... |
-
Mosaic: use this generalized framework to design a mosaic vaccine
make mosaic-vaccine # customize options with MOSAIC_OPTS="..."
Or:
python design.py -v mosaic \ resources/hiv1-bc-env-small.fasta \ resources/alleles-small.csv \ dev/hiv1-bc-env-small-epitopes.csv \ dev/hiv1-bc-env-small-overlaps.csv \ dev/hiv1-bc-env-small-vaccine-mosaic.csv
-
String of Beads: use this generalized framework to design a string-of-beads vaccine
make string-of-beads-vaccine # customize options with STRING_OF_BEADS_OPTS="..."
Or:
python design.py -v string-of-beads \ resources/hiv1-bc-env-small.fasta \ resources/alleles-small.csv \ dev/hiv1-bc-env-small-epitopes.csv \ dev/hiv1-bc-env-small-cleavages \ dev/hiv1-bc-env-small-vaccine-string-of-beads.csv
-
OptiTope: based on [1] and [2]
make optitope-vaccine # customize options with OPTITOPE_OPTS="..."
Or:
python design.py -v optitope \ dev/hiv1-bc-env-small-coverage.csv \ dev/hiv1-bc-env-small-affinities.csv \ resources/alleles-small.csv \ dev/hiv1-bc-env-small-vaccine-optitope.csv
-
PopCover: based on [3]
make popcover-vaccine # customize options with POPCOVER_OPTS="..."
Or:
python design.py -v popcover \ dev/hiv1-bc-env-small-coverage.csv \ dev/hiv1-bc-env-small-affinities.csv \ resources/alleles-small.csv \ dev/hiv1-bc-env-small-vaccine-popcover.csv
Sample output (same format for all methods):
cocktail index epitope 0 0 YQRWWIWSI 0 1 YTDTIYWLL 0 2 LLQYWSQEL 0 3 YFPNKTMNF ... ... ...
Evaluation computes the following metrics: total immunogenicity, allele coverage, pathogen coverage, average epitope conservation and population coverage. The population coverage is also computed relative to the maximum theoretical coverage that can be achieved with the given alleles.
make mosaic-evaluation # make mosaic is also valid
Make can of course evaluate the vaccines produced by all methods via make <method>-evaluation
, or simply make <method>
.
The full command for the mosaic vaccine is as follows, make sure to use the correct input file for the vaccine:
python evaluation.py -v
resources/hiv1-bc-env-small.fasta \
dev/hiv1-bc-env-small-coverage.csv \
resources/alleles-small.csv \
dev/hiv1-bc-env-small-epitopes.csv \
dev/hiv1-bc-env-small-vaccine-mosaic.csv \
dev/hiv1-bc-env-small-evaluation-mosaic.csv
Sample output:
norm_prot_coverage | prot_coverage | pop_coverage | conservation | rel_pop_coverage | immunogen | max_pop_coverage |
---|---|---|---|---|---|---|
0.75 | 15 | 0.524… | 0.105… | 0.780… | 1.960… | 0.671… |
The experiments folder contains several scripts and inputs to design vaccines under various settings.
You must first run make-bootstrap.sh
to create five random subsets of 300 sequences, which will be used by the other scripts.
[1] Toussaint NC, D ̈onnes P, Kohlbacher O. A mathematical framework for the selection of an optimal set of peptides forepitope-based vaccines.PLoS Comput Biol2008:4: e1000246.17.
[2] Toussaint NC, Kohlbacher O. OptiTope – a web server for theselection of an optimal set of peptides for epitope-basedvaccines.Nucleic Acids Res2009:37(suppl 2): W617–W622
[3] Lundegaard C, Buggert M, Karlsson A, Lund O, Perez C,Nielsen M. PopCover: a method for selecting of peptides withoptimal population and pathogen coverage. In:Proceedings ofthe First ACM International Conference on Bioinformatics andComputational Biology,2010.ACM