TACaPe (Transformed-based Anti-Cancer Peptide Classification and Generation) is a commandline tool to train transformer-based models for anticancer peptide classification and generation. I was built on top of Tensorflow and uses an auto-regressive algorithm for peptide design, which results can be filtered using an optional classification model.
$ pip install tacape
$ git clone https://github.com/omixlab/anticancer-peptide
$ cd anticancer-peptide
$ pip install -r requirements.txt -e .
$ conda env create
$ conda activate anticancer-peptide
Trains a classification model for anticancer peptide.
$ tacape-train-classifier -h
/\__ _\ /\ __ \ /\ ___\ /\ __ \ /\ == \ /\ ___\
\/_/\ \/ \ \ __ \ \ \ \____ \ \ __ \ \ \ _-/ \ \ __\
\ \_\ \ \_\ \_\ \ \_____\ \ \_\ \_\ \ \_\ \ \_____\
\/_/ \/_/\/_/ \/_____/ \/_/\/_/ \/_/ \/_____/
usage: TACaPe: Model Training [-h] --positive-train POSITIVE_TRAIN --negative-train NEGATIVE_TRAIN --positive-test POSITIVE_TEST
--negative-test NEGATIVE_TEST [--format {text,fasta}] --output OUTPUT [--epochs EPOCHS]
optional arguments:
-h, --help show this help message and exit
--positive-train POSITIVE_TRAIN
Input file containing positive peptides for training
--negative-train NEGATIVE_TRAIN
Input file containing negative peptides for training
--positive-test POSITIVE_TEST
Input file containing positive peptides for testing
--negative-test NEGATIVE_TEST
Input file containing negative peptides for testing
--format {text,fasta}
[optional] Input file format (default: text)
--output OUTPUT Path prefix of the output files
--epochs EPOCHS [optional] Number of epochs to be used during training (default: 30)
Runs a classification model for anticancer peptide prediction from a input file.
$ tacape-predict -h
/\__ _\ /\ __ \ /\ ___\ /\ __ \ /\ == \ /\ ___\
\/_/\ \/ \ \ __ \ \ \ \____ \ \ __ \ \ \ _-/ \ \ __\
\ \_\ \ \_\ \_\ \ \_____\ \ \_\ \_\ \ \_\ \ \_____\
\/_/ \/_/\/_/ \/_____/ \/_/\/_/ \/_/ \/_____/
usage: TACaPe: Predict [-h] --input INPUT [--format {text,fasta}] --classifier-prefix CLASSIFIER_PREFIX --output OUTPUT
optional arguments:
-h, --help show this help message and exit
--input INPUT Input file
--format {text,fasta}
[optional] Input file format (default: text)
--classifier-prefix CLASSIFIER_PREFIX
[optional] Path to the file prefix of the trained classification model
--output OUTPUT Path to the output CSV file
Trains a auto-regressive generative model for anticancer peptide.
$ tacape-train-generator -h
/\__ _\ /\ __ \ /\ ___\ /\ __ \ /\ == \ /\ ___\
\/_/\ \/ \ \ __ \ \ \ \____ \ \ __ \ \ \ _-/ \ \ __\
\ \_\ \ \_\ \_\ \ \_____\ \ \_\ \_\ \ \_\ \ \_____\
\/_/ \/_/\/_/ \/_____/ \/_/\/_/ \/_/ \/_____/
usage: TACaPe: Generative Model Training [-h] --positive-train POSITIVE_TRAIN --positive-test POSITIVE_TEST [--format {text,fasta}] --output
OUTPUT [--epochs EPOCHS]
optional arguments:
-h, --help show this help message and exit
--positive-train POSITIVE_TRAIN
Input file containing positive peptides for training
--positive-test POSITIVE_TEST
Input file containing positive peptides for testing
--format {text,fasta}
[optional] Input file format (default: text)
--output OUTPUT Path prefix of the output files containing the trained model
--epochs EPOCHS [optional] Number of epochs to be used during training (default: 30)
Generates a set of peptides with potential anticancer activity from a trained generative model. If a classification model é provided, it will be used to filter the generated sequences and compute a probability of activity.
$ tacape-generate -h
/\__ _\ /\ __ \ /\ ___\ /\ __ \ /\ == \ /\ ___\
\/_/\ \/ \ \ __ \ \ \ \____ \ \ __ \ \ \ _-/ \ \ __\
\ \_\ \ \_\ \_\ \ \_____\ \ \_\ \_\ \ \_\ \ \_____\
\/_/ \/_/\/_/ \/_____/ \/_/\/_/ \/_/ \/_____/
usage: TACaPe: Generate [-h] --generator-prefix GENERATOR_PREFIX [--classifier-prefix CLASSIFIER_PREFIX]
[--number-of-sequences NUMBER_OF_SEQUENCES] [--temperature TEMPERATURE] [--threshold THRESHOLD] --output
OUTPUT
optional arguments:
-h, --help show this help message and exit
--generator-prefix GENERATOR_PREFIX
Path to the file prefix of the trained generative model
--classifier-prefix CLASSIFIER_PREFIX
[optional] Path to the file prefix of the trained classification model
--number-of-sequences NUMBER_OF_SEQUENCES
[optional] Number of sequences to be generated (default: 1000)
--temperature TEMPERATURE
[optional] Temperature used for logit scaling when sampling aminoacids during auto-regressive generation
(default: 1.0)
--threshold THRESHOLD
[optional] Classification probability threshold (default: 0.5)
--output OUTPUT Path to the output CSV file
$ tacape-train-classifier \
--positive-train data/raw/anti_cp/anticp2_main_internal_positive.txt \
--negative-train data/raw/anti_cp/anticp2_main_internal_negative.txt \
--positive-test data/raw/anti_cp/anticp2_main_validation_positive.txt \
--negative-test data/raw/anti_cp/anticp2_main_validation_negative.txt \
--output data/models/classifier \
--epochs 100
$ tacape-predict \
--input data/raw/anti_cp/anticp2_main_validation_positive.txt \
--format text \
--classifier-prefix data/models/internal \
--output data/models/internal_results.csv
$ tacape-train-generator \
--positive-train data/raw/anti_cp/anticp2_main_internal_positive.txt \
--positive-test data/raw/anti_cp/anticp2_main_validation_positive.txt \
--output data/models/generator \
--epochs 100
$ tacape-generate \
--generator-prefix data/models/generator \
--classifier-prefix data/models/classifier \
--number-of-sequence 100 \
--output data/models/generated.csv