
Prediction of hydrophilicity & antigenic determinants from protein sequences

Primary language: Jupyter Notebook. License: BSD 2-Clause "Simplified" License (BSD-2-Clause).

Hopp-Woods Hydrophilicity Prediction with Linear Variation Model

Description

Each amino acid is assigned a numeric hydrophilicity value from the scale published in:

T. Hopp and K. Woods, "Prediction of protein antigenic determinants from amino acid sequences," PNAS, 1981.
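The lookup step can be sketched in Python as follows. The 20 values are the Hopp-Woods scale from the paper; the names `HOPP_WOODS` and `residue_scores` are hypothetical, and rejecting unknown letters (rather than skipping them) is an assumption of this sketch, not necessarily what the notebook does.

```python
from __future__ import annotations

# Hopp-Woods hydrophilicity values (Hopp & Woods, PNAS 1981), keyed by
# single-letter amino acid code.
HOPP_WOODS = {
    "A": -0.5, "R": 3.0, "N": 0.2, "D": 3.0, "C": -1.0,
    "Q": 0.2, "E": 3.0, "G": 0.0, "H": -0.5, "I": -1.8,
    "L": -1.8, "K": 3.0, "M": -1.3, "F": -2.5, "P": 0.0,
    "S": 0.3, "T": -0.4, "W": -3.4, "Y": -2.3, "V": -1.5,
}


def residue_scores(sequence: str) -> list[float]:
    """Map a single-letter protein sequence to per-residue Hopp-Woods values."""
    scores = []
    for position, residue in enumerate(sequence.upper()):
        # Assumption of this sketch: unknown letters (B, Z, X, ...) are an error.
        if residue not in HOPP_WOODS:
            raise ValueError(f"unknown residue {residue!r} at index {position}")
        scores.append(HOPP_WOODS[residue])
    return scores


print(residue_scores("KLD"))  # [3.0, -1.8, 3.0]
```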

The user provides a protein sequence in single-letter amino acid code and specifies a window size (the length of the peptide) and an edge weight \alpha (default \alpha=1, which gives uniform weights). For each amino acid in the window, the program computes a weight using the linear variation model and multiplies the amino acid's Hopp-Woods value by that weight. The hydrophilicity score of the peptide is the sum of the weighted amino acid scores divided by the sum of the weights. The program then slides the window along the protein sequence and repeats the calculation at each position.
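A minimal Python sketch of this sliding-window calculation follows. It takes per-residue values and a list of window weights as input. The function name `hydrophilicity_profile` is hypothetical, and indexing each score by the window's first residue n is an assumption of this sketch; the notebook may instead report each score at the window center.

```python
from __future__ import annotations


def hydrophilicity_profile(scores: list[float], weights: list[float]) -> list[float]:
    """Weighted sliding-window average of per-residue scores.

    scores  -- per-residue Hopp-Woods values X_0 .. X_{N-1}
    weights -- one weight per window position; its length is the window size
    Returns one value phi(n) per window start n = 0 .. N - window.
    """
    window = len(weights)
    if window == 0 or window > len(scores):
        raise ValueError("window must be between 1 and the sequence length")
    total_weight = sum(weights)
    profile = []
    for n in range(len(scores) - window + 1):
        # Multiply each residue score by the weight for its window position...
        weighted_sum = sum(w * x for w, x in zip(weights, scores[n:n + window]))
        # ...and normalize by the sum of the weights, not by the window size.
        profile.append(weighted_sum / total_weight)
    return profile


# Per-residue values for a hypothetical peptide "KLDK" (K = 3.0, L = -1.8, D = 3.0).
scores = [3.0, -1.8, 3.0, 3.0]
print(hydrophilicity_profile(scores, [1.0, 1.0, 1.0]))  # alpha = 1: plain window mean
print(hydrophilicity_profile(scores, [0.1, 1.0, 0.1]))  # alpha = 0.1: edges down-weighted
```

With weights [1.0, 1.0, 1.0] the calculation reduces to a plain window mean; with [0.1, 1.0, 0.1] each edge residue contributes only a tenth as much as the central one.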

Mathematical principles

Given:

\phi(n) = \frac{\sum_{i=0}^{\Delta-1} w_{i} X_{n+i}}{\sum_{i=0}^{\Delta-1} w_{i}}, \quad n = 0, 1, \ldots, N-\Delta

where:

\phi(n) : hydrophilicity score of the peptide window at position n, either weighted (w_{i}\neq1) or non-weighted (w_{i}=1)

N : Number of amino acids in the protein

n : residue index position on the protein (starting from 0)

\Delta : size of the peptide "window"

X_{i} : Hopp-Woods hydrophilicity value of amino acid X at index position i

w_{i} : weight at position i of the peptide window (i = 0, \ldots, \Delta-1), calculated using the linear variation model (see below)
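As a worked example, using a hypothetical tripeptide chosen only for illustration, take \Delta=3 and \alpha=0.1, so the linear variation weights are [0.1, 1.0, 0.1], and the peptide KLD, whose Hopp-Woods values are 3.0, -1.8 and 3.0:

\phi(0) = \frac{0.1(3.0) + 1.0(-1.8) + 0.1(3.0)}{0.1 + 1.0 + 0.1} = \frac{-1.2}{1.2} = -1.0

Without weighting (w_{i}=1) the same peptide scores (3.0 - 1.8 + 3.0)/3 = 1.4, so down-weighting the charged edge residues lets the hydrophobic central leucine dominate the score.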

Linear Variation Model for Calculation of Weights

  1. When no weights are used:

    w_{i}=1

  2. When using weights from the linear variation model, specify the edge weight \alpha:

    1) When the peptide window \Delta is an odd number:

    For example, if window=7 (a 7-mer peptide) and edge \alpha=0.1, the first and last weights are 0.1. The weights are linearly spaced from 0.1 at the edges to 1.0 at the center:

    [0.1, 0.4, 0.7, 1.0, 0.7, 0.4, 0.1]

    2) When the peptide window \Delta is an even number (a new feature not available in Expasy):

    For example, if window=10 and edge \alpha=0.1, the first and last weights are 0.1. The weights are linearly spaced from 0.1 at the edges to 1.0 at the two central positions:

    [0.1, 0.325, 0.55, 0.775, 1.0, 1.0, 0.775, 0.55, 0.325, 0.1]
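Both cases can be sketched as one Python function that interpolates from \alpha up to 1.0 over half the window and mirrors the result. It reproduces the 7-mer and 10-mer examples above. The name `linear_variation_weights` is hypothetical, and returning all-1.0 weights for windows of 1 or 2 residues is an assumption of this sketch.

```python
from __future__ import annotations


def linear_variation_weights(window: int, alpha: float = 1.0) -> list[float]:
    """Weights that rise linearly from alpha at both edges to 1.0 in the middle.

    Odd windows have a single central weight of 1.0; even windows have two.
    alpha = 1.0 gives uniform weights (no weighting).
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    half = (window + 1) // 2  # points from one edge to the center, inclusive
    if half == 1:
        left = [1.0]  # window of 1 or 2: nothing to interpolate (assumption)
    else:
        # (1 - t) * alpha + t hits alpha exactly at t = 0 and 1.0 exactly at t = 1.
        left = [(1 - k / (half - 1)) * alpha + k / (half - 1) for k in range(half)]
    # Mirror the left half; for odd windows, do not repeat the central weight.
    right = left[::-1] if window % 2 == 0 else left[-2::-1]
    return left + right


print([round(w, 3) for w in linear_variation_weights(7, 0.1)])
print([round(w, 3) for w in linear_variation_weights(10, 0.1)])
```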

Examples of Usage

See the Jupyter notebook in this repository.

Validation against Expasy Query Results

Expasy results were obtained from ProtScale.