Predict disorder and disorder binding from AlphaFold structures

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

AlphaFold-disorder

Disorder and binding region detection from AlphaFold predicted structures

The script parses and processes PDB files generated by AlphaFold. It expects the pLDDT score in the B-factor column. As a mandatory intermediate step, it calculates the Relative Solvent Accessibility (RSA) using DSSP and BioPython.
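To illustrate the pLDDT convention, the sketch below reads per-residue pLDDT values from the B-factor field of C-alpha ATOM records using the fixed PDB column positions. It is not the script's own parser, which relies on BioPython. The names `read_plddt` and `atom_line` are hypothetical, and both the division by 100 and the use of `1 - pLDDT` as a raw disorder score are assumptions, chosen because they agree with the `lddt` and `disorder` columns of the example output in this README.

```python
def atom_line(serial, atom, res_name, chain, res_num, b_factor):
    """Build a minimal fixed-width PDB ATOM record (hypothetical helper for the demo)."""
    return "ATOM  {:>5d} {:<4s} {:>3s} {}{:>4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
        serial, atom, res_name, chain, res_num, 0.0, 0.0, 0.0, 1.0, b_factor
    )


def read_plddt(pdb_text):
    """Return (residue number, residue name, pLDDT scaled to 0-1) for each C-alpha atom."""
    residues = []
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16, B-factor columns 61-66 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_name = line[17:20]
            res_num = int(line[22:26])
            b_factor = float(line[60:66])
            # Assumption: AlphaFold stores pLDDT on a 0-100 scale.
            residues.append((res_num, res_name, b_factor / 100.0))
    return residues


sample = "\n".join([
    atom_line(1, " N  ", "MET", "A", 1, 68.80),
    atom_line(2, " CA ", "MET", "A", 1, 68.80),
    atom_line(10, " CA ", "ARG", "A", 2, 83.20),
])

for res_num, res_name, plddt in read_plddt(sample):
    # Assumption: raw disorder score taken as 1 - pLDDT.
    print(res_num, res_name, round(plddt, 3), round(1.0 - plddt, 3))
```

Only one atom per residue is read because AlphaFold models carry a per-residue confidence value.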

Dependencies

  • Python3
  • NumPy
  • Pandas
  • BioPython
  • DSSP 3.x ("mkdssp" executable)

Usage

The script takes a folder of PDB files as input and writes two TSV files.

python3 alphafold_disorder.py -i pdbs/ -o out.tsv

Additional parameters

  • rsa_window (default 25) - RSA values are smoothed over a window centered on the residue to predict
  • rsa_threshold (default 0.581) - Binding predictions are overweighted when the disorder prediction is above this threshold

Both parameters take a space-separated list of values (floats). The program generates one output for each possible combination of the provided values.
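The two ideas above, window smoothing and one output per parameter combination, can be sketched as follows. This is an illustration under stated assumptions, not the script's code: `smooth` is a hypothetical helper that takes a plain mean over a centered window truncated at the sequence ends (the README does not say how edges are handled), and the RSA values are made up. Since the binding formula is not described here, only the output column names are enumerated.

```python
from itertools import product


def smooth(values, window):
    """Centered moving average; the window is truncated at the sequence ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half)
        end = min(len(values), i + half + 1)
        smoothed.append(sum(values[start:end]) / (end - start))
    return smoothed


# Made-up per-residue RSA values and parameter lists for the demo.
rsa = [1.0, 0.9, 0.8, 0.2, 0.1, 0.3, 0.7]
rsa_windows = [3, 5]
rsa_thresholds = [0.5, 0.581]

# One smoothed disorder column per window size.
columns = {}
for window in rsa_windows:
    columns["disorder-{}".format(window)] = smooth(rsa, window)

# One binding column per (window, threshold) combination; names only.
binding_names = [
    "binding-{}-{}".format(window, threshold)
    for window, threshold in product(rsa_windows, rsa_thresholds)
]

print(sorted(columns))
print(binding_names)
```

With two windows and two thresholds, this yields two disorder columns and four binding columns, following the naming pattern used in the output files.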

Output format
TSV

By default, the program writes TSV output and generates two files, out_data.tsv and out_pred.tsv, containing the intermediate calculations (DSSP output) and the final predictions, respectively. The last two columns (disorder-<rsa_window>, binding-<rsa_window>-<rsa_threshold>) are the relevant ones: they give the disorder and binding propensities.

name    pos     aa      lddt    disorder        rsa     disorder-25     binding-25-0.581
P47710  1       M       0.688   0.312   1.000   0.680   0.869
P47710  2       R       0.832   0.168   0.879   0.691   0.929
P47710  3       L       0.850   0.150   0.854   0.696   0.937
P47710  4       L       0.863   0.137   0.756   0.705   0.943
...
Q5RJL0  67      V       0.502   0.498   0.951   0.896   0.791
Q5RJL0  68      L       0.511   0.489   1.000   0.881   0.795
Q5RJL0  69      P       0.449   0.551   0.787   0.866   0.769
Q5RJL0  70      R       0.514   0.486   1.000   0.864   0.796
...
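A prediction file in this layout can be read with the Python standard library alone. The sketch below collects, per protein, the positions whose disorder score reaches a cutoff. The function name `disordered_positions` is hypothetical, and the cutoff is an arbitrary value chosen for illustration, not a recommended threshold. The sample rows are copied from the example above.

```python
import csv
import io


def disordered_positions(tsv_text, column="disorder-25", cutoff=0.5):
    """Map each protein name to the positions where `column` is >= cutoff."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = {}
    for row in reader:
        if float(row[column]) >= cutoff:
            hits.setdefault(row["name"], []).append(int(row["pos"]))
    return hits


sample = (
    "name\tpos\taa\tlddt\tdisorder\trsa\tdisorder-25\tbinding-25-0.581\n"
    "P47710\t1\tM\t0.688\t0.312\t1.000\t0.680\t0.869\n"
    "P47710\t2\tR\t0.832\t0.168\t0.879\t0.691\t0.929\n"
    "Q5RJL0\t67\tV\t0.502\t0.498\t0.951\t0.896\t0.791\n"
)

print(disordered_positions(sample, cutoff=0.69))
```

To process a real file, pass the contents of out_pred.tsv as `tsv_text` and set `column` to match the window size used in the run.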
CAID

The CAID format can be generated with the command below.

python3 alphafold_disorder.py -i pdbs/ -o out.tsv -f caid

The program generates a separate file for each type of prediction and each combination of parameters:

  • out_disorder.dat, disorder based on pLDDT
  • out_disorder-<rsa_window>.dat, disorder based on RSA and smoothed over a window
  • out_binding-<rsa_window>-<rsa_threshold>.dat, binding prediction weighted based on a threshold on the smoothed RSA

>P47710
1       M       0.68
2       R       0.691
3       L       0.696
4       L       0.705
...
67      V       0.896
68      L       0.881
69      P       0.866
70      R       0.864
...
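Files in this layout can be parsed with a few lines of Python. The sketch below assumes whitespace-separated fields and a `>` header line per protein, as in the example above; `parse_caid` is a hypothetical name. Only the first three fields (position, residue, score) are read, so any further columns are ignored.

```python
def parse_caid(text):
    """Map each protein ID to a list of (position, residue, score) tuples."""
    scores = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "...":
            # Skip blank lines and the elision marker used in the README example.
            continue
        if line.startswith(">"):
            current = line[1:]
            scores[current] = []
        elif current is not None:
            pos, aa, score = line.split()[:3]
            scores[current].append((int(pos), aa, float(score)))
    return scores


sample = ">P47710\n1\tM\t0.68\n2\tR\t0.691\n...\n"
print(parse_caid(sample))
```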

How to cite

Piovesan D, Monzon AM, Tosatto SCE.
Intrinsic protein disorder and conditional folding in AlphaFoldDB. Protein Sci. 2022 Nov;31(11):e4466.
PMID: 36210722 PMCID: PMC9601767.