Abstract

Protein structure prediction, a fundamental challenge in computational biology, aims to predict a protein's 3D structure from its amino acid sequence. This structure is pivotal for elucidating protein function and interactions and for driving innovation in drug discovery and enzyme engineering. AlphaFold, a powerful deep learning model, has revolutionized this field by leveraging phylogenetic information from multiple sequence alignments (MSAs) to achieve remarkable accuracy. However, a key question remains: how well does AlphaFold understand protein structures? This study investigates AlphaFold's capabilities when it relies primarily on high-quality template structures, without the additional information provided by MSAs. By designing experiments that probe local and global structural understanding, we aimed to dissect its dependence on specific features and its ability to handle missing information. Our findings reveal that AlphaFold relies on sterically valid C-β atoms to interpret structural templates correctly. Additionally, we observed its remarkable ability to recover 3D structures from certain perturbations. Collectively, these results support the hypothesis that AlphaFold has learned an accurate local biophysical energy function. However, this function seems most effective for local interactions. Our work advances the understanding of how deep learning models predict protein structures and provides guidance for researchers aiming to overcome the limitations of these models.

Usage

Installation

  1. make install-environment
  2. also install DSSP into the T_A environment
  3. make install-attnpacker
  4. install OpenFold in its own environment, openfold_env
  5. install RFdiffusion in its own environment
  6. install other utilities if not done before:
  7. edit paths.sh so that it points to your local installation paths
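The two make-based installation steps can be chained in a small shell sketch. This is an illustration, not a script shipped with the repository: the `run` helper, the `DRY_RUN` variable and the `install_make_targets` function are hypothetical names. Only the make targets come from the steps above; steps 2 and 4 to 7 stay manual because their exact commands depend on your system.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Hypothetical wrapper for installation steps 1 and 3; run it from the
# repository root. Step 2 (DSSP into T_A) is not scripted and must be done
# by hand, as must steps 4-7 (OpenFold, RFdiffusion, other utilities, paths.sh).
install_make_targets() {
  run make install-environment || return   # step 1; stop on failure
  run make install-attnpacker || return    # step 3; stop on failure
}
```

After sourcing the file, `DRY_RUN=1 install_make_targets` only prints the two make commands, which is a cheap way to check the sequence before running it for real without `DRY_RUN`.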

Reproduce side-chain packing

  1. perform installation
  2. bash bash-packing.sh → adjust the script for CASP13 or CASP14, and make the same change in generate_topology.tcl
  3. get results by running Packing_results.ipynb
  4. get SASA analysis by running SASA_analysis.ipynb
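The packing steps can be run end to end with a sketch like the following. It assumes that bash-packing.sh and generate_topology.tcl have already been edited for the chosen CASP set, and that the commands are run from the directory containing the script and notebooks. The README only says to "run" the notebooks; executing them headlessly with `jupyter nbconvert --execute` is our choice, and `run`, `DRY_RUN` and `reproduce_packing` are hypothetical names.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Sketch of steps 2-4 of "Reproduce side-chain packing".
reproduce_packing() {
  run bash bash-packing.sh || return
  # Execute each analysis notebook and overwrite it with its computed outputs.
  run jupyter nbconvert --to notebook --execute --inplace Packing_results.ipynb || return
  run jupyter nbconvert --to notebook --execute --inplace SASA_analysis.ipynb
}
```

Running the notebooks interactively in Jupyter works just as well; the headless variant is mainly useful on a cluster node without a browser.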

Reproduce synthetic backbone refinement

  1. perform installation
  2. bash bash-synthetic-backbones.sh → adjust the script for CASP13 or CASP14 and set MAXIT_PATH to your own MAXIT installation
  3. get results by running Backbone_results.ipynb
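A similar sketch covers the synthetic-backbone workflow. It assumes bash-synthetic-backbones.sh has already been adjusted for the CASP set and for MAXIT_PATH, and that the notebook sits in the current directory. Headless execution via `jupyter nbconvert` is again our choice rather than something the README prescribes, and `run`, `DRY_RUN` and `reproduce_backbones` are hypothetical names.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# Sketch of steps 2-3 of "Reproduce synthetic backbone refinement".
reproduce_backbones() {
  run bash bash-synthetic-backbones.sh || return
  # Execute the results notebook and overwrite it with its computed outputs.
  run jupyter nbconvert --to notebook --execute --inplace Backbone_results.ipynb
}
```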

Contact

If you have questions, please file a GitHub issue or send an e-mail to thomas.lemmin@unibe.ch and jannik.gut@unibe.ch.