NTF2Gen: An enumerative algorithm for generating full-atom models of proteins belonging to the NTF2-like superfamily


If you find the scripts in this repository useful, please cite https://doi.org/10.1073/pnas.2005412117.

This algorithm samples a wide diversity of protein structures by carrying out backbone sampling at two levels. At the top level, sampling is carried out in the space of high-level parameters that define the overall properties of the NTF2 fold: for example, the overall sheet length and curvature, the lengths of the helices that complement the sheet, the placement of the pocket opening, and the presence or absence of C-terminal elements. At the lower level, each choice of high-level parameters is converted into pairs of structure blueprints and constraints, which guide backbone structure sampling at successive stages of fold assembly. In a final sequence design step, low-energy sequences are identified for each generated backbone through combinatorial sequence optimization using RosettaDesign.
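The top-level enumeration described above can be pictured as a Cartesian product over discrete parameter choices. The following is a minimal sketch only: the parameter names and values in `PARAMETER_GRID` are hypothetical placeholders chosen to mirror the properties listed above, not the actual parameters or ranges used by CreateBeNTF2_backbone.py.

```python
from itertools import product

# Hypothetical parameter grid (names and values are illustrative assumptions,
# not the repository's real parameter set).
PARAMETER_GRID = {
    "sheet_length": ["short", "long"],
    "sheet_curvature": ["low", "high"],
    "h3_length": [9, 10, 11],
    "pocket_opening": ["side_a", "side_b"],
    "has_c_terminal_helix": [False, True],
}


def enumerate_parameter_sets(grid):
    """Yield one dict per combination of high-level parameter values."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    combos = list(enumerate_parameter_sets(PARAMETER_GRID))
    print(len(combos), "parameter combinations")
    print(combos[0])
```

In a pipeline of this shape, each yielded dictionary would then be translated into blueprint and constraint files for the backbone sampling stage; that translation is specific to the repository's scripts and is not sketched here.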

The NTF2Gen repository contains all the tools needed for de novo design of NTF2-like proteins. The main script is CreateBeNTF2_backbone.py, which manages the construction of NTF2 backbones. It is followed by DesignBeNTF2.py (in BeNTF2seq/Nonbinding), which designs sequences on a backbone generated by the previous script; to design using PSSMs, use DesignBeNTF2_test1.py (in BeNTF2seq/design_with_PSSM) instead. To generate backbones from a specific set of parameters, use CreateBeNTF2PDBFromDict.py.

The fundamental building blocks of the backbone generation protocol are Rosetta XML protocols (included in the repository) that are specialized instances of the BlueprintBDRMover Rosetta fragment assembly mover. All backbone quality checks and filters applied before design are implemented either in the XML files or in the Python scripts. The design script is likewise based on a set of XML protocols, one for each design stage. Glycine placement at highly curved strand positions and the selection of pocket positions are managed by DesignBeNTF2.py (or by DesignBeNTF2_test1.py in BeNTF2seq/design_with_PSSM when designing with PSSMs). Pocket positions are selected by placing a virtual atom at the midpoint between the H3-S3 connection and the S6 bulge, then choosing all positions whose Cα-Cβ vector points towards the virtual atom (Vatom-Cα-Cβ angle smaller than 90°) and whose Cα lies within 8 Å of the virtual atom.
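The pocket selection rule can be illustrated with plain coordinate geometry. This is a sketch under stated assumptions, not the code in DesignBeNTF2.py: the function name, its arguments, and the choice of a single representative point for each of the H3-S3 connection and the S6 bulge are hypothetical, and the 8 Å cutoff is read here as the distance from Cα to the virtual atom.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def select_pocket_positions(ca_coords, cb_coords, anchor_a, anchor_b,
                            max_dist=8.0, max_angle_deg=90.0):
    """Return indices of residues that point at, and lie near, the virtual atom.

    anchor_a / anchor_b: assumed representative points for the H3-S3
    connection and the S6 bulge (hypothetical inputs).
    cb_coords entries may be None for residues without a C-beta (glycine).
    """
    # Virtual atom at the midpoint between the two anchor points.
    vatom = tuple((x + y) / 2.0 for x, y in zip(anchor_a, anchor_b))
    selected = []
    for index, (ca, cb) in enumerate(zip(ca_coords, cb_coords)):
        if cb is None:
            continue
        to_vatom = _sub(vatom, ca)   # C-alpha -> virtual atom
        ca_cb = _sub(cb, ca)         # C-alpha -> C-beta
        dist = _norm(to_vatom)
        cb_len = _norm(ca_cb)
        if dist == 0.0 or cb_len == 0.0:
            continue  # angle undefined for degenerate vectors
        cos_angle = max(-1.0, min(1.0, _dot(to_vatom, ca_cb) / (dist * cb_len)))
        angle = math.degrees(math.acos(cos_angle))  # Vatom-CA-CB angle
        if angle < max_angle_deg and dist < max_dist:
            selected.append(index)
    return selected
```

With the default 90° threshold, the angle test is equivalent to requiring a positive dot product between the Cα-to-virtual-atom and Cα-to-Cβ vectors; the explicit angle is kept so the threshold can be varied.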

Dependencies

pyrosetta*
pandas

*pyrosetta is free (with a subscription) for academic use: http://www.pyrosetta.org/dow

Pre-generated scaffold library:

As the overarching goal of this work is to expand the set of available protein structures with pockets, we generated a final set of scaffolds that incorporates all of the lessons from this study. Here we present proteins from 1,619 unique parameter combinations with improved stability-related metrics (see SI Appendix, Supplementary Methods and Figs. S33, S34, and S40 for pocket diversity). We have made this set of 32,380 scaffolds (20 models with different sequences per parameter combination) available for general use as starting points for ligand binding and enzyme design.
To access them, download this repository, change into ./BeNTF2seq/design_with_PSSM/final_set, and run:
cat final_set.tar.gz.part?? > final_set.tar.gz && tar -xzf final_set.tar.gz
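On systems without `cat` and `tar`, the same reassembly can be done from Python using only the standard library. This is a hedged equivalent of the command above, not a script shipped with the repository; the function name `reassemble_and_extract` is hypothetical, and it assumes the part files sort lexicographically in the correct order, as the shell glob does.

```python
import shutil
import tarfile
from pathlib import Path


def reassemble_and_extract(directory=".", archive_name="final_set.tar.gz"):
    """Concatenate <archive_name>.part?? files and extract the result."""
    directory = Path(directory)
    # Same pattern as the shell glob: exactly two characters after ".part".
    parts = sorted(directory.glob(archive_name + ".part??"))
    if not parts:
        raise FileNotFoundError(f"no {archive_name}.part?? files in {directory}")

    archive = directory / archive_name
    with open(archive, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # append this part's bytes

    # Extract next to the archive (the shell command extracts into the
    # current working directory instead).
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(directory)
    return archive


if __name__ == "__main__":
    reassemble_and_extract(".")
```

Run it from inside ./BeNTF2seq/design_with_PSSM/final_set, or pass that path as `directory`.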