Official code for "Towards anchoring evolutionary fitness for protein stability with virtual chemical environment recovery"
Protein stability offers valuable insights into protein folding and functionality, making it an integral component of evolutionary fitness. Previous computational methods possess both strengths and weaknesses, leading to practical and interpretational limitations. Here, we propose an interpretable protein stability change prediction method, S3C, to anchor evolutionary fitness for protein stability with virtual chemical environment recovery. S3C first gets rid of the shackles of high-resolution protein structure data and restores the local chemical environments of the mutations at the sequence level. Subsequently, S3C promotes the evolutionary fitness of protein stability to dominate the fitness landscape under the selective pressure. Naturally, S3C comprehensively outperforms state-of-the-art methods on benchmark datasets while showing ideal generalization when migrated to unseen protein families. More importantly, S3C is demonstrated to be interpretable at multiple scales, including high-fidelity recovery of local structure micro-environment, perception of intricate interaction reconstruction, and accurate mining of rare beneficial mutations. S3C expands the boundaries of protein evolution prediction and provides an ideal candidate for large-scale optimization of protein engineering.
The experiments were run on a single Tesla V100 GPU (32 GB).
Build the environment.
pip install -r requirements.txt
Training and testing data are in the "data" folder. The mega-scale dataset used for training must be downloaded separately: mega-scale dataset
Download the S3C checkpoint and update the corresponding paths in the code.
Content | Link |
---|---|
Checkpoint on S6070 | link |
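
Before running the scripts, it can help to confirm that the data folder and the downloaded checkpoint are where the code expects them. The following is a minimal sketch using only the Python standard library; it is not part of the S3C code base, `missing_paths` is a hypothetical helper, and the paths you pass in are whatever you configured in the code.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk, in input order.

    Hypothetical helper for illustration; not part of the S3C repository.
    """
    return [str(p) for p in map(Path, paths) if not p.exists()]


# Example usage (the checkpoint path is a placeholder you must fill in):
#   problems = missing_paths(["data", "<path to the downloaded checkpoint>"])
#   if problems:
#       print("Missing:", ", ".join(problems))
```

An empty list means every listed path was found; otherwise the returned entries are the ones to fix before running the test or training script.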
To test S3C on different test datasets, please run
python test.py
To train S3C on downstream tasks from scratch, please run
python train.py
This project is licensed under the MIT License.