This project aims to design chiral proteins using deep learning and Rosetta. The main steps include:
- Train a RoseTTAFold model on chiral protein structures.
- Design chiral protein sequences based on the trained model.
- Predict and optimize the structures of designed sequences.
- Evaluate the chirality of designed proteins.
- Create a new conda environment using the environment.yml file:
conda env create -f environment.yml
- Activate the environment:
conda activate chiral_protein_design
- Install RoseTTAFold following the instructions at https://github.com/RosettaCommons/RoseTTAFold.
- Prepare the training data: Download chiral protein structures from the Protein Data Bank (PDB) and place them in the data/pdb directory. Then download the Sidechainnet dataset and extract it to the data/sidechainnet directory.
- Train the RoseTTAFold model:
python scripts/train_rosettafold.py
- Design chiral protein sequences:
python scripts/design_sequence.py
- Predict and optimize the structures of designed sequences:
python scripts/predict_structure.py
python scripts/optimize_structure.py
- Evaluate the chirality of designed proteins:
python scripts/evaluate_chirality.py
- Run the entire pipeline:
python run_pipeline.py
- Analyze the results:
  - View the designed sequences in results/designed_sequences.fasta.
  - View the predicted and optimized structures in results/predicted_structures.pdb and results/optimized_structures.pdb.
  - View the chirality evaluation results in results/chirality_scores.csv.
  - Run the notebooks in the notebooks directory to explore the data and analyze the model performance.
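To make the chirality evaluation step more concrete, here is a minimal sketch of one common geometric check: the sign of the chiral volume at the alpha carbon. It is not taken from scripts/evaluate_chirality.py, whose actual method is not described here. The function names, the `tol` threshold, and the example coordinates (approximate idealized alanine geometry) are assumptions made for illustration. Glycine has no CB atom and would need to be skipped.

```python
def _sub(u, v):
    """Component-wise difference of two 3D points."""
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _cross(u, v):
    """Cross product of two 3D vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def _dot(u, v):
    """Dot product of two 3D vectors."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def chiral_volume(n, ca, c, cb):
    """Signed volume (N - CA) . ((C - CA) x (CB - CA)), in cubic angstroms."""
    return _dot(_sub(n, ca), _cross(_sub(c, ca), _sub(cb, ca)))


def classify_residue(n, ca, c, cb, tol=0.5):
    """Label a residue 'L' or 'D' from the sign of its chiral volume.

    tol is a hypothetical threshold that flags near-planar (distorted)
    geometry as 'undetermined' instead of forcing a label.
    """
    vol = chiral_volume(n, ca, c, cb)
    if vol > tol:
        return "L"
    if vol < -tol:
        return "D"
    return "undetermined"


# Approximate idealized L-alanine geometry (illustrative values, in angstroms).
N = (-0.525, 1.363, 0.000)
CA = (0.000, 0.000, 0.000)
C = (1.526, 0.000, 0.000)
CB_L = (-0.529, -0.774, -1.205)
# Reflecting CB through the N-CA-C plane (z -> -z) gives the mirror image.
CB_D = (-0.529, -0.774, 1.205)

print(classify_residue(N, CA, C, CB_L))
print(classify_residue(N, CA, C, CB_D))
```

With this atom ordering, L-amino acids give a positive volume and their mirror images give a negative one, so a per-residue score could be as simple as the signed volume itself.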
data: Contains the training data (PDB files and Sidechainnet dataset).
models: Contains the trained RoseTTAFold model.
scripts: Contains the Python scripts for each step of the pipeline.
notebooks: Contains Jupyter notebooks for data exploration and model analysis.
results: Contains the output files generated by the pipeline.
environment.yml: Specifies the conda environment and dependencies.
README.md: Provides an overview and instructions for the project.
run_pipeline.py: Runs the entire pipeline.
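As a rough illustration of what a pipeline runner like run_pipeline.py could look like, the sketch below chains the individual scripts in the order given in the usage steps and stops at the first failure. The real run_pipeline.py may work differently. The names `PIPELINE_STEPS`, `build_commands`, and `run_pipeline`, and the `dry_run` option, are assumptions for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Script order taken from the usage steps in this README.
# Assumption: the real run_pipeline.py may order or invoke them differently.
PIPELINE_STEPS = [
    "scripts/train_rosettafold.py",
    "scripts/design_sequence.py",
    "scripts/predict_structure.py",
    "scripts/optimize_structure.py",
    "scripts/evaluate_chirality.py",
]


def build_commands(python=sys.executable, root="."):
    """Return one argument list per pipeline step, in execution order."""
    return [[python, str(Path(root) / step)] for step in PIPELINE_STEPS]


def run_pipeline(dry_run=False, root="."):
    """Run each step in order; with dry_run=True only print the commands."""
    for cmd in build_commands(root=root):
        print("Running:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero,
            # so later steps never run on incomplete outputs.
            subprocess.run(cmd, check=True)


# Dry run: list the commands without executing any script.
run_pipeline(dry_run=True)
```

Running the steps as separate processes keeps each script usable on its own, which matches the way the individual commands are documented above.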
- RoseTTAFold: https://github.com/RosettaCommons/RoseTTAFold
- Sidechainnet: https://github.com/jonathanking/sidechainnet
- PyRosetta: https://www.pyrosetta.org/
This project was inspired by the work of the RoseTTAFold team and by the Sidechainnet dataset. Special thanks to the open-source community for their valuable contributions.
This project is licensed under the MIT License. See the LICENSE file for more information.