Refer to our paper in Bioinformatics for more details.
- Collection of Jupyter notebooks demonstrating various aspects of the pipeline.
- Conda environments required to run the pipeline and the Jupyter notebooks are located in conda_yml.
  - seq_struct_func.yml for steps 1 and 5-7
  - alphafold2.yml for step 2
  - Build the environments with `conda env create -f X.yml`, replacing X.yml with the environment file
  - Steps 3 and 4 additionally require pyCHARMM and MMTSB to be installed in the seq_struct_func environment
- Recommended resources: 1 GPU with 10 GB of memory and 1-4 CPUs
- Scripts are listed in the order they should be run.
- asr_seq_annotations.xlsx
  - All enzymes, sequences, and annotations from the structure-function pipeline
- extant_msa.fasta
  - Multiple sequence alignment previously used to infer the resurrected ancestral sequences
- fasta/
  - Sequences from asr_seq_annotations.xlsx in FASTA format
- pdb_with_fad/
  - Directory containing all AlphaFold2 models with the FAD cofactor
- top_dock_pose/
  - Directory containing the lowest-energy poses from minimization in the explicit protein representation
- log_reg_models/
  - Pretrained statsmodels logistic regression models
- script/gen_consensus_db.ipynb
  - Create a database of consensus sequence hits from AlphaFold2 MSAs
- script/run_alphafold_consensus.ipynb
  - Run AlphaFold2 on an example protein using the consensus sequence hits
- script/fad.ipynb
  - Add the FAD cofactor to the generated example protein model
- script/fftdock.ipynb
  - Use CHARMM Fast Fourier Transform (FFT) docking to generate initial ligand poses
- script/prot_min.ipynb
  - Refine the FFT poses in an explicit protein representation
- script/cluster.ipynb
  - Cluster poses to select representative poses
- script/stereo.ipynb
  - Predict stereochemistry from Boltzmann-weighted representative poses
- script/reactivity.ipynb
  - Predict reactivity from pose features
- script/vis_pred.ipynb
  - Visualize predicted poses
- script/gen_msa.ipynb
  - Generate a multiple sequence alignment
- script/get_bs_ss_residues.ipynb
  - Get the set of binding-site and second-shell residues
- script/slice_msa.ipynb
  - Restrict the MSA to binding-site and second-shell residues
- script/run_automl.ipynb
  - Fit gradient-boosted tree and random forest models that map the MSA to the predicted stereochemistry labels
- script/shap_analysis.ipynb
  - Calculate SHAP values for residues and visualize how individual residues affect stereochemistry
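The pose clustering done in script/cluster.ipynb can be illustrated with a minimal sketch of greedy, RMSD-based leader clustering. This is an assumption for illustration, not necessarily the algorithm or cutoff the notebook uses: each pose is taken to be a list of (x, y, z) ligand coordinates with identical atom order, already in the receptor frame (so no superposition is applied), and the 2.0 Å cutoff and the names `rmsd` and `leader_cluster` are hypothetical. Each cluster's representative is its lowest-energy member.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two poses of the same ligand (same atom order).

    No superposition is applied: docked poses are assumed to share the
    receptor's coordinate frame, so the raw positional difference matters.
    """
    if len(a) != len(b) or not a:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    sq = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(sq / len(a))


def leader_cluster(
    poses: Sequence[Sequence[Coord]],
    energies: Sequence[float],
    cutoff: float = 2.0,  # hypothetical RMSD cutoff in angstroms
) -> List[List[int]]:
    """Greedy leader clustering of poses, visited from lowest to highest energy.

    Each pose joins the first cluster whose representative lies within
    `cutoff`; otherwise it founds a new cluster. Every returned cluster is a
    list of pose indices whose first element (the lowest-energy member) is
    the representative.
    """
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    clusters: List[List[int]] = []
    for i in order:
        for members in clusters:
            if rmsd(poses[i], poses[members[0]]) <= cutoff:
                members.append(i)
                break
        else:  # no existing representative is close enough
            clusters.append([i])
    return clusters
```

Taking the first index of each returned cluster gives one representative pose per cluster. Visiting poses in energy order guarantees that each representative is the lowest-energy pose in its cluster.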
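The Boltzmann weighting described for script/stereo.ipynb can be sketched as follows, assuming each representative pose has an energy in kcal/mol and a stereochemical label such as "R" or "S". Each pose gets weight w_i = exp(-(E_i - E_min) / (k_B T)), and a label's probability is the sum of its poses' weights divided by the total weight. The 298.15 K temperature and the function name `boltzmann_label_probabilities` are hypothetical, and the notebook may weight poses differently.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_label_probabilities(
    energies: Sequence[float],
    labels: Sequence[str],
    temperature: float = 298.15,  # assumed temperature in K
) -> Dict[str, float]:
    """Return the Boltzmann-weighted probability of each stereochemical label.

    energies: pose energies in kcal/mol (assumed units), one per pose.
    labels:   stereochemical outcome of each pose, e.g. "R" or "S".
    """
    if not energies or len(energies) != len(labels):
        raise ValueError("need one label per energy and at least one pose")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    kt = KB_KCAL_PER_MOL_K * temperature
    e_min = min(energies)  # shift by the minimum so exp() cannot overflow
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(weights)
    probs: Dict[str, float] = defaultdict(float)
    for w, label in zip(weights, labels):
        probs[label] += w / total
    return dict(probs)
```

A predicted stereochemistry could then be taken as the most probable label, e.g. `max(p, key=p.get)` for a result `p`. Subtracting the minimum energy leaves the normalized probabilities unchanged while keeping the exponentials in a safe numeric range.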
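script/slice_msa.ipynb restricts the MSA to binding-site and second-shell residues. One way to do this, sketched here under stated assumptions, is to map residue numbers of a reference sequence to alignment columns and keep only those columns for every sequence. The sketch assumes an aligned FASTA file (such as extant_msa.fasta or the output of script/gen_msa.ipynb) in which all sequences have the same length and gaps are written as `-` or `.`, with residue numbers counting the reference's ungapped positions from 1. The record name `ref` and all function names are hypothetical.

```python
from typing import Dict, Iterable, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record name: sequence}, preserving file order."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]  # first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(parts) for k, parts in records.items()}


def residue_to_column(aligned_ref: str, gap_chars: str = "-.") -> Dict[int, int]:
    """Map 1-based residue numbers of the ungapped reference to 0-based columns."""
    mapping: Dict[int, int] = {}
    resnum = 0
    for col, ch in enumerate(aligned_ref):
        if ch not in gap_chars:
            resnum += 1
            mapping[resnum] = col
    return mapping


def slice_msa(msa: Dict[str, str], reference: str, residues: Iterable[int]) -> Dict[str, str]:
    """Keep only the alignment columns of the selected reference residues."""
    mapping = residue_to_column(msa[reference])
    cols = [mapping[r] for r in sorted(set(residues))]
    return {name: "".join(seq[c] for c in cols) for name, seq in msa.items()}
```

Anchoring the selection on one reference sequence lets a residue list obtained from a structure (as in script/get_bs_ss_residues.ipynb) be applied to every aligned sequence, including positions where other sequences carry gaps.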
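Tree-based models such as those fitted in script/run_automl.ipynb need numeric features, and a common encoding for an MSA is one-hot encoding of each alignment column. The sketch below is an assumption about how such features could be built, not a description of the notebook: it uses the 20 standard amino acids plus a gap symbol, leaves other characters (e.g. `X` or `.`) as all-zero, and the name `one_hot_msa` is hypothetical.

```python
from typing import List, Sequence, Tuple

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 standard amino acids plus gap


def one_hot_msa(
    sequences: Sequence[str], alphabet: str = ALPHABET
) -> Tuple[List[str], List[List[int]]]:
    """One-hot encode equal-length aligned sequences.

    Returns (feature_names, rows). Feature "c_A" is 1 when alignment
    column c (0-based) holds residue A; each row is one sequence.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    index = {aa: i for i, aa in enumerate(alphabet)}
    width = len(alphabet)
    feature_names = [f"{col}_{aa}" for col in range(length) for aa in alphabet]
    rows = []
    for seq in sequences:
        row = [0] * (length * width)
        for col, aa in enumerate(seq.upper()):
            if aa in index:  # characters outside the alphabet stay all-zero
                row[col * width + index[aa]] = 1
        rows.append(row)
    return feature_names, rows
```

The rows could then be passed, together with per-sequence stereochemistry labels, to a classifier such as scikit-learn's `RandomForestClassifier` or `GradientBoostingClassifier` via `fit(rows, labels)`; which library the notebook actually uses is not stated here. Feature names like `8_K` make it easy to trace feature importances or SHAP values back to columns of the (sliced) alignment.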