uses the ESM-MSA model (a transformer-based neural network trained on protein multiple sequence alignments) to generate new protein sequences by iteratively mutating sequences from an input alignment.
Source data: AlphaFold2-predicted structures, full sequence lists, tables of metrics, tables of experimental results, and phylogenetic trees. The Jupyter notebooks under "notebooks_for_figures" automatically download the necessary data from Zenodo; if you need the data for some other purpose, it is available at this link.
protein_gibbs_sampler: command-line tools for generating new sequences using ESM-MSA sampling (used in the notebook above).
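To give a rough idea of what "iteratively mutating sequences from an input alignment" means, here is a minimal toy sketch of a Gibbs-style sampling loop. It is not the protein_gibbs_sampler code and does not call ESM-MSA: as a loudly stated assumption, the model's prediction for a masked position is replaced by smoothed residue frequencies from the corresponding alignment column. The names `column_distribution` and `gibbs_sample`, and all parameters, are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(msa, col, pseudocount=1.0):
    """Toy stand-in for a masked-position prediction.

    ASSUMPTION: a real sampler would ask ESM-MSA for residue probabilities at
    the masked position; here we only use smoothed column frequencies.
    """
    counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
    weights = [counts[aa] + pseudocount for aa in AMINO_ACIDS]
    total = sum(weights)
    return [w / total for w in weights]


def gibbs_sample(msa, seed_index=0, n_iterations=100, rng=None):
    """Iteratively resample single positions of one aligned sequence."""
    rng = rng or random.Random()
    seq = list(msa[seed_index])
    # Gap positions are left untouched so the alignment length is preserved.
    positions = [i for i, residue in enumerate(seq) if residue != "-"]
    if not positions:
        return "".join(seq)
    for _ in range(n_iterations):
        col = rng.choice(positions)            # pick a position to "mask"
        probs = column_distribution(msa, col)  # predict residues for it
        seq[col] = rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]
    return "".join(seq)


if __name__ == "__main__":
    toy_msa = ["MKT-A", "MRTLA", "MKSLA"]  # hypothetical aligned sequences
    print(gibbs_sample(toy_msa, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes a run reproducible. In a real workflow the column-frequency function would be swapped for the model's masked-token probabilities, which depend on the whole alignment rather than on a single column.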
References
Johnson, Sean R., Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak, and Kevin K. Yang. “Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks.” bioRxiv, March 4, 2023. https://doi.org/10.1101/2023.03.04.531015.