    _____________________.___.____    .____
    \__    ___/\______   \   |    |   |    |
      |    |    |       _/   |    |   |    |
      |    |    |    |   \   |    |___|    |___
      |____|    |____|_  /___|_______ \_______ \
                       \/            \/       \/
TRILL (TRaining and Inference using the Language of Life) is a sandbox for creative protein engineering and discovery. As a bioengineer myself, I find deep-learning-based approaches to protein design and analysis of great interest. However, many of these models are unwieldy because of their sheer size, especially for researchers who are not ML practitioners. TRILL lets researchers run inference on their proteins of interest with a variety of models, and it also democratizes the efficient fine-tuning of large language models. Whether using Google Colab with one GPU or a supercomputer with many, TRILL empowers scientists to leverage models with millions to billions of parameters without worrying (too much) about hardware constraints.

As of v1.3.0, TRILL supports the following models:
- ESM2 (Embed and finetune models of all sizes, depending on hardware constraints doi. Can also generate synthetic proteins from finetuned ESM2 models using Gibbs sampling doi)
- ESM-IF1 (Generate synthetic proteins from .pdb backbone doi)
- ESMFold (Predict 3D protein structure doi)
- ProtGPT2 (Finetune and generate synthetic proteins from seed sequence doi)
- ProteinMPNN (Generate synthetic proteins from .pdb backbone doi)
- RFDiffusion (Diffusion-based model for generating synthetic proteins doi)
- DiffDock (Find best poses for protein-ligand binding doi)
- ProtT5-XL (Embed proteins into high-dimensional space doi)
- TemStaPro (Predict thermostability of proteins doi)
- ZymCTRL (Conditional language model for the generation of artificial functional enzymes link)
Check out the documentation and examples at https://trill.readthedocs.io/en/latest/index.html