data/
: contains datasets for all targets presented in manuscript- *.csv: sequence data with target enrichment labeled by train/test splits
- *.pkl: easy loading pickle file with train/test splits for training
weights/
: output folder for trained weights. Contains all trained weights for models used in Dai & Saksena et al 2022.- TrainModel.ipynb: notebook containing functions for training scoring models for later use in SAPS
- encoding.pkl: sequence encoding used for training
- GenerateLibrary.ipynb: notebook containing functions to generate SAPS-designed library
- The following instructions require access to GPUs with CUDA support. If a user does not have access, please use the Google Colab notebooks provided below to run SAPS.
Using conda:
conda create -n factorizable
conda activate factorizable
conda install --file requirements.txt
Using pip:
pip install -r requirements.txt
After installing dependencies, open Jupyter notebook:
cd FactorizableLibrary/
jupyter notebook model_training/TrainModel.ipynb
Follow instructions in the notebook to run each cell and train a model. You can use one of the datasets provided in data/
, or you can generate your own *.pkl data file and replace the path in the execution portion of the notebook.
Once you have trained a model, it should be stored in weights/
. Next, open the script for library generation.
jupyter notebook library_generation/GenerateLibrary.ipynb
Follow instructions in the notebook to run each cell and generate a SAPS library. You can change the path to different model weights files to change the optimization objective of the library. You can also adjust the entropy parameter as specified in the notebook.
We have provided demo notebooks in which example models can be trained using pu
blicly available Google GPUs for illustrative purposes. These can be accessed h
ere: