Primary language: Jupyter Notebook. License: MIT.

GTM_WAE: Cartography-guided de novo generation of peptides with desired properties

GTM_WAE is a Python package that provides a Wasserstein Autoencoder (WAE) with attention layers in the encoder, together with a collection of notebooks adapted for use with Generative Topographic Mapping (GTM), a non-linear dimensionality reduction method. A map built on the WAE latent vectors visualizes the complex, multidimensional latent space in 2D, making it easy to explore by eye. The maps then serve as guides for selecting zones from which to sample latent vectors that are likely to decode to peptides with the desired properties.

[Figure: simplified pipeline]

Here are some key features of GTM_WAE for peptide generation:

  • 🗺️ Peptide space visualization: Visualize the latent space in the form of 2D maps that are easy to interpret by eye.
  • 🔬 Property analysis: Colour the maps according to any property and locate clusters of peptides with particular properties.
  • 📊 Motif analysis: Identify the predominant peptide motifs associated with a property in a peptide cluster.
  • 🚀 Explainable de novo generation: Use map zones populated with peptides that have the desired properties for the de novo generation of analogues.
  • 💊 Multi-property constrained generation: Colour maps according to several properties (e.g., activity, cytotoxicity) to generate peptides under multiple property constraints.
  • 🔍 Library comparison: Compare different libraries or databases to analyze their diversity and coverage.
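
As a rough illustration of how map zones could be shortlisted under several property constraints at once, the sketch below uses plain Python and a hypothetical per-node table. It does not use the GTM_WAE API: the `MapNode` structure, the property names (`activity`, `cytotoxicity`), the thresholds, and the `select_zones` helper are all assumptions made for illustration, and the numbers are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapNode:
    """One node (cell) of a 2D map with hypothetical mean property values."""
    x: int
    y: int
    n_peptides: int      # number of peptides assigned to this node
    activity: float      # assumed: fraction of active peptides in the node
    cytotoxicity: float  # assumed: fraction of cytotoxic peptides in the node


def select_zones(nodes, min_activity, max_cytotoxicity, min_peptides=1):
    """Return the nodes that satisfy every constraint simultaneously."""
    return [
        n for n in nodes
        if n.n_peptides >= min_peptides
        and n.activity >= min_activity
        and n.cytotoxicity <= max_cytotoxicity
    ]


# Toy map: all values are invented purely for illustration.
toy_map = [
    MapNode(0, 0, n_peptides=12, activity=0.90, cytotoxicity=0.10),
    MapNode(0, 1, n_peptides=8, activity=0.85, cytotoxicity=0.60),
    MapNode(1, 0, n_peptides=0, activity=0.00, cytotoxicity=0.00),
    MapNode(1, 1, n_peptides=5, activity=0.30, cytotoxicity=0.05),
]

zones = select_zones(toy_map, min_activity=0.8, max_cytotoxicity=0.2)
print([(n.x, n.y) for n in zones])
```

In this toy example only the node at (0, 0) is both populated and within the two thresholds; in practice the thresholds would be chosen by inspecting the coloured maps.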

Data availability

The public data used for WAE training and GTM construction are available on the Hugging Face Hub 🤗: Peptide data for GTM_WAE

Installation

Setting Up Your Environment

Before installing GTM_WAE, make sure that Python is installed and that its version is lower than 3.12. Conda or Miniforge is recommended for managing Python environments and dependencies.

Create and activate a Conda environment:

conda create -n gtm_wae_env python=3.10 -c conda-forge
conda activate gtm_wae_env
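
Once the environment is active, the version constraint can be checked from Python itself. The snippet below is a minimal sketch using only the standard library; the helper name `is_supported` is hypothetical, and the upper bound simply mirrors the "lower than 3.12" requirement stated above.

```python
import sys


def is_supported(version_info=sys.version_info, upper=(3, 12)):
    """Return True if the (major, minor) version is strictly below `upper`."""
    return tuple(version_info[:2]) < upper


major, minor = sys.version_info[:2]
status = "satisfies" if is_supported() else "does not satisfy"
print(f"Python {major}.{minor} {status} the < 3.12 requirement")
```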

Cloning the Repository

With git:

git clone https://github.com/Laboratoire-de-Chemoinformatique/GTM_WAE.git

Installing the Package

Install GTM_WAE using pip after activating your environment:

cd GTM_WAE/
pip install -e .

Adding Environment to Jupyter

To use GTM_WAE within Jupyter notebooks, you'll need to add your Conda or virtual environment as a kernel:

python -m ipykernel install --user --name=gtm_wae_env --display-name="GTM_WAE"

You can then select this environment as a kernel inside Jupyter.
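
To confirm that the kernel was registered, one option is to list the known kernel specs from Python. This is a sketch that assumes the `jupyter_client` package is available in the active environment (it normally comes with Jupyter); the helper `kernel_is_registered` is hypothetical and only compares names.

```python
def kernel_is_registered(name, specs):
    """Check whether a kernel name appears in a {name: resource_dir} mapping."""
    return name.lower() in {k.lower() for k in specs}


try:
    from jupyter_client.kernelspec import KernelSpecManager

    # Maps each registered kernel name to its resource directory.
    specs = KernelSpecManager().find_kernel_specs()
except ImportError:
    specs = {}  # jupyter_client is not installed in this environment

print(kernel_is_registered("gtm_wae_env", specs))
```

The name checked here is the value passed to `--name`, not the display name shown in the Jupyter interface.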

Updating to a Newer Version

To update GTM_WAE to the latest version:

  1. Go to the folder where GTM_WAE was cloned:

    cd GTM_WAE/
  2. Pull the new version with git:

    git pull

    If prompted, enter your login and access token.

If you did not install GTM_WAE with the -e option, you will also need to update it manually in your environment:

  1. Activate your environment:

    conda activate gtm_wae_env
  2. Install the package:

    pip install .
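
If you are unsure whether the package was installed with `-e`, the install metadata can be inspected. The sketch below relies on the `direct_url.json` record (PEP 610) that recent pip versions write for installs from a local directory; older pip versions may not write it, in which case the result is `False` even for an editable install. The distribution name `gtm_wae` is an assumption (check `pip list` for the actual name), and both helper functions are hypothetical.

```python
import json
from importlib import metadata


def parse_editable(direct_url_text):
    """Return True if a PEP 610 direct_url.json marks the install as editable."""
    if not direct_url_text:
        return False
    data = json.loads(direct_url_text)
    return bool(data.get("dir_info", {}).get("editable", False))


def is_editable_install(dist_name):
    """Return None if the distribution is not installed, else True or False."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None when the file is absent from the metadata.
    return parse_editable(dist.read_text("direct_url.json"))


# "gtm_wae" is an assumed distribution name, used here for illustration only.
print(is_editable_install("gtm_wae"))
```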

Installation for Developers

Developers should install the dependencies with Poetry, which can itself be installed through Conda:

conda install poetry -c conda-forge
poetry install

If you encounter Poetry issues related to environment variables, run the following commands, which append the required line to your ~/.bashrc file and restart the shell:

echo 'export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring' >> ~/.bashrc
exec bash

Main developers