Explore, compare and develop autoencoder models with a back-end agnostic framework

Rapidae: Python Library for Rapid Creation and Experimentation of Autoencoders

🔗 Documentation | 🔗 PyPI Package

Description 📕

Rapidae is a Python library specialized in simplifying the creation and experimentation of autoencoder models. With a focus on ease of use, this library allows users to explore and develop autoencoder models in an efficient and straightforward manner.

I decided to develop this library to optimize my research workflow and provide a comprehensive resource for educators and learners exploring autoencoders.

As a researcher, I often found myself spending time on repetitive tasks, such as creating project structures or replicating baseline models. (I've lost count of how many times I've gone through the Keras VAE tutorial just to copy the model as a baseline for other experiments.)

As an educator, despite recognizing numerous fantastic online resources, I felt the need for a place where the features I consider important for teaching these models are consolidated: explanation, implementation, and versatility across different backends. The latter is particularly crucial, considering that PyTorch practitioners may find it tedious to switch to TensorFlow, and vice versa. Because it builds on the recently released Keras 3, Rapidae lets users focus on model creation rather than backend specifics.

In summary, this library is designed to be simple enough for educational purposes, yet robust for researchers to concentrate on developing their models and conducting benchmark experiments in a unified environment.

Note

Shout out to Pythae, which provides an excellent library for experimenting with VAEs. If you're looking for a quick way to implement autoencoders for image applications, Pythae is probably your best option. Rapidae differs from Pythae in the following ways:

  • It is built on Keras 3, allowing you to experiment with and provide your implementations in PyTorch, TensorFlow, or JAX.
  • The image models implemented in Rapidae are primarily designed for educational purposes.
  • Rapidae is intended to serve as a benchmarking library for models implemented in the sequential/time-series domain, as these are widely dispersed across various fields.

🚨Call for contributions🚨

If you want to add your model to the package or collaborate on its development, feel free to send me a message at costanahuel@uniovi.es, or just open an issue or a pull request. I'll be happy to collaborate with you.

Main features

  • Ease of Use: Rapidae has been designed to make creating and experimenting with autoencoders as simple as possible; users can create and train autoencoder models with just a few lines of code.

  • Backend versatility: Rapidae relies on Keras 3.0, which is backend agnostic, so you can switch freely between TensorFlow, PyTorch, and JAX.

  • Customization: Easily customize model architecture, loss functions, and training parameters to suit your specific use case.

  • Experimentation: Conduct experiments with different hyperparameters and configurations to optimize the performance of your models.

Overview

Rapidae is structured as follows:

  • data: This module contains everything related to the acquisition and preprocessing of datasets.

  • models: This is the core module of the library. It includes the base architectures on which new ones can be created, several predefined architectures and a list of predefined default encoders and decoders.

  • pipelines: Pipelines are designed to perform a specific task or set of tasks such as data preprocessing or model training.

  • evaluate: Its main functionality is the evaluation of model performance. It also includes utilities for various tasks: latent space visualization, reconstructions, evaluation, etc.

Installation

The library has been tested with Python versions >=3.10, <3.12, so we recommend first creating a virtual environment with a suitable Python version. Here's an example with conda:

conda create -n rapidae python=3.10

Then, just activate the environment with conda activate rapidae and install the library.
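If you are unsure whether the active interpreter falls inside the tested range, a quick check along these lines may help. This is only a sketch: the helper name is_supported_python is hypothetical and not part of Rapidae, and the bounds are simply the ones stated above.

```python
import sys

# Supported range stated in this README: Python >=3.10, <3.12.
MIN_SUPPORTED = (3, 10)
MAX_EXCLUSIVE = (3, 12)


def is_supported_python(version=None):
    """Return True if (major, minor) falls inside the tested range.

    Hypothetical helper for illustration; it is not part of Rapidae.
    """
    if version is None:
        version = sys.version_info[:2]
    return MIN_SUPPORTED <= tuple(version[:2]) < MAX_EXCLUSIVE


if __name__ == "__main__":
    status = "inside" if is_supported_python() else "outside"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status} the tested range")
```

Run it inside the activated environment before installing the library.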

Note

If you are using Google Colab, you are good to go (i.e. you do not need to create an environment). The library is fully compatible with Colab's default environment.

With Pip

To install the latest stable release of this library run the following:

  pip install rapidae

Note that you will also need to install a backend framework (TensorFlow, PyTorch, or JAX). Follow the official installation guidelines of the backend you choose.

Important

If you install TensorFlow, you should reinstall Keras 3 afterwards via pip install --upgrade keras. This is a temporary step while TensorFlow is pinned to Keras 2, and will no longer be necessary after TensorFlow 2.16. The cause is that tensorflow==2.15 will overwrite your Keras installation with keras==2.15.
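To confirm whether the reinstall step above is needed, you can inspect the installed Keras version. The sketch below uses only the standard library; the helper names major_version and keras_status are illustrative and not part of Rapidae or Keras.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Extract the leading integer from a version string such as '3.0.5'."""
    return int(version_string.split(".")[0])


def keras_status():
    """Describe the installed Keras relative to the Keras 3 requirement."""
    try:
        installed = version("keras")
    except PackageNotFoundError:
        return "keras is not installed"
    if major_version(installed) >= 3:
        return f"keras {installed} is Keras 3 or newer"
    return f"keras {installed} is older than Keras 3; run: pip install --upgrade keras"


print(keras_status())
```

If the message reports a 2.x version after installing TensorFlow, that matches the overwrite described above.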

From source code

You can also clone the repo to have full access to all the code. Some features may not yet be available in the published stable version, so this is the best way to stay up to date with the latest changes.

git clone https://github.com/NahuelCostaCortez/rapidae
cd rapidae

Then you only have to install the requirements:

pip install -r requirements.txt

Available Models

Below is the list of the models currently implemented in the library.

Models Training example Paper Official Implementation
Autoencoder (AE) Open In Colab link
Beta Variational Autoencoder (BetaVAE) Open In Colab link
Contractive Autoencoder Open In Colab link
Denoising Autoencoder Open In Colab link link
Hierarchical Variational Autoencoder (HVAE) SOON link link
ICFormer SOON link link
interval-valued Variational Autoencoder (iVAE) IN PROGRESS
Recurrent Variational AutoEncoder (RVAE) Open In Colab link link
Recurrent Variational Encoder (RVE) Open In Colab link link
Sparse Autoencoder Open In Colab link
Time VAE Open In Colab link
Variational Autoencoder (VAE) Open In Colab link link
Vector Quantised-Variational AutoEncoder (VQ-VAE) Open In Colab link link

Usage

Here is a simple tutorial covering the most relevant aspects of the library. In addition, the examples folder contains a series of notebooks for each model and for particular use cases.

You can also use a web interface made with Streamlit where you can load datasets, configure models and hyperparameters, train, and evaluate the results. Check the web interface notebook.

Custom models and loss functions

You can provide your own autoencoder architecture. Here's an example for defining a custom encoder and a custom decoder:

from rapidae.models.base import BaseEncoder, BaseDecoder
from keras.layers import Dense

class Custom_Encoder(BaseEncoder):
    def __init__(self, input_dim, latent_dim, **kwargs):  # you can add more arguments, but at least these are required
        BaseEncoder.__init__(self, input_dim=input_dim, latent_dim=latent_dim)

        self.layer_1 = Dense(300)
        self.layer_2 = Dense(150)
        self.layer_3 = Dense(self.latent_dim)

    def call(self, x):
        x = self.layer_1(x)
        x = self.layer_2(x)
        x = self.layer_3(x)
        return x


class Custom_Decoder(BaseDecoder):
    def __init__(self, input_dim, latent_dim, **kwargs):  # you can add more arguments, but at least these are required
        BaseDecoder.__init__(self, input_dim=input_dim, latent_dim=latent_dim)

        self.layer_1 = Dense(self.latent_dim)
        self.layer_2 = Dense(self.input_dim)

    def call(self, x):
        x = self.layer_1(x)
        x = self.layer_2(x)
        return x

You can also provide a custom model. This is especially useful if you want to implement your own loss function.

from rapidae.models.base import BaseAE
from keras.ops import mean
from keras.losses import mean_squared_error

class CustomModel(BaseAE):
    def __init__(self, input_dim, latent_dim, encoder, decoder):
        # If you are adding your model to the source code, there is no need to
        # specify the encoder and decoder: just place them in the same directory
        # as the model and the BaseAE constructor will initialize them.
        BaseAE.__init__(
            self,
            input_dim=input_dim,
            latent_dim=latent_dim,
            encoder=encoder,
            decoder=decoder
        )
        
    def call(self, x):
        # IMPLEMENT FORWARD PASS
        x = self.encoder(x)
        x = self.decoder(x)

        return x
      
    def compute_loss(self, x=None, y=None, y_pred=None, sample_weight=None):
        '''
        Computes the loss of the model.
        x: input data
        y: target data
        y_pred: predicted data (output of call)
        sample_weight: Optional array of the same length as x, containing weights to apply to the model's loss for each sample
        '''
        # IMPLEMENT LOSS FUNCTION
        loss = mean(mean_squared_error(x, y_pred))

        return loss
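To make the reduction in compute_loss concrete: keras.losses.mean_squared_error averages the squared error over the last (feature) axis, giving one value per sample, and keras.ops.mean then averages those values over the batch. The sketch below reproduces that arithmetic in plain Python on a tiny hand-made batch. The function names are illustrative stand-ins, not Keras or Rapidae APIs.

```python
from statistics import fmean


def per_sample_mse(x, y_pred):
    """Mean squared error over the feature axis, one value per sample."""
    return [
        fmean((a - b) ** 2 for a, b in zip(x_row, y_row))
        for x_row, y_row in zip(x, y_pred)
    ]


def batch_loss(x, y_pred):
    """Average the per-sample errors into a single scalar loss."""
    return fmean(per_sample_mse(x, y_pred))


# Two samples with two features each.
x = [[0.0, 0.0], [1.0, 1.0]]
y_pred = [[1.0, 1.0], [1.0, 3.0]]

print(per_sample_mse(x, y_pred))  # [1.0, 2.0]
print(batch_loss(x, y_pred))      # 1.5
```

A custom loss (for example, one with an added regularization term) would replace the body of compute_loss while still returning a single scalar.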

Switching backends

Since Rapidae uses Keras 3, you can easily switch among TensorFlow, PyTorch, and JAX (TensorFlow is selected by default).

You can export the environment variable KERAS_BACKEND or you can edit your local config file at ~/.keras/keras.json to configure your backend. Available backend options are: "jax", "tensorflow", "torch". Example:

export KERAS_BACKEND="torch"

In a notebook, set the variable before importing Keras:

import os
os.environ["KERAS_BACKEND"] = "torch" 
import keras
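The config-file route mentioned above can also be scripted. The sketch below updates the "backend" field of a keras.json-style file while preserving any other settings already stored there. The helper set_keras_backend is hypothetical (it is not provided by Keras or Rapidae), and the demo writes to a temporary directory rather than your real ~/.keras/keras.json.

```python
import json
import tempfile
from pathlib import Path

VALID_BACKENDS = ("jax", "tensorflow", "torch")


def set_keras_backend(backend, config_path=None):
    """Write the chosen backend into a keras.json-style config file.

    Hypothetical helper for illustration. Defaults to ~/.keras/keras.json.
    """
    if backend not in VALID_BACKENDS:
        raise ValueError(f"backend must be one of {VALID_BACKENDS}, got {backend!r}")
    path = Path(config_path) if config_path else Path.home() / ".keras" / "keras.json"
    # Preserve any settings already stored in the file.
    config = json.loads(path.read_text()) if path.exists() else {}
    config["backend"] = backend
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4))
    return config


# Demo against a throwaway file so the real config is left untouched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "keras.json"
    print(set_keras_backend("torch", demo_path))  # {'backend': 'torch'}
```

Keras reads this file when it is imported, so restart the interpreter or notebook kernel after changing it. If KERAS_BACKEND is also set, the environment variable takes precedence over the file.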

Experiment tracking with wandb

To add experiment tracking to Rapidae models, create a WandbCallback and pass it to the TrainingPipeline as follows (the same approach applies to other experiment-tracking frameworks):

wandb_cb = WandbCallback()

wandb_cb.setup(
    training_config=your_training_config,
    model_config=your_model_config,
    project_name="your_wandb_project",
    entity_name="your_wandb_entity",
)

pipeline = TrainingPipeline(name="your_pipeline_name",
                            model=model,
                            callbacks=[wandb_cb])

Documentation

Check out the full documentation for detailed information on installation, usage, examples and recipes: 🔗 Documentation Link

All documentation source and configuration files are located inside the docs directory.

Dealing with issues

If you experience any issues while running the code, or would like to request new features or models, please open an issue on GitHub.

Citation

If you find this work useful or incorporate it into your research, please consider citing it 🙏🏻.

@software{Costa_Rapidae,
author = {Costa, Nahuel},
license = {Apache-2.0},
title = {{Rapidae}},
url = {https://github.com/NahuelCostaCortez/rapidae}
}