Authors: Filippo Botti, Alex Ergasti, Leonardo Rossi, Tomaso Fontanini, Claudio Ferrari, Massimo Bertozzi and Andrea Prati
This repository is the official implementation of Mamba-ST: State Space Model for Efficient Style Transfer.
This paper explores a novel design of Mamba, called Mamba-ST, to perform style transfer.
Examples of generated images from our Mamba model given a style and a content image.

a) Mamba-ST full architecture. It takes as input a content and a style image and generates the content image stylized as the style image. b) Mamba encoder with an additional skip connection (rightmost). c) Our Mamba-ST Decoder, which takes both style and content as input. In particular, style embeddings are shuffled before being passed to ST-VSSM in order to lose spatial information, maintaining only higher-level information. d) The inner architecture of the Base VSSM. e) The inner architecture of the Base 2D-SSM. f) Our ST-VSSM. Notably, DWConv is shared between the content and style embeddings. g) Our modified ST 2D-SSM, where the matrices A, B and Delta are computed from the style, the input of the selective scan is the style embedding, and the matrix C is calculated using the content.
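The wiring described in panels (c) and (g) can be illustrated with a small NumPy sketch. This is not the repository's implementation: the function name `st_selective_scan`, the random projection matrices, and the state size are hypothetical stand-ins for learned layers. The sketch also assumes the standard Mamba discretization, where A is a fixed negative parameter and only its discretized form depends on the style through Delta, and it assumes content and style sequences of equal length.

```python
import numpy as np


def softplus(x):
    # Numerically stable softplus, used to keep the step size Delta positive.
    return np.logaddexp(0.0, x)


def st_selective_scan(content, style, d_state=4, seed=0):
    """Toy style-conditioned selective scan.

    content, style: arrays of shape (length, dim).
    Returns an array of shape (length, dim).
    """
    rng = np.random.default_rng(seed)
    length, dim = style.shape

    # Hypothetical random projections standing in for learned linear layers.
    w_delta = rng.normal(scale=0.1, size=(dim, dim))
    w_b = rng.normal(scale=0.1, size=(dim, d_state))
    w_c = rng.normal(scale=0.1, size=(dim, d_state))
    # Assumption: A is a fixed negative parameter, as in standard Mamba.
    a = -np.exp(rng.normal(size=(dim, d_state)))

    # Shuffle style tokens so their spatial arrangement is discarded.
    style = style[rng.permutation(length)]

    delta = softplus(style @ w_delta)  # (length, dim), from style
    b = style @ w_b                    # (length, d_state), from style
    c = content @ w_c                  # (length, d_state), from content

    h = np.zeros((dim, d_state))
    y = np.empty((length, dim))
    for t in range(length):
        a_bar = np.exp(delta[t][:, None] * a)        # discretized A
        b_bar = delta[t][:, None] * b[t][None, :]    # discretized B
        h = a_bar * h + b_bar * style[t][:, None]    # scan input is the style
        y[t] = h @ c[t]                              # read-out uses the content
    return y


rng = np.random.default_rng(1)
content = rng.normal(size=(16, 8))
style = rng.normal(size=(16, 8))
out = st_selective_scan(content, style)
print(out.shape)  # (16, 8)
```

In this sketch the hidden state accumulates only style information, while the content decides how that state is read out at each position. The shuffle is placed inside the function for compactness; according to the caption it happens in the decoder before ST-VSSM.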
In order to run the project please install the environment by following these commands:
conda create -n mambast
conda activate mambast
pip install -r requirements.txt
You can find the random images used to generate the results inside the ./data folder. Please modify all the .sh files with the correct paths for your checkpoints and images before running the following instructions.
[Pretrained models](https://drive.google.com/drive/folders/1pVhJFwk2f3arP7zUDFAe5_PJrPSG1gc2?usp=drive_link)
sh scripts/eval.sh
# Before executing the evaluation code to calculate the metrics,
# please duplicate the content and style images to match the number of stylized images first.
# (40 styles, 20 contents -> 800 style images, 800 content images)
python evaluation/copy_inputs.py --cnt PATH_FOR_CONTENT_IMAGES --sty PATH_FOR_STYLE_IMAGES
sh evaluation/eval.sh
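The duplication step exists because every stylized image needs a matching content image and a matching style image for the metrics. The sketch below only illustrates the counting (20 contents × 40 styles = 800 pairs). The helper `expand_pairs` and the file names are hypothetical, and the ordering actually used by evaluation/copy_inputs.py may differ.

```python
from itertools import product


def expand_pairs(content_names, style_names):
    """Return two equal-length lists, one entry per (content, style) pair."""
    pairs = list(product(content_names, style_names))
    contents = [c for c, _ in pairs]
    styles = [s for _, s in pairs]
    return contents, styles


# Hypothetical file names, used only to show the resulting counts.
contents, styles = expand_pairs(
    [f"content_{i:02d}.jpg" for i in range(20)],
    [f"style_{j:02d}.jpg" for j in range(40)],
)
print(len(contents), len(styles))  # 800 800
```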
sh scripts/test.sh
The style dataset is WikiArt, collected from WIKIART.
The content dataset is COCO2014.
sh scripts/train.sh
The full model (fig. 2(a)) can be found in MambaST.py, which contains the whole architecture.
The Mamba Encoder/Decoder modules (fig. 2(b) and fig. 2(c)) can be found in mamba.py.
Finally, our VSSM implementation (both with a single input and with two inputs merged for style transfer) can be found in mamba_arch.py. VSSM variants with different scan directions are also available in single_direction_mamba_arch.py and double_direction_mamba_arch.py.
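To make the idea of scan directions concrete, here is a minimal NumPy sketch of how a 2D feature map can be flattened into one, two, or four token sequences before a 1D scan. The function `scan_sequences` and the particular choice and order of directions are assumptions for illustration. Which directions the single- and double-direction files actually use is not stated here.

```python
import numpy as np


def scan_sequences(feat, directions=4):
    """Flatten a (H, W, D) feature map into `directions` token sequences.

    Returns an array of shape (directions, H * W, D).
    """
    if directions not in (1, 2, 4):
        raise ValueError("directions must be 1, 2, or 4")
    h, w, d = feat.shape
    row_major = feat.reshape(h * w, d)                     # left-to-right, top-to-bottom
    col_major = feat.transpose(1, 0, 2).reshape(h * w, d)  # top-to-bottom, left-to-right
    # Assumed ordering: forward row scan, its reverse, then the column scans.
    all_dirs = [row_major, row_major[::-1], col_major, col_major[::-1]]
    return np.stack(all_dirs[:directions])


feat = np.arange(6).reshape(2, 3, 1)
seqs = scan_sequences(feat, directions=4)
print(seqs[..., 0])
```

Each sequence visits the same tokens in a different order, so a causal scan over all of them lets every position see context from several sides.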
If you find our work useful in your research, please cite our paper using the following BibTeX entry. Thank you!
@misc{botti2024mambaststatespacemodel,
title={Mamba-ST: State Space Model for Efficient Style Transfer},
author={Filippo Botti and Alex Ergasti and Leonardo Rossi and Tomaso Fontanini and Claudio Ferrari and Massimo Bertozzi and Andrea Prati},
year={2024},
eprint={2409.10385},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2409.10385},
}