UniST: Two Birds, One Stone: A Unified Framework for Joint Learning of Image and Video Style Transfers (ICCV 2023)
Authors: Bohai Gu, Heng Fan, Libo Zhang
This repository is the official PyTorch implementation of Two Birds, One Stone: A Unified Framework for Joint Learning of Image and Video Style Transfers.
The paper proposes a unified style transfer framework, dubbed UniST, for arbitrary image and video style transfer, in which the two tasks benefit from each other to improve performance.
The proposed network leverages local and long-range dependencies jointly. More specifically, UniST first applies CNNs to generate tokens, and then models long-range dependencies with the domain interaction transformer (DIT) to excavate domain-specific information. Afterwards, DIT sequentially exchanges the contextualized domain information between the two tasks for joint learning.
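For intuition, here is a minimal sketch of the tokenize-then-attend pattern described above. The module names (`CNNTokenizer`, `DomainInteractionBlock`) and all dimensions are illustrative assumptions, not the actual UniST code; see ./models/model.py for the real architecture.

```python
import torch
import torch.nn as nn

class CNNTokenizer(nn.Module):
    """Illustrative: downsample with CNNs, then flatten the feature map into tokens."""
    def __init__(self, in_ch=3, dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        f = self.conv(x)                      # (B, dim, H/4, W/4)
        return f.flatten(2).permute(2, 0, 1)  # (N, B, dim) token sequence

class DomainInteractionBlock(nn.Module):
    """Illustrative stand-in for DIT: self-attention within one domain,
    then cross-attention to interact with the other domain's tokens."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads)
        self.cross_attn = nn.MultiheadAttention(dim, heads)

    def forward(self, tokens, other):
        tokens = tokens + self.self_attn(tokens, tokens, tokens)[0]
        tokens = tokens + self.cross_attn(tokens, other, other)[0]
        return tokens

# toy forward pass: image tokens attend to video-frame tokens
img = torch.randn(1, 3, 64, 64)
frame = torch.randn(1, 3, 64, 64)
tok = CNNTokenizer()
blk = DomainInteractionBlock()
out = blk(tok(img), tok(frame))  # (256, 1, 256): 16x16 tokens, batch 1, dim 256
```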
Style transfer results at the finest granularity are provided for both images and videos.
Beyond arbitrary image and video style transfer, UniST also supports multi-granularity style transfer.
Style resolutions are 1024x1024, 512x512, 256x256, respectively.
Compared with state-of-the-art algorithms, our method has a strong ability to generate the finest-granularity results with better feature representation. (Some of the SOTA methods do not support multi-granularity style transfer.)
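A simple way to reproduce the multi-granularity comparison is to rerun inference while only resizing the style input. The `stylize` callable below is a hypothetical wrapper around the pretrained model; see ./application for the actual scripts.

```python
import torch.nn.functional as F

def multi_granularity(stylize, content, style, sizes=(1024, 512, 256)):
    """Run the same content/style pair at several style resolutions.
    `stylize` is a hypothetical callable wrapping the pretrained model."""
    results = {}
    for s in sizes:
        style_s = F.interpolate(style, size=(s, s), mode='bilinear', align_corners=False)
        results[s] = stylize(content, style_s)
    return results
```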
- python 3.6
- pytorch 1.6.0
- torchvision 0.4.2
- PIL, einops, matplotlib
- tqdm
Please download the pretrained models and put them into the folder ./weight.
Please configure the parameters in ./option/test_options.py, and set the pretrained checkpoint path in ./models/model.py.
For multi_granularity and single_modality, please refer to the scripts in ./application.
python scripts/inference.py
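If you would rather call the model from Python than through the script, the general flow looks like the sketch below. The `UniST` class name and its call signature are assumptions based on typical PyTorch repositories; check ./models/model.py for the real interface, and point its checkpoint path at the files in ./weight first.

```python
import torch
from PIL import Image
from torchvision import transforms

# hypothetical import; the real class name lives in ./models/model.py
from models.model import UniST

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = UniST().to(device).eval()

to_tensor = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor()])
content = to_tensor(Image.open('content.jpg').convert('RGB')).unsqueeze(0).to(device)
style = to_tensor(Image.open('style.jpg').convert('RGB')).unsqueeze(0).to(device)

with torch.no_grad():
    stylized = model(content, style)  # assumed call signature
```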
Pretrained models: vgg_r41.pth, dec_r41.pth, vgg_r51.pth.
Please download them and put them into the folder ./weight.
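As a quick sanity check that the downloads are intact, the snippet below simply loads each checkpoint; the file paths and the assumption that each file stores a state dict are the only things assumed here.

```python
import torch

for name in ('vgg_r41.pth', 'dec_r41.pth', 'vgg_r51.pth'):
    # assumes each file stores a state dict of named tensors
    state = torch.load(f'./weight/{name}', map_location='cpu')
    print(name, 'loaded,', len(state), 'entries')
```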
The style dataset is WikiArt, collected from WIKIART.
The content dataset is COCO for images, and MPI or DAVIS for videos.
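One way to wire these up is a paired dataset that draws a random style image per content image. The directory layout and the random pairing below are illustrative assumptions, not the repo's actual data pipeline.

```python
import os
import random
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class ContentStyleDataset(Dataset):
    """Pairs each content image (e.g., COCO) with a random WikiArt style image."""
    def __init__(self, content_dir, style_dir, size=256):
        self.content = [os.path.join(content_dir, f) for f in os.listdir(content_dir)
                        if f.lower().endswith(('.jpg', '.jpeg', '.png'))]
        self.style = [os.path.join(style_dir, f) for f in os.listdir(style_dir)
                      if f.lower().endswith(('.jpg', '.jpeg', '.png'))]
        self.tf = transforms.Compose([
            transforms.Resize(size),
            transforms.CenterCrop(size),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.content)

    def __getitem__(self, i):
        content = Image.open(self.content[i]).convert('RGB')
        style = Image.open(random.choice(self.style)).convert('RGB')
        return self.tf(content), self.tf(style)
```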
Please configure the parameters in ./option/train_options.py.
python scripts/train.py
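For orientation, a generic perceptual training step for style transfer looks like the sketch below; the relu4_1 VGG features and Gram-matrix style loss are the standard recipe, but the actual UniST objective and loss weights live in scripts/train.py, and `model`/`optimizer` are placeholders.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# frozen VGG-19 feature extractor up to relu4_1, standard for perceptual losses
vgg = models.vgg19(pretrained=True).features[:21].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def gram(f):
    # Gram matrix of a (B, C, H, W) feature map, normalized by its size
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def train_step(model, optimizer, content, style, style_weight=10.0):
    """One generic perceptual training step; `model` is a placeholder for UniST."""
    stylized = model(content, style)
    f_out, f_c, f_s = vgg(stylized), vgg(content), vgg(style)
    loss = F.mse_loss(f_out, f_c) + style_weight * F.mse_loss(gram(f_out), gram(f_s))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```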
If this repo is useful to you, please cite our paper.
@InProceedings{Gu_2023_ICCV,
    author    = {Gu, Bohai and Fan, Heng and Zhang, Libo},
    title     = {Two Birds, One Stone: A Unified Framework for Joint Learning of Image and Video Style Transfers},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {23545-23554}
}
We would like to express our gratitude for the contributions of several previous works to the implementation of UniST, including, but not limited to, pixel2style2pixel and attention-is-all-you-need.