
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

Our proposed UDiffText is capable of synthesizing accurate and harmonious text in both synthetic and real-world images, and can thus be applied to tasks such as scene text editing (a), arbitrary text generation (b), and accurate T2I generation (c).

UDiffText Teaser

📬 News

  • 2024.7.16 Our paper is accepted by ECCV 2024! 🥳
  • 2023.12.11 Version 2.0 update (cleaned up legacy code 🚮)
  • 2023.12.3 Built Hugging Face demo
  • 2023.12.1 Built GitHub project page
  • 2023.11.30 Uploaded Version 1.0

🔨 Installation

  1. Clone this repo:
git clone https://github.com/ZYM-PKU/UDiffText.git
cd UDiffText
  2. Install the required Python packages:
conda create -n udiff python=3.11
conda activate udiff
pip install torch==2.1.1 torchvision==0.16.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
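
Optionally, verify the install with a quick Python check (a minimal sketch; the expected version suffixes assume the CUDA 12.1 wheels above):

import torch
import torchvision

print(torch.__version__)          # expect 2.1.1+cu121
print(torchvision.__version__)    # expect 0.16.1+cu121
print(torch.cuda.is_available())  # should print True on a working CUDA setup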
  3. Make the checkpoint directory and build the tree structure:
mkdir ./checkpoints

checkpoints
├── AEs                    // AutoEncoder
├── encoders
    ├── LabelEncoder       // Character-level encoder
    └── ViTSTR             // STR encoder
├── predictors             // STR model
├── pretrained             // Pretrained SD
└── ***.ckpt               // UDiffText checkpoint
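
The tree can be created in one go with a short Python helper (a convenience sketch; the directory names are taken from the structure above):

import os

# Create every checkpoint subdirectory listed in the tree above.
for sub in ["AEs", "encoders/LabelEncoder", "encoders/ViTSTR", "predictors", "pretrained"]:
    os.makedirs(os.path.join("checkpoints", sub), exist_ok=True)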

💻 Training

  1. Prepare your data

LAION-OCR

  • Create a data directory {your data root}/LAION-OCR on your disk and put your data in it. Then set the data_root field in ./configs/dataset/locr.yaml.
  • For downloading and preprocessing the LAION-OCR dataset, please refer to TextDiffuser and our ./scripts/preprocess/laion_ocr_pre.ipynb.

ICDAR13

  • Create a data directory {your data root}/ICDAR13 on your disk and put your data in it. Then set the data_root field in ./configs/dataset/icd13.yaml.
  • Build the tree structure as below:
ICDAR13
├── train                  // training set
    ├── annos              // annotations
        ├── gt_x.txt
        ├── ...
    └── images             // images
        ├── img_x.jpg
        ├── ...
└── val                    // validation set
    ├── annos              // annotations
        ├── gt_img_x.txt
        ├── ...
    └── images             // images
        ├── img_x.jpg
        ├── ...
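
For reference, here is a minimal sketch of a parser for these annotation files, assuming the usual ICDAR13 format of four box coordinates followed by a quoted transcription per line (the field separators differ between releases, so adjust the pattern to your copy):

import re

def parse_icdar13_gt(path):
    """Parse one gt_*.txt file into (left, top, right, bottom, text) tuples."""
    entries = []
    with open(path, encoding="utf-8-sig") as f:  # some files start with a BOM
        for line in f:
            m = re.match(r'\s*(\d+)\D+(\d+)\D+(\d+)\D+(\d+)\s*,?\s*"(.*)"', line)
            if m:
                l, t, r, b, text = m.groups()
                entries.append((int(l), int(t), int(r), int(b), text))
    return entries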

TextSeg

  • Create a data directory {your data root}/TextSeg on your disk and put your data in it. Then set the data_root field in ./configs/dataset/tsg.yaml.
  • Build the tree structure as below:
TextSeg
├── train                  // training set
    ├── annotation         // annotations
        ├── x_anno.json    // annotation json file
        ├── x_mask.png     // character-level mask
        ├── ...
    └── image              // images
        ├── x.jpg
        ├── ...
└── val                    // validation set
    ├── annotation         // annotations
        ├── x_anno.json    // annotation json file
        ├── x_mask.png     // character-level mask
        ├── ...
    └── image              // images
        ├── x.jpg
        ├── ...
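
To sanity-check the character-level masks, a quick inspection sketch (the path is illustrative; the exact label encoding of the mask values depends on the TextSeg release):

import numpy as np
from PIL import Image

# Load one character-level mask and list the distinct label values it contains.
mask = np.array(Image.open("{your data root}/TextSeg/train/annotation/x_mask.png"))
print(mask.shape, np.unique(mask))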

SynthText

  • Create a data directory {your data root}/SynthText on your disk and put your data in it. Then set the data_root field in ./configs/dataset/st.yaml.
  • Build the tree structure as below:
SynthText
├── 1                      // part 1
    ├── ant+hill_1_0.jpg   // image
    ├── ant+hill_1_1.jpg
    ├── ...
├── 2                      // part 2
├── ...
└── gt.mat                 // annotation file
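
The gt.mat file follows the layout of the official SynthText release; it can be inspected as below (loading the full file takes several GB of RAM):

from scipy.io import loadmat

# Official SynthText annotations: image names, word/char bounding boxes, texts.
gt = loadmat("{your data root}/SynthText/gt.mat")
print(gt["imnames"].shape)  # (1, N): relative image paths such as '1/ant+hill_1_0.jpg'
print(gt["txt"][0, 0])      # the text strings rendered in the first image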
  2. Train the character-level encoder

Set the parameters in ./configs/pretrain.yaml and run:

python pretrain.py
  3. Train the UDiffText model

Download the pretrained model (the Stable Diffusion 512-inpainting-ema.ckpt) and put it in ./checkpoints/pretrained/. You can ignore the "Missing Key" or "Unexpected Key" warnings when loading the checkpoint.

Set the parameters in ./configs/train.yaml, especially the paths:

load_ckpt_path: ./checkpoints/pretrained/512-inpainting-ema.ckpt  # checkpoint of the pretrained SD
model_cfg_path: ./configs/train/textdesign_sd_2.yaml  # UDiffText model config
dataset_cfg_path: ./configs/dataset/locr.yaml  # use the LAION-OCR dataset

and run:

python train.py

๐Ÿ“ Evaluation

  1. Download our available checkpoints and put them in the corresponding directories in ./checkpoints.

  2. Set the parameters in ./configs/test.yaml, especially the paths:

load_ckpt_path: "./checkpoints/***.ckpt"  # UDiffText checkpoint
model_cfg_path: "./configs/test/textdesign_sd_2.yaml"  # UDiffText model config
dataset_cfg_path: "./configs/dataset/locr.yaml"  # LAION-OCR dataset config

and run:

python test.py

๐Ÿ–ผ๏ธ Demo

To run an interactive demo on your own machine, execute:

python demo.py

or try our online demo on Hugging Face.


🎉 Acknowledgement

  • Dataset: We sincerely thank TextDiffuser for providing the open-source, large-scale LAION-OCR dataset with character-level segmentations.

  • Code &amp; Model: We build our project on the codebase of Stable Diffusion XL and leverage the pretrained checkpoint of Stable Diffusion 2.0.

🪬 Citation

@misc{zhao2023udifftext,
      title={UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models}, 
      author={Yiming Zhao and Zhouhui Lian},
      year={2023},
      eprint={2312.04884},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}