
Official inference code for the paper "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering".


Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

This is the official implementation of Glyph-ByT5 and Glyph-ByT5-v2, introduced in "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering". This repo contains the training code for glyph-alignment pretraining and the inference code for our proposed Glyph-SDXL and Glyph-SDXL-v2 models.

News

⛽ ⛽ ⛽ Contact: yuhui.yuan@microsoft.com

  • 2024.06.17 Released the checkpoints and code of Glyph-ByT5 and Glyph-ByT5-v2.

🔆 Highlights

  • We identify two crucial requirements of text encoders for achieving accurate visual text rendering: character awareness and alignment with glyphs. To this end, we propose a customized text encoder, Glyph-ByT5, by fine-tuning the character-aware ByT5 encoder using a meticulously curated paired glyph-text dataset.

  • We present an effective method for integrating Glyph-ByT5 with SDXL, yielding the Glyph-SDXL model for design image generation. This raises text rendering accuracy from less than 20% to nearly 90% on our design image benchmark. Glyph-SDXL can also render whole paragraphs of text, achieving high spelling accuracy for tens to hundreds of characters with automated multi-line layouts.

  • We deliver a powerful customized multilingual text encoder, Glyph-ByT5-v2, and a strong aesthetic graphic generation model, Glyph-SDXL-v2, which together support accurate spelling in $\sim10$ different languages.
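
To make "character awareness" concrete, here is a minimal sketch of byte-level tokenization in the style of ByT5. It is not taken from this repository. It assumes the ID layout of the public ByT5 tokenizer as we understand it (three reserved special IDs, then one ID per UTF-8 byte value), and the function names `byte_level_ids` and `decode_ids` are illustrative only.

```python
def byte_level_ids(text: str, offset: int = 3, eos_id: int = 1) -> list[int]:
    """Map text to one ID per UTF-8 byte, then append an end-of-sequence ID.

    Assumption: IDs 0..offset-1 are reserved for special tokens, so a raw
    byte value b becomes ID b + offset.
    """
    return [b + offset for b in text.encode("utf-8")] + [eos_id]


def decode_ids(ids: list[int], offset: int = 3) -> str:
    """Invert byte_level_ids, skipping any ID outside the 256 byte slots."""
    data = bytes(i - offset for i in ids if offset <= i < offset + 256)
    return data.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    ids = byte_level_ids("Glyph")
    print(ids)              # one ID per character, plus the end-of-sequence ID
    print(decode_ids(ids))  # round-trips to the original string
```

Because every character maps to its own byte IDs rather than being merged into subword units, the encoder sees the exact spelling of each word. Non-ASCII characters occupy several bytes each, which is one reason byte-level encoders extend naturally to multilingual text.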

[Figures: paragraph rendering examples 1-4, design examples 1-4, scene examples 1-4, and multilingual examples]

🔧 Usage

For a detailed guide on Glyph-SDXL and Glyph-SDXL-v2 inference, see this folder.

For a detailed guide on Glyph-ByT5 alignment pretraining, see this folder.

🔓 Available Checkpoints

  • Glyph-SDXL can be found here.
  • Glyph-SDXL-v2 can be found here.
  • Glyph alignment pretraining data can be found here.

📬 Citation

If you find this code useful in your research, please consider citing:

@misc{liu2024glyphbyt5,
    title={Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering},
    author={Zeyu Liu and Weicong Liang and Zhanhao Liang and Chong Luo and Ji Li and Gao Huang and Yuhui Yuan},
    year={2024},
    eprint={2403.09622},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

and

@misc{liu2024glyphbyt5v2,
    title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering}, 
    author={Zeyu Liu and Weicong Liang and Yiming Zhao and Bohan Chen and Ji Li and Yuhui Yuan},
    year={2024},
    eprint={2406.10208},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}