
MoLE: Enhancing Human-centric Text-to-image Diffusion via Mixture of Low-rank Experts

NeurIPS 2024

Project homepage || Paper || Model || Human-centric Dataset [code: asd4]

Jie Zhu, Yixiong Chen, Mingyu Ding, Ping Luo, Leye Wang†, Jingdong Wang
Peking University, Johns Hopkins University, UC Berkeley, The University of Hong Kong, Baidu

Introduction

This is the official implementation of MoLE, a human-centric text-to-image diffusion model. We provide code for both SD v1.5 and SDXL.

Requirements

Please see requirements.txt. We also provide the xformers wheel used in our environment here.
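A minimal setup sketch, assuming a standard pip workflow; the virtual-environment name and the xformers wheel filename below are placeholders (use the wheel linked above):

```shell
# Create and activate an isolated environment (name is a placeholder)
python -m venv mole-env
source mole-env/bin/activate

# Install the pinned dependencies from the repository
pip install -r requirements.txt

# Install the provided xformers wheel
# (filename is a placeholder; use the file linked above)
pip install xformers-<version>.whl
```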

Data Preparation

Download the Human-centric Dataset [code: asd4].

This dataset consists of three subsets: human-in-the-scene images, close-up face images, and close-up hand images, totaling one million images. Moreover, these images possess superior quality and boast high aesthetic scores.

We also provide scripts for downloading the raw images from the corresponding websites; see the ./climb_scripts directory.

NOTE: Our dataset is allowed for academic purposes only. When using it, users are requested to ensure compliance with legal regulations. See LICENSE.txt for details.

If you find this work helpful, please give us a star and cite our paper. Thanks!

Acknowledgement

We thank the authors of xformers for providing a great library. Our code is based on sd-scripts; we thank its authors as well. We also thank Stability AI for its open-source models.