
Code for RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images

Primary language: Python · License: Apache 2.0 (Apache-2.0)

RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images

Benzhi Wang · Jingkai Zhou · Jingqi Bai · Yang Yang · Weihua Chen · Fan Wang · Zhen Lei

Paper PDF · Project Page
CASIA   |   Alibaba


📢 News

  • 🔥🔥🔥 RealisHuman paper and project page released.
  • 🚀🚀🚀 Training and inference code released.
  • 👏👏👏 Now you can try more AI video editing features in XunGuang!
  • 🕺🕺🕺 You may also be interested in our human dance video generation method RealisDance.

🏃‍♂️ Getting Started

To begin, download the pretrained base models for RV-5-1, DINOv2, StableDiffusion V1.5, and StableDiffusion Inpainting.

Next, you can download our RealisHuman checkpoints from Baidu Cloud or Google Drive (Part 1, Part 2).

Create the checkpoint and pretrained_models directories, then organize the base models and checkpoints as follows:

mkdir checkpoint && mkdir pretrained_models

.
|-- LICENSE
|-- README.md
|-- assets
|-- data
|-- submodules
|   |-- 3DDFA-V3
|   |-- DWPose
|   `-- hamer-main
|-- realishuman
|-- configs
|-- checkpoint
|   |-- stage1_face
|   |   `-- checkpoint-stage1-face.ckpt
|   |-- stage1_hand
|   |   `-- checkpoint-stage1-hand.ckpt
|   |-- stage2_face
|   |   `-- checkpoint-stage2-face.ckpt
|   `-- stage2_hand
|       `-- checkpoint-stage2-hand.ckpt
|-- pretrained_models
|   |-- DINO
|   |   `-- dinov2
|   |-- RV
|   |   `-- rv-5-1
|   `-- StableDiffusion
|       |-- sd-1-5
|       `-- stable-diffusion-inpainting
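Before running anything, it can help to confirm that the downloads landed where the tree expects them. The sketch below is a hypothetical helper (missing_paths is not part of the repository); its path list is transcribed from the tree above, and it only checks that each path exists, not that the files are valid.

```python
from pathlib import Path

# Checkpoint and base-model paths, transcribed from the directory tree above.
EXPECTED_PATHS = [
    "checkpoint/stage1_face/checkpoint-stage1-face.ckpt",
    "checkpoint/stage1_hand/checkpoint-stage1-hand.ckpt",
    "checkpoint/stage2_face/checkpoint-stage2-face.ckpt",
    "checkpoint/stage2_hand/checkpoint-stage2-hand.ckpt",
    "pretrained_models/DINO/dinov2",
    "pretrained_models/RV/rv-5-1",
    "pretrained_models/StableDiffusion/sd-1-5",
    "pretrained_models/StableDiffusion/stable-diffusion-inpainting",
]


def missing_paths(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:")
        for p in missing:
            print("  " + p)
    else:
        print("All checkpoints and base models are in place.")
```

Run it from the repository root; an empty result only means the paths exist.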

⚒️ Installation

You can install the required environment using conda:

conda env create -f environment.yaml
conda activate RealisHuman

or with pip:

pip3 install -r requirements.txt

Additionally, you will need to set up separate environments for DWPose, HaMeR, and 3DDFAv3 (the extraction commands below activate each one individually). Please refer to their official setup guides for configuration details.
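Because four conda environments are involved, a quick check that they all exist can save a failed run later. This is a minimal sketch built on the output of `conda env list --json`, which reports environment prefixes; the helper names are hypothetical, and apart from RealisHuman (from environment.yaml) the environment names are placeholders to replace with your own.

```python
import json
import subprocess
from pathlib import Path


def env_names(conda_env_list_json):
    """Extract environment names from `conda env list --json` output.

    conda reports full prefixes; a named environment's name is the last
    path component of its prefix.
    """
    prefixes = json.loads(conda_env_list_json)["envs"]
    return {Path(p).name for p in prefixes}


def missing_envs(required, conda_env_list_json):
    """Return the required environment names that conda does not list."""
    known = env_names(conda_env_list_json)
    return [name for name in required if name not in known]


if __name__ == "__main__":
    # "RealisHuman" comes from environment.yaml; the other three names are
    # hypothetical placeholders for your DWPose, HaMeR and 3DDFAv3 envs.
    required = ["RealisHuman", "dwpose", "hamer", "3ddfav3"]
    output = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    print("Missing environments:", missing_envs(required, output) or "none")
```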

🚀 Training and Inference

Data Preparation

Structure your data directory as follows:

data
|-- images
|   |-- 3ddfa
|   |-- dwpose
|   |-- hamer
|   |-- image
|   `-- results
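The skeleton above can be created in one step. This sketch (make_data_dirs is a hypothetical helper) assumes the generated images to refine go into image/ and the other folders receive the extractor outputs, which matches the {PATH}/image, {PATH}/dwpose, {PATH}/hamer and {PATH}/3ddfa patterns in the commands below.

```python
from pathlib import Path

# Sub-directories of data/images, as listed in the tree above.
SUBDIRS = ["3ddfa", "dwpose", "hamer", "image", "results"]


def make_data_dirs(data_root="data"):
    """Create <data_root>/images/<subdir> for each entry in SUBDIRS.

    Safe to call repeatedly: existing directories are left untouched.
    """
    images = Path(data_root) / "images"
    for name in SUBDIRS:
        (images / name).mkdir(parents=True, exist_ok=True)
    return images


if __name__ == "__main__":
    images_dir = make_data_dirs()
    print("Put your generated images in", images_dir / "image")
```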

Use the following command to extract DWPose data:

cd submodules/DWPose
conda activate {YOUR_DWPose_Environment}
python ControlNet-v1-1-nightly/dwpose_infer_example.py --input_path {PATH_TO_IMAGE_DIR}/image --output_path {PATH_TO_SAVE_PKL}/dwpose

To refine generated images with malformed hands, estimate the hand meshes using HaMeR:

cd submodules/hamer-main
conda activate {YOUR_HaMeR_Environment}
python demo_image.py --img_folder {PATH_TO_IMAGE_DIR}/image --out_folder {PATH_TO_SAVE_HAMER}/hamer --full_frame

In case you encounter the error "AttributeError: 'NoneType' object has no attribute 'glGetError'", try the following:

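apt-get install -y python-opengl libosmesa6

On newer Ubuntu releases, where python-opengl is no longer packaged, install python3-opengl instead.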

If you want to refine generated images with malformed faces, estimate the face meshes using 3DDFAv3:

cd submodules/3DDFA-V3
conda activate {YOUR_3DDFAv3_Environment}
python demo_dir.py --inputpath {PATH_TO_IMAGE_DIR}/image --savepath {PATH_TO_SAVE_3DDFA}/3ddfa --device cuda --iscrop 1 --detector retinaface --ldm68 0 --ldm106 0 --ldm106_2d 0 --ldm134 0 --seg_visible 0 --seg 0 --useTex 0 --extractTex 0 --backbone resnet50
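The three extraction commands share one input folder, so they can be generated from a single path. In this sketch (extraction_commands is hypothetical) every {PATH_TO_...} placeholder is set to the same directory, matching the data/images layout above; the path is made absolute because each command runs from inside its own submodule. The flags are copied verbatim from the commands above. The commands are printed rather than executed, since each tool needs its own conda environment.

```python
import shlex
from pathlib import Path


def extraction_commands(image_dir):
    """Build the DWPose, HaMeR and 3DDFAv3 commands for one image directory.

    image_dir is the folder that contains `image/` (data/images above).
    Returns a mapping from submodule directory to argv list.
    """
    root = Path(image_dir).resolve()
    src = str(root / "image")
    return {
        "submodules/DWPose": [
            "python", "ControlNet-v1-1-nightly/dwpose_infer_example.py",
            "--input_path", src, "--output_path", str(root / "dwpose"),
        ],
        "submodules/hamer-main": [
            "python", "demo_image.py",
            "--img_folder", src, "--out_folder", str(root / "hamer"), "--full_frame",
        ],
        "submodules/3DDFA-V3": [
            "python", "demo_dir.py", "--inputpath", src,
            "--savepath", str(root / "3ddfa"), "--device", "cuda", "--iscrop", "1",
            "--detector", "retinaface", "--ldm68", "0", "--ldm106", "0",
            "--ldm106_2d", "0", "--ldm134", "0", "--seg_visible", "0", "--seg", "0",
            "--useTex", "0", "--extractTex", "0", "--backbone", "resnet50",
        ],
    }


if __name__ == "__main__":
    # Print rather than run: activate the matching conda env for each line,
    # starting from the repository root.
    for cwd, argv in extraction_commands("data/images").items():
        print(f"cd {shlex.quote(cwd)} && {shlex.join(argv)}")
```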

Inference with RealisHuman

1. Hand Refining

Stage-One Pre-processing

To pre-process the hand data for stage-one, run the following command:

python data/process_hand_stage1.py

Stage-One Inference

After pre-processing, run the model to obtain the stage-one results:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 \
    inference_stage1.py --config configs/stage1-hand.yaml --output data/hand_example/hand_chip/repair \
    --ckpt checkpoint/stage1_hand/checkpoint-stage1-hand.ckpt
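The commands in this section assume eight GPUs. With torchrun, --nproc_per_node typically should not exceed the number of devices listed in CUDA_VISIBLE_DEVICES. The sketch below derives the process count from that variable and rebuilds the command above; both helper names are hypothetical, and whether the inference scripts behave identically on fewer GPUs is an assumption.

```python
import os
import shlex


def visible_gpu_count(env=None):
    """Count GPUs listed in CUDA_VISIBLE_DEVICES (None if the variable is unset)."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return len([d for d in value.split(",") if d.strip()])


def torchrun_command(script, config, output, ckpt, nproc):
    """Assemble the single-node torchrun invocation used in this README."""
    return [
        "torchrun", "--nnodes=1", f"--nproc_per_node={nproc}",
        script, "--config", config, "--output", output, "--ckpt", ckpt,
    ]


if __name__ == "__main__":
    # Fall back to one process if CUDA_VISIBLE_DEVICES is unset or empty.
    nproc = visible_gpu_count() or 1
    cmd = torchrun_command(
        "inference_stage1.py", "configs/stage1-hand.yaml",
        "data/hand_example/hand_chip/repair",
        "checkpoint/stage1_hand/checkpoint-stage1-hand.ckpt", nproc,
    )
    print(shlex.join(cmd))
```

For example, `CUDA_VISIBLE_DEVICES=0,1 python launch.py` would print a command with --nproc_per_node=2.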

Stage-Two Processing and Inference

For stage-two, pre-process the hand data:

python data/process_hand_stage2.py

Then, run the model to obtain the stage-two results:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 \
    inference_stage2.py --config configs/stage2-hand.yaml --output data/hand_example/hand_chip/inpaint \
    --ckpt checkpoint/stage2_hand/checkpoint-stage2-hand.ckpt

Final Image Refinement

To paste the refined hands back into the original images, run:

python data/back_to_image_hand.py

Then you can find the refined results in data/hand_example/hand_chip/results.
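The five hand steps above (pre-process, stage one, pre-process, stage two, paste back) follow a fixed pattern, and the face steps documented below use the same pattern with "face" in place of "hand". This sketch chains them; pipeline_steps and run_pipeline are hypothetical helpers, and it assumes you run it from the repository root inside the RealisHuman environment, with CUDA_VISIBLE_DEVICES set in your shell.

```python
import subprocess


def pipeline_steps(part, nproc=8):
    """Return the README's inference steps for part ('hand' or 'face'), in order."""
    if part not in ("hand", "face"):
        raise ValueError("part must be 'hand' or 'face'")
    chip = f"data/{part}_example/{part}_chip"

    def infer(stage, subdir):
        # Same torchrun invocation as the README, parameterized by stage and part.
        return [
            "torchrun", "--nnodes=1", f"--nproc_per_node={nproc}",
            f"inference_stage{stage}.py",
            "--config", f"configs/stage{stage}-{part}.yaml",
            "--output", f"{chip}/{subdir}",
            "--ckpt", f"checkpoint/stage{stage}_{part}/checkpoint-stage{stage}-{part}.ckpt",
        ]

    return [
        ["python", f"data/process_{part}_stage1.py"],
        infer(1, "repair"),
        ["python", f"data/process_{part}_stage2.py"],
        infer(2, "inpaint"),
        ["python", f"data/back_to_image_{part}.py"],
    ]


def run_pipeline(part, nproc=8):
    # check=True stops at the first failing step instead of feeding
    # incomplete outputs into the next one.
    for argv in pipeline_steps(part, nproc):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_pipeline("hand")
```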


2. Face Refining

Stage-One Pre-processing

To pre-process the face data for stage-one, use the command:

python data/process_face_stage1.py

Stage-One Inference

Run the model to get the stage-one face results:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 \
    inference_stage1.py --config configs/stage1-face.yaml --output data/face_example/face_chip/repair \
    --ckpt checkpoint/stage1_face/checkpoint-stage1-face.ckpt

Stage-Two Processing and Inference

For stage-two, pre-process the face data:

python data/process_face_stage2.py

Run the model to get the stage-two results:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 \
    inference_stage2.py --config configs/stage2-face.yaml --output data/face_example/face_chip/inpaint \
    --ckpt checkpoint/stage2_face/checkpoint-stage2-face.ckpt

Final Image Refinement

To paste the refined faces back into the original images, run the following command:

python data/back_to_image_face.py

To combine the refined faces and hands in the same images, run the following command instead:

python data/back_to_image_face.py --sub_dir results_hand

Then you can find the refined results in data/face_example/face_chip/results.

Training RealisHuman

You can also train the model on your own data. Stage one and stage two are launched separately; replace stage1-xxx.yaml and stage2-xxx.yaml with your own config files:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7  torchrun --nnodes=1 --nproc_per_node=8 \
    train_stage1.py --config configs/stage1-xxx.yaml

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7  torchrun --nnodes=1 --nproc_per_node=8 \
    train_stage2.py --config configs/stage2-xxx.yaml 

🙏 Acknowledgements

We would like to thank the AnimateDiff and AnimateAnyone teams for their awesome codebases.

Citation

@misc{wang2024realishumantwostageapproachrefining,
      title={RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images}, 
      author={Benzhi Wang and Jingkai Zhou and Jingqi Bai and Yang Yang and Weihua Chen and Fan Wang and Zhen Lei},
      year={2024},
      eprint={2409.03644},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.03644}, 
}