Benzhi Wang
·
Jingkai Zhou
·
Jingqi Bai
·
Yang Yang
·
Weihua Chen
·
Fan Wang
·
Zhen Lei
CASIA | Alibaba
- 🔥🔥🔥 RealisHuman paper and project page released.
- 🚀🚀🚀 Release training and inference code.
- 👏👏👏 Now you can try more interesting AI video editing in XunGuang!!!
- 🕺🕺🕺 You may also be interested in our human dance video generation method RealisDance.
To begin, download the pretrained base models for RV-5-1, DINOv2, Stable Diffusion v1.5, and Stable Diffusion Inpainting.
Next, you can download our RealisHuman checkpoints from Baidu Cloud or Google Drive (Part1 and Part2).
Organize the base models and checkpoints as follows:
mkdir checkpoint && mkdir pretrained_models
.
|-- LICENSE
|-- README.md
|-- assets
|-- data
|-- submodules
| |-- 3DDFA-V3
| |-- DWPose
| `-- hamer-main
|-- realishuman
|-- configs
|-- checkpoint
| |-- stage1_face
| | `-- checkpoint-stage1-face.ckpt
| |-- stage1_hand
| | `-- checkpoint-stage1-hand.ckpt
| |-- stage2_face
| | `-- checkpoint-stage2-face.ckpt
| `-- stage2_hand
| `-- checkpoint-stage2-hand.ckpt
|-- pretrained_models
| |-- DINO
| | `-- dinov2
| |-- RV
| | `-- rv-5-1
| `-- StableDiffusion
| |-- sd-1-5
| `-- stable-diffusion-inpainting
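The base models can be pulled from the Hugging Face Hub. Below is a minimal sketch that populates pretrained_models/ with git (requires git-lfs); the repository IDs are assumptions and may differ from the exact checkpoints used, so verify them against the paths expected by the configs before downloading.

# Sketch: fetch the base models into pretrained_models/ (repo IDs are assumptions; adjust as needed)
git lfs install
cd pretrained_models
mkdir -p DINO RV StableDiffusion
git clone https://huggingface.co/facebook/dinov2-base DINO/dinov2                        # DINOv2 image encoder
git clone https://huggingface.co/SG161222/Realistic_Vision_V5.1_noVAE RV/rv-5-1          # RV-5-1
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 StableDiffusion/sd-1-5   # SD v1.5
git clone https://huggingface.co/runwayml/stable-diffusion-inpainting StableDiffusion/stable-diffusion-inpainting
cd ..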
You can install the required environment using conda:
conda env create -f environment.yaml
conda activate RealisHuman
or with pip:
pip3 install -r requirements.txt
Additionally, you will need to set up separate environments for DWPose, HaMeR, and 3DDFAv3. Please refer to their official setup guides for detailed configuration steps.
Structure your data directory as follows:
data
|-- images
| |-- 3ddfa
| |-- dwpose
| |-- hamer
| |-- image
| `-- results
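If you are preparing your own images, the layout can be created up front; the images to be refined go into data/images/image, and the other folders hold the outputs of the pose, mesh, and paste-back steps below. A minimal sketch (the placeholder source path is hypothetical):

mkdir -p data/images/{3ddfa,dwpose,hamer,image,results}
cp /path/to/your/generated_images/*.png data/images/image/   # copy the generated images you want to refine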
Use the following command to extract DWPose data:
cd submodules/DWPose
conda activate {YOUR_DWPose_Environment}
python ControlNet-v1-1-nightly/dwpose_infer_example.py --input_path {PATH_TO_IMAGE_DIR}/image --output_path {PATH_TO_SAVE_PKL}/dwpose
To refine generated images with malformed hands, estimate the hand meshes using HaMeR:
cd submodules/hamer-main
conda activate {YOUR_HaMeR_Environment}
python demo_image.py --img_folder {PATH_TO_IMAGE_DIR}/image --out_folder {PATH_TO_SAVE_HAMER}/hamer --full_frame
In case you encounter the error "AttributeError: 'NoneType' object has no attribute 'glGetError'", try the following:
apt-get install -y python-opengl libosmesa6
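If the error persists after installing these packages, it usually means PyOpenGL is still trying to create a display-backed GL context; forcing the OSMesa software backend before running the script often helps (this is a general pyrender/OSMesa workaround, not something specific to this repo):

export PYOPENGL_PLATFORM=osmesa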
If you want to refine generated images with malformed faces, estimate the face meshes using 3DDFAv3:
cd submodules/3DDFA-V3
conda activate {YOUR_3DDFAv3_Environment}
python demo_dir.py --inputpath {PATH_TO_IMAGE_DIR}/image --savepath {PATH_TO_SAVE_3DDFA}/3ddfa --device cuda --iscrop 1 --detector retinaface --ldm68 0 --ldm106 0 --ldm106_2d 0 --ldm134 0 --seg_visible 0 --seg 0 --useTex 0 --extractTex 0 --backbone resnet50
To pre-process the hand data for stage-one, run the following command:
python data/process_hand_stage1.py
After pre-processing, run the model to obtain the stage-one results:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 \
inference_stage1.py --config configs/stage1-hand.yaml --output data/hand_example/hand_chip/repair \
--ckpt checkpoint/stage1_hand/checkpoint-stage1-hand.ckpt
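The inference commands assume 8 GPUs; torchrun scales down directly by changing the device list and process count. For example, a single-GPU run of the same step (only the torchrun arguments change):

CUDA_VISIBLE_DEVICES=0 torchrun --nnodes=1 --nproc_per_node=1 \
    inference_stage1.py --config configs/stage1-hand.yaml --output data/hand_example/hand_chip/repair \
    --ckpt checkpoint/stage1_hand/checkpoint-stage1-hand.ckpt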
For stage-two, pre-process the hand data:
python data/process_hand_stage2.py
Then, run the model to obtain the stage-two results:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 \
inference_stage2.py --config configs/stage2-hand.yaml --output data/hand_example/hand_chip/inpaint \
--ckpt checkpoint/stage2_hand/checkpoint-stage2-hand.ckpt
To paste the refined hand image back, execute:
python data/back_to_image_hand.py
Then you can find the refined results in data/hand_example/hand_chip/results.
To pre-process the face data for stage-one, use the command:
python data/process_face_stage1.py
Run the model to get the stage-one face results:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 \
inference_stage1.py --config configs/stage1-face.yaml --output data/face_example/face_chip/repair \
--ckpt checkpoint/stage1_face/checkpoint-stage1-face.ckpt
For stage-two, pre-process the face data:
python data/process_face_stage2.py
Run the model to get the stage-two results:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 \
inference_stage2.py --config configs/stage2-face.yaml --output data/face_example/face_chip/inpaint \
--ckpt checkpoint/stage2_face/checkpoint-stage2-face.ckpt
To paste the refined face image back, run the following command:
python data/back_to_image_face.py
If you wish to integrate the refined results for faces and hands, run the following command:
python data/back_to_image_face.py --sub_dir results_hand
Then you can find the refined results in data/face_example/face_chip/results.
You can also train the model on your own data with the following commands:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 \
train_stage1.py --config configs/stage1-xxx.yaml
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 \
train_stage2.py --config configs/stage2-xxx.yaml
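The stage1-xxx.yaml / stage2-xxx.yaml names above are placeholders for your own configs. A simple workflow is to copy one of the provided hand or face configs and point it at your dataset (a sketch; which fields to edit depends on the YAML files in configs/):

cp configs/stage1-hand.yaml configs/stage1-mydata.yaml
cp configs/stage2-hand.yaml configs/stage2-mydata.yaml
# edit the data paths in the copied files, then train as above, e.g.:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nnodes=1 --nproc_per_node=8 \
    train_stage1.py --config configs/stage1-mydata.yaml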
We would like to thank the AnimateDiff and AnimateAnyone teams for their awesome codebases.
@misc{wang2024realishumantwostageapproachrefining,
title={RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images},
author={Benzhi Wang and Jingkai Zhou and Jingqi Bai and Yang Yang and Weihua Chen and Fan Wang and Zhen Lei},
year={2024},
eprint={2409.03644},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2409.03644},
}