The Wav2Lip node is a custom node for ComfyUI that performs lip-syncing on videos using the Wav2Lip model. It takes input video frames and an audio file and generates lip-synced output frames.
- Lip-syncing of videos using the Wav2Lip model
- Face detection and enhancement using GFPGAN or CodeFormer
- Adjustable fidelity for face enhancement
- Support for several face detection models (RetinaFace and YOLOv5 variants)
Inputs:
- images: Input video frames (required)
- audio: Input audio file (required)
- mode: Processing mode, either "sequential" or "repetitive" (default: "sequential")
- face_detect_batch: Batch size for face detection (default: 8)
- facedetection: Face detection model; options: "retinaface_resnet50", "retinaface_mobile0.25", "YOLOv5l", "YOLOv5n" (default: "retinaface_resnet50")
- face_restore: Enable or disable face enhancement; options: "enable", "disable" (default: "disable")
- codeformer_fidelity: Fidelity for face enhancement, range 0.0 to 1.0 (default: 0.5)
- facerestore_model: Face restoration model; options: "CodeFormer.pth", "GFPGAN.pth" (default: "CodeFormer.pth")

Outputs:
- images: Lip-synced output video frames
- audio: Output audio file
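The option lists above can be captured in a small table, for example to sanity-check settings before wiring them into a workflow. This is a minimal sketch: `CHOICES`, `DEFAULTS` and `validate_settings` are hypothetical names that are not part of the node; only the option values and defaults are copied from the input list.

```python
from typing import Any, Dict, Tuple

# HYPOTHETICAL helper, not part of the Wav2Lip node: the documented choices
# and defaults of the node's non-media inputs, copied from the README.
CHOICES: Dict[str, Tuple[str, ...]] = {
    "mode": ("sequential", "repetitive"),
    "facedetection": ("retinaface_resnet50", "retinaface_mobile0.25", "YOLOv5l", "YOLOv5n"),
    "face_restore": ("enable", "disable"),
    "facerestore_model": ("CodeFormer.pth", "GFPGAN.pth"),
}

DEFAULTS: Dict[str, Any] = {
    "mode": "sequential",
    "face_detect_batch": 8,
    "facedetection": "retinaface_resnet50",
    "face_restore": "disable",
    "codeformer_fidelity": 0.5,
    "facerestore_model": "CodeFormer.pth",
}


def validate_settings(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge overrides into DEFAULTS and check them against the documented options."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    settings = {**DEFAULTS, **overrides}
    # Enumerated options must match one of the documented values exactly.
    for name, allowed in CHOICES.items():
        if settings[name] not in allowed:
            raise ValueError(f"{name}={settings[name]!r}; expected one of {allowed}")
    # A batch size only makes sense as a positive integer (bool is excluded explicitly).
    batch = settings["face_detect_batch"]
    if isinstance(batch, bool) or not isinstance(batch, int) or batch < 1:
        raise ValueError("face_detect_batch must be a positive integer")
    # Documented range for the fidelity slider.
    if not 0.0 <= float(settings["codeformer_fidelity"]) <= 1.0:
        raise ValueError("codeformer_fidelity must be between 0.0 and 1.0")
    return settings
```

For example, `validate_settings({"face_restore": "enable", "codeformer_fidelity": 0.7})` returns the full settings with the remaining defaults filled in, while a misspelled option value raises an error early instead of failing inside the workflow.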
1. Clone the repository into the `custom_nodes` folder:
   git clone https://github.com/yourusername/wav2lip-comfyui.git
2. Install the required dependencies from inside the cloned folder, using the same Python environment that runs ComfyUI:
   pip install -r requirements.txt
To use the Wav2Lip node, you need to download the required models separately. Please follow these steps:
- Download the CodeFormer and GFPGAN models: CodeFormer model | GFPGAN model
- Place the `.pth` model files in the `custom_nodes\ComfyUI_wav2lip\models\facerestore_models` folder.
- Download the face detection models: -1- | -2- | -3- | -4-
- Place the `.pth` model files in the `custom_nodes\ComfyUI_wav2lip\models\facedetection` folder.
- Download the Wav2Lip model: -1-
- Place the `.pth` model file in the `custom_nodes\ComfyUI_wav2lip\Wav2Lip\checkpoints` folder.
- Start or restart ComfyUI.
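Before restarting ComfyUI, a short script can confirm that each model folder listed above actually contains a `.pth` file. This is a sketch under stated assumptions: it expects the ComfyUI root folder as an argument (or the current directory), and because the exact face detection and Wav2Lip checkpoint file names are not given here, it only checks that at least one `.pth` file is present per folder. The function names are hypothetical.

```python
import sys
from pathlib import Path
from typing import Dict, List

# Folder layout from the README, relative to the ComfyUI root. The README
# writes Windows-style backslashes; pathlib builds the right separator per OS.
MODEL_DIRS: Dict[str, Path] = {
    "face restoration": Path("custom_nodes", "ComfyUI_wav2lip", "models", "facerestore_models"),
    "face detection": Path("custom_nodes", "ComfyUI_wav2lip", "models", "facedetection"),
    "wav2lip checkpoint": Path("custom_nodes", "ComfyUI_wav2lip", "Wav2Lip", "checkpoints"),
}


def find_model_files(comfyui_root: Path) -> Dict[str, List[str]]:
    """Return the sorted .pth file names found in each expected model folder."""
    found: Dict[str, List[str]] = {}
    for label, rel_dir in MODEL_DIRS.items():
        folder = Path(comfyui_root) / rel_dir
        found[label] = sorted(p.name for p in folder.glob("*.pth")) if folder.is_dir() else []
    return found


def missing_models(comfyui_root: Path) -> List[str]:
    """List the model folders that do not contain any .pth file yet."""
    return [label for label, files in find_model_files(comfyui_root).items() if not files]


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    for label, files in find_model_files(root).items():
        print(f"{label}: {', '.join(files) if files else 'MISSING'}")
```

An empty folder usually means the download landed in the wrong place; `missing_models` returns exactly those folder labels.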
1. Add the Wav2Lip node to your ComfyUI workflow.
2. Connect the input video frames and audio file to the corresponding inputs of the Wav2Lip node.
3. Adjust the node settings according to your requirements:
   - Set `mode` to "sequential" or "repetitive" based on your video processing needs.
   - Adjust the `face_detect_batch` size if needed.
   - Select the desired `facedetection` model.
   - Enable or disable `face_restore` to apply face enhancement.
   - Adjust the `codeformer_fidelity` value to control the strength of face enhancement (with CodeFormer, lower values generally favor restoration quality and higher values favor fidelity to the original face).
   - Select the desired `facerestore_model` for face restoration.
4. Execute the ComfyUI workflow to generate the lip-synced output video.
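The same settings can also be applied headlessly through ComfyUI's HTTP API, which accepts a workflow in "API format" (a dict of node IDs, each with a `class_type` and `inputs`, where links to other nodes are `[node_id, output_index]`) posted to the `/prompt` endpoint. This is a minimal sketch with loud assumptions: the `class_type` value `"Wav2Lip"` is a guess (check it in a workflow exported in API format), the default server address is assumed to be `http://127.0.0.1:8188`, and `wav2lip_node` and `queue_prompt` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List, Union

# In ComfyUI's API format, a link to another node's output is [node_id, output_index].
Link = List[Union[str, int]]

# Defaults copied from the README's input list.
WAV2LIP_DEFAULTS: Dict[str, Any] = {
    "mode": "sequential",
    "face_detect_batch": 8,
    "facedetection": "retinaface_resnet50",
    "face_restore": "disable",
    "codeformer_fidelity": 0.5,
    "facerestore_model": "CodeFormer.pth",
}


def wav2lip_node(images: Link, audio: Link, **settings: Any) -> Dict[str, Any]:
    """Build one API-format node entry for the Wav2Lip node (class_type is ASSUMED)."""
    unknown = set(settings) - set(WAV2LIP_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    inputs = {"images": images, "audio": audio, **WAV2LIP_DEFAULTS, **settings}
    # ASSUMPTION: confirm the real class_type in an API-format export of your workflow.
    return {"class_type": "Wav2Lip", "inputs": inputs}


def queue_prompt(prompt: Dict[str, Any], server: str = "http://127.0.0.1:8188") -> Any:
    """POST an API-format workflow to a running ComfyUI server and return its JSON reply."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, after loading an API-format export of your workflow into a dict `workflow`, `workflow["10"] = wav2lip_node(["1", 0], ["2", 0], face_restore="enable")` followed by `queue_prompt(workflow)` would overwrite node 10 and queue the run. The IDs "1", "2" and "10" are placeholders for your own loader and Wav2Lip node IDs; overwriting the existing entry keeps downstream links intact.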
If you encounter an error like `No module named 'torchvision.transforms.functional_tensor'` when trying to use the Wav2Lip node, you need to manually update a file in your Python virtual environment (venv). Newer torchvision releases no longer ship the `torchvision.transforms.functional_tensor` module, but the `degradations.py` file of the installed basicsr package still imports it.
To fix this issue, follow these steps:
1. Download the updated `degradations.py` file provided by the maintainer of the Wav2Lip node. You can find this file in the `service` folder.
2. Locate the existing `degradations.py` file in your venv directory. The path should be similar to:
   path/to/your/venv/lib/site-packages/basicsr/data/degradations.py
   On Linux and macOS, `site-packages` usually sits under a versioned folder such as `lib/python3.10/`.
3. Create a backup of the existing `degradations.py` file in case you need to revert the change later, for example by renaming it to `degradations.py.backup`.
4. Replace the existing `degradations.py` file with the updated file you downloaded in step 1.
5. Start or restart ComfyUI.
By replacing the "degradations.py" file with the updated version, you should be able to use the wav2lip node without encountering the "ModuleNotFoundError" related to torchvision.
Note: If you have multiple Python environments or versions installed, make sure to replace the "degradations.py" file in the correct venv directory that is being used by your application.
If you continue to face issues after making this change, please ensure that you have a compatible version of torchvision installed in your environment and that there are no other conflicting dependencies.
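Steps 2 to 4 above can be scripted so that the file is found in whichever environment actually runs the script, which also addresses the multiple-environments note. This is a sketch, not the maintainer's tool: run it with the same Python interpreter that ComfyUI uses (the script prints `sys.executable` so you can confirm that), optionally passing the path of the updated `degradations.py` from the `service` folder. The function names are hypothetical; `importlib.util.find_spec` is used on the top-level `basicsr` package because it locates the package without executing its broken import.

```python
import importlib.util
import shutil
import sys
from pathlib import Path
from typing import Optional


def find_degradations_file() -> Optional[Path]:
    """Locate basicsr/data/degradations.py for the interpreter running this script."""
    # find_spec on a top-level package does not import it, so the broken
    # torchvision import inside degradations.py is never executed here.
    spec = importlib.util.find_spec("basicsr")
    if spec is None or not spec.submodule_search_locations:
        return None
    candidate = Path(list(spec.submodule_search_locations)[0]) / "data" / "degradations.py"
    return candidate if candidate.is_file() else None


def uses_removed_module(path: Path) -> bool:
    """True if the file still references torchvision.transforms.functional_tensor."""
    return "torchvision.transforms.functional_tensor" in path.read_text(encoding="utf-8")


def backup_and_replace(target: Path, replacement: Path) -> Path:
    """Copy target to '<name>.backup', then overwrite target with replacement."""
    backup = target.with_name(target.name + ".backup")
    shutil.copy2(target, backup)
    shutil.copy2(replacement, target)
    return backup


if __name__ == "__main__":
    print("Python environment:", sys.executable)
    target = find_degradations_file()
    if target is None:
        print("basicsr was not found in this environment")
    else:
        print("Found:", target, "| still imports functional_tensor:", uses_removed_module(target))
        if len(sys.argv) > 1:  # path to the updated degradations.py
            print("Backup written to:", backup_and_replace(target, Path(sys.argv[1])))
```

Keeping a copy instead of renaming means the original stays in place until the replacement is written, so an interrupted run never leaves basicsr without a `degradations.py`.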
Thanks to ArtemM, Wav2Lip, PIRenderer, GFP-GAN, GPEN, ganimation_replicate, STIT for sharing their code.