DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions



Requirements

Install the required environment with the following commands:

git clone https://github.com/deepbrainai-research/discohead
cd discohead
conda create -n discohead python=3.7
conda activate discohead
conda install pytorch==1.10.0 torchvision==0.11.1 torchaudio==0.10.0 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt

Generating Demo Videos

  • Download the pre-trained checkpoints from Google Drive and put them in the weight folder.
  • Download dataset.zip from Google Drive and unzip it into the dataset folder.
  • The DisCoHead directory should have the following structure:
DisCoHead/
├── dataset/
│   ├── grid/
│   │   ├── demo1/
│   │   ├── demo2/
│   ├── koeba/
│   │   ├── demo1/
│   │   ├── demo2/
│   ├── obama/
│   │   ├── demo1/
│   │   ├── demo2/
├── weight/
│   ├── grid.pt
│   ├── koeba.pt
│   ├── obama.pt
├── modules/
‥‥
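
Before running a demo, it can help to confirm that the downloads landed in the right place. The following is a minimal sketch, not part of the repository: the helper name missing_paths is hypothetical, and it checks only the entries shown in the tree above (anything the tree elides is ignored).

```python
from pathlib import Path

# Names taken from the directory tree above.
DATASETS = ("grid", "koeba", "obama")
DEMOS = ("demo1", "demo2")


def missing_paths(root):
    """Return the expected folders and files that are absent under root.

    Hypothetical helper: it covers only the entries listed in the tree.
    """
    root = Path(root)
    # Folders that should exist: dataset/<name>/<demo> and modules.
    expected_dirs = [root / "dataset" / name / demo
                     for name in DATASETS for demo in DEMOS]
    expected_dirs.append(root / "modules")
    # Checkpoint files that should exist: weight/<name>.pt.
    expected_files = [root / "weight" / (name + ".pt") for name in DATASETS]

    missing = [p for p in expected_dirs if not p.is_dir()]
    missing += [p for p in expected_files if not p.is_file()]
    return [p.relative_to(root).as_posix() for p in missing]


if __name__ == "__main__":
    # Run from the repository root.
    absent = missing_paths(".")
    if absent:
        print("Missing entries:")
        for entry in absent:
            print("  " + entry)
    else:
        print("Layout matches the expected structure.")
```

Run it from the repository root; an empty result means every folder and checkpoint listed in the tree is present.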

  • The --mode argument specifies which demo video to generate:
python test.py --mode {mode}
  • Available modes: obama_demo1, obama_demo2, grid_demo1, grid_demo2, koeba_demo1, koeba_demo2
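
To generate every demo in one go, the command above can be wrapped in a short loop. This is a sketch under assumptions: the script and the helper name build_command are hypothetical, only the --mode flag and the six mode names come from this README, and it assumes test.py is in the current directory and that the discohead environment is active.

```python
import subprocess
import sys

# The six modes listed above.
MODES = [
    "obama_demo1", "obama_demo2",
    "grid_demo1", "grid_demo2",
    "koeba_demo1", "koeba_demo2",
]


def build_command(mode):
    """Build the argument list for one demo run (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError("unknown mode: " + repr(mode))
    # sys.executable reuses the interpreter of the active environment.
    return [sys.executable, "test.py", "--mode", mode]


if __name__ == "__main__":
    # Dry run by default: print each command. Pass --run to execute them.
    execute = "--run" in sys.argv[1:]
    for mode in MODES:
        cmd = build_command(mode)
        print(" ".join(cmd))
        if execute:
            # check=True stops at the first demo that fails.
            subprocess.run(cmd, check=True)
```

Without arguments the script only prints the six commands, which makes it easy to check them before committing to a full run with --run.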

License

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. You must not use this work for commercial purposes. You must not distribute modified versions of it. You must give appropriate credit and provide a link to the license.

Citation

@INPROCEEDINGS{10095670,
  author={Hwang, Geumbyeol and Hong, Sunwon and Lee, Seunghyun and Park, Sungwoo and Chae, Gyeongsu},
  booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={DisCoHead: Audio-and-Video-Driven Talking Head Generation by Disentangled Control of Head Pose and Facial Expressions}, 
  year={2023},
  volume={},
  number={},
  pages={1-5},
  doi={10.1109/ICASSP49357.2023.10095670}}