🔊 Voice2Face-modeling

Project Structure

code
┣ LDM
┃ ┃ README.md
┃ ┗ ...
┣ SimSwap
┃ ┃ README.md
┃ ┗ ...
┣ pytorch_template
┃ ┗ ...
┣ sf2f
┃ ┃ README.md
┃ ┗ ...
┗ wcgan-gp
  ┃ README.md
  ┗ ...
┗ README.md
┗ requirements.txt
┗ train.sh
┗ voxceleb_download.sh
...
  • The README file inside each folder contains further details.
  • LDM: README.md
  • SimSwap: README.md
  • sf2f: README.md
  • wcgan-gp: README.md

Usage

LDM

  • Latent Diffusion: paper | github
  • Low-Rank Adaptation: paper | github
  • This folder implements a Latent Diffusion model that reuses the voice encoder of the existing Speech Fusion to Face model in order to improve the quality of the generated images.
  • A LoRA structure was also added to make training the diffusion model more tractable, which improved both training time and performance.
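For reference, the low-rank update that LoRA introduces can be written as below, in the notation of the LoRA paper. This is the general formulation only; which layers of the diffusion model were adapted in this project, and the rank and scaling that were used, are not stated here.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
```

The pretrained weight W_0 stays frozen and only A and B are trained, which is where the reduction in trainable parameters comes from.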

SimSwap

  • SimSwap: paper | github
  • The model used to composite the face generated from voice data onto an existing video.
  • Because the generated frontal face has to be fitted to the various face angles in the video, we chose a model that favors accuracy and quality over synthesis speed.
  • The composited video is written out as a gif or an mp4.

pytorch_template

  • pytorch template: reference github
  • The layout we used to keep model development efficient and consistent.
  • Developed models were organized in this format and shared so that team members could understand them easily.

sf2f

  • Speech Fusion to Face: paper | github | page
  • A model that converts voice data (.wav) files into a mel_spectrogram and then reconstructs a face from it.
  • scripts/convert_wav_to_mel.py: a preprocessing script that converts voice data (.wav) files into a fixed-size (100x150) mel_spectrogram and saves the result as a pickle file.
  • options/data_opts: a folder of scripts that set the parameters used when building a dataset. Scripts for the vox celeb dataset and the olkavs dataset are used.
  • options/sf2f: a folder of scripts that set all parameters used for train and inference. It is split into sf2f with vox and sf2f with olkavs, and the sf2f options are further grouped by model variant and image size.
  • utils/compute_metrics.py: a script that declares and computes the metrics used to evaluate the model during training.
  • connect_mlflow.py: a script that connects to the mlflow server so that training can be monitored through mlflow and the weights of the best-performing model can be used.
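The fixed-size step described for scripts/convert_wav_to_mel.py can be illustrated with a minimal sketch. It is not the project's script: the function names are hypothetical, the spectrogram is a plain list of rows rather than an array, the reading of 100x150 as rows by columns is an assumption, and so is zero-padding short inputs. Only the crop-or-pad and pickle steps are shown, not the audio-to-mel conversion itself.

```python
import pickle
from typing import List

Matrix = List[List[float]]


def fit_to_shape(mel: Matrix, n_rows: int = 100, n_cols: int = 150,
                 pad_value: float = 0.0) -> Matrix:
    """Crop or pad a 2-D spectrogram so it is exactly n_rows x n_cols.

    Assumption: 100x150 means rows x columns; the README does not say
    which axis is which.
    """
    fixed: Matrix = []
    for r in range(n_rows):
        # Take the existing row (cropped to n_cols), or start from an empty one.
        row = list(mel[r][:n_cols]) if r < len(mel) else []
        # Pad on the right until the row has n_cols entries.
        row.extend([pad_value] * (n_cols - len(row)))
        fixed.append(row)
    return fixed


def save_mel(mel: Matrix, path: str) -> None:
    """Store the fixed-size spectrogram as a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(mel, f)


def load_mel(path: str) -> Matrix:
    """Read a spectrogram back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Fixing the shape up front means every training sample has the same dimensions, so batches can be stacked without per-batch padding logic.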

wcgan-gp

  • Wasserstein GAN: paper | github
  • Wasserstein GAN with Gradient Penalty: paper | github
  • Conditional GAN: paper | github
  • A comparison-group model (no voice input; conditioned on age/gender) implemented for the user evaluation of the face images this project generates from voice.
  • A gradient penalty was added to improve the training stability of the Wasserstein GAN, and a condition was added to steer the model output.
  • Trained on its own, the model did not reach a level that allowed a meaningful comparison, so it was pretrained on the celebA dataset and then finetuned on the vox celeb dataset.

Getting Started

Setting up Virtual Environment

  1. Initialize and update the server

su -

source .bashrc

  2. Create and activate a virtual environment in the project directory

conda create -n env python=3.8

conda activate env

  3. To deactivate and exit the virtual environment, run:

conda deactivate

Install Requirements

To install the packages listed in requirements.txt, run the following command with the virtual environment activated:


pip install -r requirements.txt

Links