Unsupervised Any-to-many Audiovisual Synthesis via Exemplar Autoencoders
Kangle Deng, Aayush Bansal, Deva Ramanan
project page / demo / arXiv
This repo provides a PyTorch implementation of our work.
Acknowledgements: This code borrows heavily from Auto-VC and Tacotron.
First, make sure ffmpeg is installed on your machine.
Then, run: pip install -r requirements.txt
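A quick way to confirm the ffmpeg prerequisite before training is to look for the executable on PATH. This is a minimal sketch that is not part of the repo; the helper name `ffmpeg_available` is hypothetical, and it only checks that an executable called `ffmpeg` can be found, not which version it is.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be located on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found on PATH; install it before continuing")
```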
We provide our CelebAudio Dataset at link.
Check 'scripts/train_audio.sh' for an example of training a Voice-Conversion model. Make sure the directory 'logs' exists.
Generally, run:
python train_audio.py --data_path PATH_TO_TRAINING_DATA --experiment_name EXPERIMENT_NAME --save_freq SAVE_FREQ --test_path_A PATH_TO_TEST_AUDIO --test_path_B PATH_TO_TEST_AUDIO --batch_size BATCH_SIZE --save_dir PATH_TO_SAVE_MODEL
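The training call above can also be assembled from Python, which makes it easier to create the 'logs' directory first and to reuse the same settings across runs. The sketch below is illustrative only: the helper names `ensure_logs_dir` and `build_train_audio_cmd` are hypothetical, the flag names are copied from the command above, every value in the usage section is a placeholder, and it assumes 'logs' is expected relative to the working directory.

```python
import os
import shlex
import sys


def ensure_logs_dir(root="."):
    """Create the 'logs' directory under root if it is missing; return its path."""
    path = os.path.join(root, "logs")
    os.makedirs(path, exist_ok=True)
    return path


def build_train_audio_cmd(data_path, experiment_name, save_dir,
                          test_path_a, test_path_b, save_freq, batch_size):
    """Assemble the train_audio.py command line as an argument list."""
    return [
        sys.executable, "train_audio.py",
        "--data_path", data_path,
        "--experiment_name", experiment_name,
        "--save_freq", str(save_freq),
        "--test_path_A", test_path_a,
        "--test_path_B", test_path_b,
        "--batch_size", str(batch_size),
        "--save_dir", save_dir,
    ]


if __name__ == "__main__":
    # All values below are placeholders, not defaults recommended by the repo.
    cmd = build_train_audio_cmd(
        data_path="PATH_TO_TRAINING_DATA",
        experiment_name="EXPERIMENT_NAME",
        save_dir="PATH_TO_SAVE_MODEL",
        test_path_a="PATH_TO_TEST_AUDIO",
        test_path_b="PATH_TO_TEST_AUDIO",
        save_freq=1000,
        batch_size=8,
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To launch training, call `ensure_logs_dir()` from the repo root and pass the returned list to `subprocess.run(cmd, check=True)`.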
Check 'scripts/train_audiovisual.sh' for an example of training an Audiovisual-Synthesis model. We usually train an audiovisual model starting from a pretrained audio model.
Generally, run:
python train_audiovisual.py --video_path PATH_TO_TRAINING_DATA --experiment_name EXPERIMENT_NAME --save_freq SAVE_FREQ --test_path PATH_TO_TEST_AUDIO --batch_size BATCH_SIZE --save_dir PATH_TO_SAVE_MODEL --use_256 --load_model LOAD_MODEL_PATH
If you want the video resolution to be 512 * 512, use the StackGAN-style 2-stage generation (the --residual flag in place of --use_256).
Generally, run:
python train_audiovisual.py --video_path PATH_TO_TRAINING_DATA --experiment_name EXPERIMENT_NAME --save_freq SAVE_FREQ --test_path PATH_TO_TEST_AUDIO --batch_size BATCH_SIZE --save_dir PATH_TO_SAVE_MODEL --residual --load_model LOAD_MODEL_PATH
Check 'scripts/test_audio.sh' for an example of testing a Voice-Conversion model.
To convert a WAV file using a trained model, run:
python test_audio.py --model PATH_TO_MODEL --wav_path PATH_TO_INPUT --output_file PATH_TO_OUTPUT
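When several recordings need converting with the same model, the command above can be generated once per input file. This is a sketch under stated assumptions: the helper `conversion_commands` is hypothetical, it assumes inputs carry a `.wav` extension and sit in one directory, and it only builds the argument lists, using the `--model`, `--wav_path` and `--output_file` flags shown above.

```python
import sys
from pathlib import Path


def conversion_commands(model_path, input_dir, output_dir):
    """Build one test_audio.py argument list per .wav file in input_dir."""
    commands = []
    for wav in sorted(Path(input_dir).glob("*.wav")):
        # Keep the input file name for the converted output.
        out = Path(output_dir) / wav.name
        commands.append([
            sys.executable, "test_audio.py",
            "--model", str(model_path),
            "--wav_path", str(wav),
            "--output_file", str(out),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repo root; create the output directory beforehand if it does not exist.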
Check 'scripts/test_audiovisual.sh' for an example of testing an Audiovisual-Synthesis model.
python test_audiovisual.py --load_model PATH_TO_MODEL --wav_path PATH_TO_INPUT --output_file PATH_TO_OUTPUT --use_256
python test_audiovisual.py --load_model PATH_TO_MODEL --wav_path PATH_TO_INPUT --output_file PATH_TO_OUTPUT --residual