- Paper Title: Toward Degradation-Robust Voice Conversion
- Authors: Chien-yu Huang, Kai-Wei Chang, Hung-yi Lee
- Paper Link: https://arxiv.org/abs/2110.07537
To appear in the proceedings of ICASSP 2022. The first two authors contributed equally.
Both Speech Enhancement Concatenation and End-to-End Denoising Training can effectively improve the degradation robustness and adversarial robustness of state-of-the-art VC models.
- Speech Enhancement Concatenation
- Pros: Applies to any off-the-shelf model.
- Cons: Requires more computation at inference.
- End-to-End Denoising Training
- Pros: Combines voice conversion and speech enhancement in a single model.
- Cons: Requires more resources for training.
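
The trade-off between the two approaches can be illustrated with a minimal sketch. It is not the code from this repository: the function names, the plain-list waveform representation, and the choice to enhance both the source and the target utterance are assumptions made for illustration. `enhance` and `convert` stand in for any speech enhancement model and any voice conversion model.

```python
import math
import random


def add_noise_at_snr(clean, noise, snr_db):
    """Mix `noise` into `clean` at the requested signal-to-noise ratio (dB).

    Both inputs are equal-length lists of float samples (an assumption of
    this sketch; a real pipeline would crop or loop the noise clip).
    """
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR(dB) = 10 * log10(P_clean / P_noise), solved for the noise power.
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(clean, noise)]


def measure_snr_db(clean, noisy):
    """Return the SNR (dB) of `noisy` relative to the `clean` reference."""
    residual = [y - x for x, y in zip(clean, noisy)]
    clean_power = sum(x * x for x in clean) / len(clean)
    residual_power = sum(r * r for r in residual) / len(residual)
    return 10 * math.log10(clean_power / residual_power)


def convert_with_enhancement(source, target, enhance, convert):
    """Speech Enhancement Concatenation (hypothetical wrapper).

    The VC model is left untouched; the inputs are enhanced first, which
    costs one extra model pass per utterance at inference time.
    """
    return convert(enhance(source), enhance(target))


def denoising_training_pair(clean, noise, snr_db):
    """End-to-End Denoising Training (hypothetical helper).

    Returns (degraded input, clean target): the model is fed the degraded
    utterance but trained to reconstruct the clean one.
    """
    return add_noise_at_snr(clean, noise, snr_db), clean
```

For example, calling `denoising_training_pair(clean, noise, 5.0)` on a clean waveform and an equally long noise clip yields an input degraded to 5 dB SNR paired with the untouched clean target, whereas `convert_with_enhancement` adds no training cost but calls `enhance` before every conversion.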
- AdaIN-VC
- AdaIN-VC/: training AdaIN-VC
- AdaIN-VC-robust/: training AdaIN-VC with data augmentation
- Reference: https://github.com/cyhuang-tw/AdaIN-VC
- S2VC
- S2VC/: training S2VC
- S2VC-robust/: training S2VC with data augmentation
- Reference: https://github.com/howard1337/S2VC
- assets
- scripts/: scripts for creating the dataset, adding noise, and running speech enhancement. For more information, please refer to the example usage.
- VCTK_split.py: train / valid / test dataset split used in the paper.
- Voice-conversion-evaluation
- Objective evaluation: CER (character error rate) for naturalness and SVAR for speaker similarity.
- Reference: https://github.com/tzuhsien/Voice-conversion-evaluation
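
As a rough illustration of the two objective metrics, the sketch below computes CER as character-level edit distance divided by reference length, and an acceptance rate as the fraction of similarity scores at or above a threshold. This is not the evaluation toolkit's code: the function names are hypothetical, the transcripts would in practice come from an ASR model, and the similarity scores and threshold would come from a speaker verification system.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (insert/delete/substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, 1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, 1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """CER = character edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def acceptance_rate(similarities, threshold):
    """Fraction of converted utterances whose speaker-similarity score
    reaches `threshold` (both are assumed inputs in this sketch)."""
    if not similarities:
        raise ValueError("need at least one similarity score")
    accepted = sum(1 for score in similarities if score >= threshold)
    return accepted / len(similarities)
```

A lower CER suggests the converted speech remains intelligible to the recognizer, while a higher acceptance rate suggests the converted voice is judged closer to the target speaker.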
Demo page: https://cyhuang-tw.github.io/robust-vc-demo/
@inproceedings{huang2022toward,
title={Toward Degradation-Robust Voice Conversion},
author={Huang, Chien-yu and Chang, Kai-Wei and Lee, Hung-yi},
booktitle={ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={6777--6781},
year={2022},
organization={IEEE}
}