Intra-lingual Any-to-Any Voice Conversion based on S3PRL; S3PRL-VC.
This repository is a PyTorch Lightning-based reimplementation of the official S3PRL-VC.
Target task: VCC2020 Task 1, intra-lingual any-to-any voice conversion.
Trained on VCTK, evaluated on VCC2020.
- model:
- wave2mel: any S3PRL upstreams
- unit2mel: Taco2-AR
- speaker: Resemblyzer (d-vector)
- mel2wave: HiFi-GAN, kan-bayashi's implementation
- Waveforms for melspec are resampled with fbank_config["fs"] (original: sr=None)
  - STFT parameters depend on the sampling rate, so the raw waveform should have the intended sr
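
The point about sampling rate can be illustrated with a small sketch. It shows that the same STFT settings, given in samples, cover different time spans at different sampling rates, which is why the waveform should be at fbank_config["fs"] before the mel-spectrogram is computed. The values in fbank_config below and the helper names samples_to_ms and check_sampling_rate are hypothetical and are not taken from this repository. If waveforms are loaded with librosa (an assumption; the source only shows sr=None), then librosa.load(path, sr=None) keeps the file's native rate, while passing sr=fbank_config["fs"] resamples on load.

```python
# Hypothetical values for illustration only; not read from this repository's config.
fbank_config = {"fs": 24000, "n_fft": 2048, "hop_length": 300}


def samples_to_ms(n_samples: int, fs: int) -> float:
    """Return the duration in milliseconds that n_samples cover at rate fs."""
    return 1000.0 * n_samples / fs


def check_sampling_rate(wave_sr: int, config: dict) -> None:
    """Raise if a waveform's rate differs from the rate the STFT settings assume."""
    if wave_sr != config["fs"]:
        raise ValueError(
            f"waveform is {wave_sr} Hz but fbank_config['fs'] is {config['fs']} Hz; "
            "resample before computing the mel-spectrogram"
        )


if __name__ == "__main__":
    # The same hop and window sizes in samples span half the time at twice the rate.
    for sr in (24000, 48000):
        hop_ms = samples_to_ms(fbank_config["hop_length"], sr)
        win_ms = samples_to_ms(fbank_config["n_fft"], sr)
        print(f"sr={sr}: hop={hop_ms:.2f} ms, window={win_ms:.2f} ms")
```

With these illustrative numbers, a 300-sample hop is 12.5 ms at 24 kHz but only 6.25 ms at 48 kHz, so a waveform left at its native rate would yield mel frames on a different time grid than the model expects.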
@article{huang2021s3prl,
title={S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations},
author={Huang, Wen-Chin and Yang, Shu-Wen and Hayashi, Tomoki and Lee, Hung-Yi and Watanabe, Shinji and Toda, Tomoki},
journal={arXiv preprint arXiv:2110.06280},
year={2021}
}
- s3prl/a2a-vc-vctk: Model and hyperparameters are based entirely on this official repository.