S3PRL-VC : Intra-lingual A2A VC

Intra-lingual Any-to-Any Voice Conversion based on S3PRL; S3PRL-VC.
This repository is a PyTorch Lightning-based reimplementation of the official S3PRL-VC.

Task

VCC2020 Task 1: intra-lingual any-to-any voice conversion.
Trained on VCTK, evaluated on VCC2020.

Implementation

model:
- wave2mel: any S3PRL upstream
- unit2mel: Taco2-AR
  - speaker: Resemblyzer (d-vector)
- mel2wave: HiFi-GAN, kan-bayashi's implementation
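
The way these components fit together at conversion time can be sketched as below. This is a minimal illustration under stated assumptions, not the repository's code: `ConversionPipeline`, its field names, and the toy lambda stages are all hypothetical, and real stages would be neural networks operating on tensors rather than Python lists.

```python
from dataclasses import dataclass
from typing import Callable, List

Wave = List[float]          # mono waveform samples
Frames = List[List[float]]  # sequence of per-frame feature vectors
Vector = List[float]        # fixed-size speaker embedding


@dataclass
class ConversionPipeline:
    """Hypothetical container that chains the stages listed above."""

    upstream: Callable[[Wave], Frames]            # S3PRL upstream: wave -> frame features
    speaker_encoder: Callable[[Wave], Vector]     # d-vector extractor for the target speaker
    unit2mel: Callable[[Frames, Vector], Frames]  # features + speaker embedding -> mel frames
    mel2wave: Callable[[Frames], Wave]            # vocoder: mel frames -> waveform

    def convert(self, source: Wave, target_reference: Wave) -> Wave:
        units = self.upstream(source)                     # content of the source utterance
        speaker = self.speaker_encoder(target_reference)  # identity of the target speaker
        mel = self.unit2mel(units, speaker)               # mel conditioned on the target speaker
        return self.mel2wave(mel)


# Toy stand-ins so the data flow can be run end to end (not real models).
pipeline = ConversionPipeline(
    upstream=lambda wave: [[s] for s in wave],
    speaker_encoder=lambda wave: [sum(wave) / len(wave)],
    unit2mel=lambda units, spk: [[u[0] + spk[0]] for u in units],
    mel2wave=lambda mel: [frame[0] for frame in mel],
)
converted = pipeline.convert([0.0, 1.0], [2.0, 4.0])
```

The point of the sketch is that the source utterance and the target reference enter through separate paths, which is what allows conversion to a speaker given only a reference utterance.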

Quick Training

How to Use

Changes from the original S3PRL-VC

Waveforms used for mel-spectrogram extraction are resampled to fbank_config["fs"] (original: sr=None)
- STFT parameters depend on the sampling rate, so the raw waveform should be at the intended sampling rate
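
To make the dependency concrete, the sketch below computes how long a hop and a window last in milliseconds when the same STFT parameters (given in samples) are applied at two sampling rates. The `fbank_config` values and the 48 kHz input rate are illustrative assumptions, not the repository's settings, and `frame_timing_ms` is a hypothetical helper.

```python
# Hypothetical feature config; the values are illustrative assumptions.
fbank_config = {"fs": 24000, "n_fft": 2048, "hop_length": 300, "win_length": 1200}


def frame_timing_ms(sample_rate: int, config: dict) -> dict:
    """Return hop and window durations in ms for audio at `sample_rate`."""
    return {
        "hop_ms": 1000.0 * config["hop_length"] / sample_rate,
        "win_ms": 1000.0 * config["win_length"] / sample_rate,
    }


# Audio resampled to the configured rate: frames have the intended duration.
intended = frame_timing_ms(fbank_config["fs"], fbank_config)

# The same parameters applied to audio left at an assumed 48 kHz native rate:
# every frame covers half the intended time span.
unresampled = frame_timing_ms(48000, fbank_config)
```

With these assumed values the hop is 12.5 ms at 24 kHz but 6.25 ms at 48 kHz, so a mel-spectrogram computed without resampling would have a different time and frequency resolution from what the downstream models expect. If the audio is loaded with librosa (an assumption; the loader is not named here), passing `sr=fbank_config["fs"]` to `librosa.load` resamples on load, whereas `sr=None` keeps the file's native rate.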

References

Original paper

@article{huang2021s3prl,
  title={S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations},
  author={Huang, Wen-Chin and Yang, Shu-Wen and Hayashi, Tomoki and Lee, Hung-Yi and Watanabe, Shinji and Toda, Tomoki},
  journal={arXiv preprint arXiv:2110.06280},
  year={2021}
}

Acknowledgements

s3prl/a2a-vc-vctk: The model and hyperparameters are based entirely on this official repository.

tarepan/S3PRL_VC