VI-SVS

Singing Voice Synthesis based on VITS, different from VISinger

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Variational Inference with adversarial learning for end-to-end Singing Voice Synthesis

Unlike VISinger, it is just VITS without MAS (Monotonic Alignment Search) and the DurationPredictor.
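
Without MAS and a duration predictor, the phoneme-to-frame alignment has to come from durations supplied with the data. The following is a minimal sketch of that idea, not code from this repository: it assumes durations given in seconds, the 32 kHz sample rate used in this README, and a hypothetical hop size of 320 samples. The helper names `durations_to_frames` and `expand_by_duration` are illustrative only.

```python
from typing import List, Sequence

SAMPLE_RATE = 32000  # matches the 32 kHz resampling step in this README
HOP_SIZE = 320       # assumption: hop length in samples, not read from the repo config


def durations_to_frames(durations_sec: Sequence[float]) -> List[int]:
    """Convert per-phoneme durations (seconds) to integer frame counts.

    Boundaries are rounded on the cumulative time, so per-phoneme
    rounding errors do not accumulate over a long phrase.
    """
    frames = []
    elapsed = 0.0
    prev_boundary = 0
    for dur in durations_sec:
        elapsed += dur
        boundary = round(elapsed * SAMPLE_RATE / HOP_SIZE)
        frames.append(boundary - prev_boundary)
        prev_boundary = boundary
    return frames


def expand_by_duration(tokens: Sequence[str], frames: Sequence[int]) -> List[str]:
    """Repeat each token for its frame count (a hard, monotonic alignment)."""
    if len(tokens) != len(frames):
        raise ValueError("tokens and frames must have the same length")
    expanded: List[str] = []
    for token, count in zip(tokens, frames):
        expanded.extend([token] * count)
    return expanded
```

In a real model the repeated items would be encoder hidden vectors rather than strings; strings are used here only to keep the sketch dependency-free.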

This is a learning project and is left as it is; pitch prediction is the part that still needs improvement.

VISinger

VI-SVS

Pitch and Duration will be developed as add-ons!

Training steps

  • 1 Download the dataset segments.zip and unzip it
segments
|-- test.txt
|-- train.txt
|-- transcriptions.txt
`-- wavs
    |-- 2001000001.wav
    |-- 2001000002.wav
    |-- 2001000003.wav
  • 2 Resample the audio: this project uses 32 kHz
python util/resample.py -w segments/wavs/ -o data_svs/wavs -s 32000
  • 3 Generate the data labels
python util/generate_label.py --config configs/singing_base.yaml --data data_svs/ --file segments/transcriptions.txt

This writes data_svs/labels.txt, where each line has the format: wave path|label path|score path|pitch path|slurs path
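
The five-field, pipe-separated line format above can be read with a few lines of Python. This is a minimal sketch for inspecting or validating labels.txt, not the loader used by the training scripts; `LabelEntry`, `parse_label_line`, and `parse_labels` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabelEntry:
    """One line of labels.txt: five paths separated by '|'."""
    wave_path: str
    label_path: str
    score_path: str
    pitch_path: str
    slurs_path: str


def parse_label_line(line: str) -> LabelEntry:
    """Split one line into its five fields, failing loudly on malformed input."""
    fields = line.strip().split("|")
    if len(fields) != 5:
        raise ValueError(
            f"expected 5 '|'-separated fields, got {len(fields)}: {line!r}"
        )
    return LabelEntry(*fields)


def parse_labels(text: str) -> List[LabelEntry]:
    """Parse the full file contents, skipping blank lines."""
    return [parse_label_line(line) for line in text.splitlines() if line.strip()]
```

For example, `parse_labels(open("data_svs/labels.txt", encoding="utf-8").read())` would return one `LabelEntry` per utterance, and a line with a missing field raises `ValueError` instead of silently shifting columns.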

  • 4 Split the training index
python util/generate_label.py --file data_svs/labels.txt

This generates filelists/singing_train.txt and filelists/singing_valid.txt

  • 5 Start training
python svs_train.py -c configs/singing_base.yaml -n vits_svs
  • 6 Train Pitch
python pit_train.py -c configs/singing_base.yaml -n pitch
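
The index-splitting step above turns the lines of data_svs/labels.txt into a training list and a validation list. The exact rule the script applies is not described in this README. The sketch below assumes a seeded shuffle with a fixed number of held-out lines; `split_index`, `num_valid`, and `seed` are hypothetical names and defaults.

```python
import random
from typing import List, Tuple


def split_index(
    lines: List[str], num_valid: int = 10, seed: int = 1234
) -> Tuple[List[str], List[str]]:
    """Return (train, valid) lists with no overlap.

    A fixed seed makes the split reproducible across runs, so the
    validation set stays the same when training is restarted.
    """
    if not 0 <= num_valid <= len(lines):
        raise ValueError("num_valid must be between 0 and len(lines)")
    shuffled = list(lines)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    valid = shuffled[:num_valid]
    train = shuffled[num_valid:]
    return train, valid
```

Holding out a small, fixed validation set is a common choice for a dataset of this kind, since validation mainly serves to listen to samples and monitor losses during training.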

Inference

  • 0 Export the model
python svs_export.py --config configs/singing_base.yaml --model chkpt/vits_svs/vits_svs_****.pt
  • 1 Run inference: F0 is generated from the musical score
python svs_infer.py --config configs/singing_base.yaml --model svs_opencpop.pt
python svs_song.py --config configs/singing_base.yaml --model svs_opencpop.pt
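
Generating F0 from the score amounts to mapping each note to its equal-tempered frequency and holding that value for the note's duration. The following is a minimal sketch of that mapping, not the repository's implementation. It assumes A4 = 440 Hz, note names written like `C#4/Db4` with `rest` for silence, and a hypothetical rate of 100 frames per second; `note_to_hz` and `score_to_f0` are illustrative names.

```python
from typing import List, Sequence

# Semitone offset of each natural note within an octave (C = 0).
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(note: str) -> float:
    """Return the equal-tempered frequency of a note name; 0.0 for a rest."""
    if note == "rest":
        return 0.0
    name = note.split("/")[0]  # "C#4/Db4" -> "C#4" (both spellings are the same pitch)
    semitone = SEMITONES[name[0]]
    remainder = name[1:]
    if remainder.startswith("#"):
        semitone += 1
        remainder = remainder[1:]
    elif remainder.startswith("b"):
        semitone -= 1
        remainder = remainder[1:]
    octave = int(remainder)
    midi = 12 * (octave + 1) + semitone  # MIDI convention: C4 = 60, A4 = 69
    return 440.0 * 2.0 ** ((midi - 69) / 12)


def score_to_f0(
    notes: Sequence[str],
    durations_sec: Sequence[float],
    frames_per_sec: float = 100.0,  # assumption, not taken from the repo config
) -> List[float]:
    """Build a frame-level, piecewise-constant F0 contour from notes and durations."""
    f0: List[float] = []
    elapsed = 0.0
    for note, dur in zip(notes, durations_sec):
        start = round(elapsed * frames_per_sec)
        elapsed += dur
        end = round(elapsed * frames_per_sec)
        f0.extend([note_to_hz(note)] * (end - start))
    return f0
```

A contour built this way is perfectly flat within each note, with no vibrato or pitch transitions, which is one reason a learned pitch predictor is attractive as an add-on.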

Inference with the Pitch predictor (results are poor)

  • 0 Export the models
python svs_export.py --config configs/singing_base.yaml --model chkpt/vits_svs/vits_svs_****.pt
python pit_export.py --config configs/singing_base.yaml --model chkpt/pitch/pitch_****.pt
  • 1 Run inference
python svs_infer_pitch.py --config configs/singing_base.yaml --model svs_opencpop.pt --pitch pit_opencpop.pt
python svs_song_pitch.py --config configs/singing_base.yaml --model svs_opencpop.pt --pitch pit_opencpop.pt

Data

https://wenet.org.cn/opencpop/

Singing voice synthesis references

https://github.com/SJTMusicTeam/Muskits

https://github.com/MoonInTheRiver/DiffSinger

VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

Model design references

https://github.com/NVIDIA/BigVGAN

https://github.com/jaywalnut310/vits

https://github.com/mindslab-ai/univnet

https://github.com/PlayVoice/so-vits-svc-5.0

https://github.com/shivammehta25/Matcha-TTS

RoFormer: Enhanced Transformer with rotary position embedding

Diffusion Pitch

https://github.com/thuhcsi/DiffVar

https://github.com/hayeong0/Diff-HierVC

https://github.com/tonnetonne814/SiFi-VITS2-44100-Ja

Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

Diffusion Pitch of Diff-HierVC

DiffPitch