The author is not a machine learning expert, so this text may contain many errors. Where the code is publicly available, the GitHub link is attached. There are surely many more great repositories not listed here; sorry, I didn't have time to cover them.
Replacing VITS's TextEncoder with HuBERT's ContentEncoder eliminates the need to input phoneme sequences (i.e., it removes the language dependence). SoftVC uses a HuBERT-based content encoder.
- innnky/so-vits-svc: A singing voice timbre conversion model based on VITS and SoftVC
- quickvc/QuickVC-VoiceConversion: QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
- CjangCjengh/MoeGoe: Executable file for VITS inference (SoftVC/W2V2)
- PlayVoice/VI-SVC: VITS singing voice conversion based on PPG & HuBERT; singing voice clone
- Francis-Komizu/Sovits: An implementation of the combination of Soft-VC and VITS (Deprecated)
- vtuber-plan/vcvits: Non Parallel Voice Conversion based on VITS
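The point about language dependence can be illustrated with a toy sketch. Everything here is hypothetical: `ToyTextEncoder`, `ToyContentEncoder`, and `toy_decoder` are stand-ins invented for illustration, not the real VITS or HuBERT modules. The only thing the sketch tries to show is that the rest of the model consumes a sequence of frame-level feature vectors, so an encoder that derives them from audio needs no phoneme table.

```python
from typing import List, Sequence

Frames = List[List[float]]  # one feature vector per time step


class ToyTextEncoder:
    """Hypothetical stand-in for a text encoder: needs a phoneme table."""

    def __init__(self, phoneme_table: Sequence[str], dim: int = 4):
        self.index = {p: i for i, p in enumerate(phoneme_table)}
        self.dim = dim

    def encode(self, phonemes: Sequence[str]) -> Frames:
        # A phoneme missing from the table raises KeyError.
        # This is the language dependence the text refers to.
        return [[float(self.index[p])] * self.dim for p in phonemes]


class ToyContentEncoder:
    """Hypothetical stand-in for a speech content encoder: audio in, no phonemes."""

    def __init__(self, hop: int = 4, dim: int = 4):
        self.hop = hop
        self.dim = dim

    def encode(self, waveform: Sequence[float]) -> Frames:
        frames = []
        # One frame per non-overlapping chunk of `hop` samples.
        for start in range(0, len(waveform) - self.hop + 1, self.hop):
            chunk = waveform[start:start + self.hop]
            energy = sum(x * x for x in chunk) / self.hop
            frames.append([energy] * self.dim)
        return frames


def toy_decoder(frames: Frames) -> List[float]:
    """Stand-in for the rest of the model: it only ever sees frame features."""
    return [sum(f) / len(f) for f in frames]


if __name__ == "__main__":
    text_frames = ToyTextEncoder(["a", "k"]).encode(["k", "a"])
    audio_frames = ToyContentEncoder(hop=4).encode([0.0, 1.0, 0.0, -1.0] * 2)
    # The same decoder accepts either source of frames.
    print(toy_decoder(text_frames), toy_decoder(audio_frames))
```

Because both encoders return the same `Frames` shape, swapping one for the other does not change the downstream code. A real content encoder would output learned speech units rather than frame energy.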
Inference speed is improved by reworking the decoder, which was the bottleneck, using multi-band generation and the inverse short-time Fourier transform (iSTFT).
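To make the iSTFT step concrete, here is a minimal pure-Python sketch of an STFT followed by an overlap-add inverse. It is an illustration under stated assumptions, not the decoder of any repository listed here: the function names are made up, the hop is assumed to be half the frame length, and a naive O(N²) DFT is used for readability. In an iSTFT-based decoder, the network would predict the spectrum frames and a fixed inverse transform like `istft` would produce the waveform; the multi-band part is not shown.

```python
import cmath
import math
from typing import List


def hann(n_fft: int) -> List[float]:
    # Periodic Hann window; with hop = n_fft // 2 the shifted copies sum to 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def dft(x: List[float]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec: List[complex]) -> List[float]:
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def stft(signal: List[float], n_fft: int, hop: int) -> List[List[complex]]:
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(dft([signal[start + n] * window[n] for n in range(n_fft)]))
    return frames


def istft(frames: List[List[complex]], n_fft: int, hop: int) -> List[float]:
    if not frames:
        return []
    window = hann(n_fft)
    length = n_fft + hop * (len(frames) - 1)
    out = [0.0] * length
    norm = [0.0] * length
    for i, spec in enumerate(frames):
        frame = idft(spec)
        for n in range(n_fft):
            out[i * hop + n] += frame[n]    # overlap-add the time-domain frames
            norm[i * hop + n] += window[n]  # accumulated analysis window
    # Undo the window weighting; samples with no window coverage stay at 0.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


if __name__ == "__main__":
    signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
    rebuilt = istft(stft(signal, n_fft=8, hop=4), n_fft=8, hop=4)
    print(max(abs(a - b) for a, b in zip(signal[1:], rebuilt[1:])))
```

The inverse is a fixed transform with no learned parameters, which is why moving the final synthesis step to an iSTFT is cheaper than having the network generate every waveform sample directly. The very first sample is excluded from the comparison because the periodic Hann window is zero there.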
- quickvc/QuickVC-VoiceConversion: QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
- MasayaKawamura/MB-iSTFT-VITS: Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
- hcy71o/MB-iSTFT-VITS-with-AutoVocoder: Incorporating AutoVocoder to MB-iSTFT-VITS
- [2206.00208] AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation
- innnky/vispeech: A TTS model based on VITS, FastSpeech2, and VISinger
- CODEJIN/VITS_Diffusion
- hcy71o/SC-VITS: VITS-based zero-shot TTS system varying with diverse style/speaker conditioning methods.
- innnky/emotional-vits: An emotion-controllable speech synthesis model that requires no emotion labels, based on VITS
- OlaWod/FreeVC: FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
- Edresson/YourTTS: YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone. An earlier zero-shot voice conversion method.
- Francis-Komizu/VITS: ACG Text-to-Speech
- Francis-Komizu/VITS-Bilingual: Chinese-Japanese Bilingual Text-to-Speech
- hcy71o/SC-VITS: VITS-based zero-shot TTS system varying with diverse style/speaker conditioning methods.
- rotten-work/vits-mandarin-windows: VITS for Mandarin. Support Windows and Linux, low-end and high-end hardwares
- AlexandaJerry/vits-mandarin-biaobei: application of vits on mandarin tts
- CjangCjengh/vits: VITS implementation of Japanese, Chinese, Korean, Sanskrit and Thai
- isletennos/MMVC_Trainer: A real-time voice changer using AI (Trainer)
- [2211.09365] Low-Resource Mongolian Speech Synthesis Based on Automatic Prosody Annotation
- Period VITS
Because refactoring takes time, these repositories do not always adopt the latest techniques. They should, however, be easier to use.
- coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
- espnet/espnet: End-to-End Speech Processing Toolkit
- CjangCjengh/MoeGoe_GUI: GUI for MoeGoe
- Francis-Komizu/StellaVoiceChanger: Deep-learning-based voice changer, supporting local inference.
- luoyily/MoeTTS: Speech synthesis model /inference GUI repo for galgame characters based on Tacotron2, Hifigan, VITS and Diff-svc
- TheKOG/Gal-Voice-Bot
- VoiceConversionLab (@VoiceConversion) / Twitter
- zzw922cn/awesome-speech-recognition-speech-synthesis-papers: Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
- Search | arXiv e-print repository
- "VITS" - Google Search
- Search · vits
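- [Machine Learning] How I built a voice changer and text reader that can convert to an anime voice with VITS - Qiita
- I tried making anime-style synthetic speech with "VITS", the latest speech synthesis method announced in June 2021 [Tsukuyomi-chan Corpus]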