/GPT-SoVITS-Inference

Inference Specialization

Primary language: Python. License: MIT.

GSVI : GPT-SoVITS Inference Plugin

Welcome to GSVI, an inference-specialized plugin built on top of GPT-SoVITS to enhance your text-to-speech (TTS) experience with a user-friendly API interface. This plugin enriches the original GPT-SoVITS project, making voice synthesis more accessible and versatile.

Please note that we do not recommend using GSVI for training. It exists to make GPT-SoVITS simpler and more comfortable to use, and to make model sharing easier.

This fork is mainly based on the fast_inference_ branch and uses a lot of PR code contributed by ChasonJiang. Thanks to this great developer. "Dalao NB!"

The Inference folder used by this branch is its main submodule, which comes from https://github.com/X-T-E-R/TTS-for-GPT-soVITS.

Features

  • High-level abstract interface for easy character and emotion selection
  • Comprehensive TTS engine support (speaker selection, speed adjustment, volume control)
  • User-friendly design for everyone
  • Drop a shared character model folder in place and it is ready to use
  • High compatibility and extensibility for various platforms and applications (for example: SillyTavern)

Getting Started

  1. Install manually, or use the prezip for Windows
  2. Put your character model folders in trained
  3. Run a bat file, or run the Python files manually
  4. If you encounter issues, join our community or consult the FAQ. QQ Group: 863760614 , Discord (AI Hub):

We look forward to seeing how you use GSVI to bring your creative projects to life!

Prezip : https://huggingface.co/XTer123/GSVI_prezip/tree/main

Usage

Use With Bat Files

You will find a set of bat files in 0 Bat Files/

  • To update, run bat 0 and 1 (or 999, 0, 1)
  • To start with a single gradio file, run bat 3
  • To start with a separate backend and frontend, run bat 5 and 6
  • To manage your models, run 10.bat

Python Files

Start with a single gradio file

  • Gradio Application: app.py

Model Management

  • Gradio Model Management Interface: webui/webui.py

API Documentation

For API documentation, visit our Yuque documentation page or see API Doc.md.
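As a rough illustration of what calling the API from a script could look like, here is a minimal sketch that only builds a request URL. Everything specific in it is an assumption rather than something this README confirms: the /tts path, the parameter names character, emotion, and text, and the use of port 5000 (borrowed from the Docker port mapping). Check API Doc.md for the real endpoint and parameters before relying on it.

```python
from urllib.parse import urlencode

# ASSUMPTIONS (not confirmed by this README): the base URL, the "/tts" path,
# and the parameter names below are placeholders. See API Doc.md for the
# actual interface.
BASE_URL = "http://127.0.0.1:5000"
TTS_PATH = "/tts"


def build_tts_url(text: str, character: str, emotion: str = "default",
                  base_url: str = BASE_URL) -> str:
    """Return a URL-encoded request URL for a hypothetical TTS endpoint."""
    query = urlencode({"character": character, "emotion": emotion, "text": text})
    return f"{base_url}{TTS_PATH}?{query}"
```

The function does no network I/O, so it can be adapted to the documented endpoint and then passed to whatever HTTP client you prefer.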

Model Folder Format

A character model folder lives under trained, for example trained/Character1/.

Put the pth, ckpt, and wav files in it. The wav file should be named after its prompt text (the words spoken in the audio).

For example:

trained
--hutao
----hutao-e75.ckpt
----hutao_e60_s3360.pth
----hutao said something.wav
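The layout above can be checked with a short script. The following is a minimal sketch, not GSVI's own loader: describe_character_folder is a hypothetical helper that picks the first ckpt, pth, and wav file in a folder and derives the prompt text from the wav file name.

```python
from pathlib import Path
from typing import Dict, Optional


def describe_character_folder(folder: Path) -> Dict[str, Optional[str]]:
    """Summarize one character folder (hypothetical helper for illustration).

    Picks the first file of each kind in sorted order; GSVI itself may
    choose differently when several candidates exist.
    """
    def first(pattern: str) -> Optional[Path]:
        matches = sorted(folder.glob(pattern))
        return matches[0] if matches else None

    ckpt = first("*.ckpt")  # GPT weights
    pth = first("*.pth")    # SoVITS weights
    wav = first("*.wav")    # reference audio
    return {
        "gpt_weights": ckpt.name if ckpt else None,
        "sovits_weights": pth.name if pth else None,
        "ref_wav": wav.name if wav else None,
        # The wav file name (without extension) doubles as the prompt text.
        "prompt_text": wav.stem if wav else None,
    }
```

Calling describe_character_folder(Path("trained/hutao")) on the example folder would report the three files and the prompt text "hutao said something"; a None value points at a missing file.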

Add an emotion for your model

To do this, open the model management tool (10.bat, or /webuis/character_manager/webui.py).

It lets you assign a reference audio to each emotion, which is how emotion options are implemented.
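Conceptually, an emotion option is a lookup from an emotion name to a reference audio and its prompt text. The sketch below shows that idea only; the dictionary layout, the key names, and the "happy" file name are invented for illustration and are not the format the management tool actually writes.

```python
from typing import Dict

# Hypothetical in-memory mapping; the real tool stores this in its own format.
EMOTIONS: Dict[str, Dict[str, str]] = {
    "default": {
        "ref_wav": "hutao said something.wav",
        "prompt_text": "hutao said something",
    },
    "happy": {
        "ref_wav": "hutao laughed at something.wav",  # made-up example file
        "prompt_text": "hutao laughed at something",
    },
}


def pick_reference(emotions: Dict[str, Dict[str, str]], emotion: str,
                   fallback: str = "default") -> Dict[str, str]:
    """Return the reference entry for an emotion, or the fallback entry."""
    return emotions.get(emotion, emotions[fallback])
```

Falling back to a default entry keeps synthesis working when a client requests an emotion the character does not define.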

Installation

You can install GSVI with the guide below, then download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models, and put your character model folder in trained.

Or just download the pre-packaged distribution for Windows, then put your character model folder in trained.

For the layout of a character model folder, see Model Folder Format.

Tested Environments

  • Python 3.9, PyTorch 2.0.1, CUDA 11
  • Python 3.10.13, PyTorch 2.1.2, CUDA 12.3
  • Python 3.9, PyTorch 2.3.0.dev20240122, macOS 14.3 (Apple silicon)

Note: numba==0.56.4 requires Python < 3.11.
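If you want to catch an unsupported interpreter before installing dependencies, a small check like the following can help. It is a minimal sketch; the function name is hypothetical and the check only encodes the numba constraint stated above.

```python
import sys
from typing import Sequence


def python_ok_for_numba_0564(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if the interpreter is older than Python 3.11."""
    return tuple(version_info[:2]) < (3, 11)


if __name__ == "__main__":
    if not python_ok_for_numba_0564():
        print("Python 3.11+ detected: numba==0.56.4 will not install here.")
```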

Windows

If you are a Windows user (tested with win>=10), you can directly download the pre-packaged distribution and double-click on go-webui.bat to start GPT-SoVITS-WebUI.

Alternatively, run pip install -r requirements.txt, and then double-click install.bat.

Linux

conda create -n GPTSoVits python=3.10
conda activate GPTSoVits
bash install.sh

macOS

Note: The models trained with GPUs on Macs result in significantly lower quality compared to those trained on other devices, so we are temporarily using CPUs instead.

First make sure you have installed FFmpeg by running brew install ffmpeg or conda install ffmpeg, then install by using the following commands:

conda create -n GPTSoVits python=3.10
conda activate GPTSoVits

pip install -r requirements.txt
git submodule init
git submodule update --init --recursive

Install FFmpeg (not needed if you use the prezip)

Conda Users

conda install ffmpeg

Ubuntu/Debian Users

sudo apt install ffmpeg
sudo apt install libsox-dev
conda install -c conda-forge 'ffmpeg<7'

Windows Users

Download and place ffmpeg.exe and ffprobe.exe in the GPT-SoVITS root.
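To confirm that FFmpeg is reachable after any of the steps above, you could run a check along these lines. This is a sketch under the assumption that the tools are either on PATH or sitting in the project root; missing_ffmpeg_tools is a hypothetical helper, not part of GSVI.

```python
import shutil
from pathlib import Path
from typing import List


def missing_ffmpeg_tools(project_root: Path = Path(".")) -> List[str]:
    """Return the FFmpeg tools found neither on PATH nor in project_root."""
    missing = []
    for tool in ("ffmpeg", "ffprobe"):
        on_path = shutil.which(tool) is not None
        # On Windows the binaries are placed in the root as ffmpeg.exe / ffprobe.exe.
        in_root = any((project_root / name).is_file() for name in (tool, tool + ".exe"))
        if not (on_path or in_root):
            missing.append(tool)
    return missing
```

An empty list means both ffmpeg and ffprobe were found.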

Pretrained Models (not needed if you use the prezip)

Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models.

Docker

Please prepare the local paths and models before running the following commands.

  • output: the output directory for wav files
  • logs: the directory for log files
  • SoVITS_weights: SoVITS weights
  • GPT_SoVITS: all pretrained models are in GPT_SoVITS/pretrained_models, which is large
  • nltk_data: data for the nltk library; download it with the following command:
python -m nltk.downloader -d ./nltk_data averaged_perceptron_tagger cmudict
  • trained: trained models (either trained by you or shared by others)
docker build -t gpt-sovits-inference:latest -f Dockerfile .
docker run --rm -it -d --gpus="device=0" --env=is_half=False \
  --volume=<Replace with the path of your project>/GPT-SoVITS-Inference/output:/workspace/output \
  --volume=<Replace with the path of your project>/GPT-SoVITS-Inference/logs:/workspace/logs \
  --volume=<Replace with the path of your project>/GPT-SoVITS-Inference/SoVITS_weights:/workspace/SoVITS_weights \
  --volume=<Replace with the path of your project>/GPT-SoVITS-Inference/GPT_SoVITS/:/workspace/GPT_SoVITS \
  --volume=<Replace with the path of your project>/GPT-SoVITS-Inference/nltk_data:/usr/local/nltk_data \
  --volume=<Replace with the path of your project>/GPT-SoVITS-Inference/trained:/workspace/trained \
  --workdir=/workspace -p 5000:5000 --shm-size="16G" gpt-sovits-inference:latest
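Preparing the local paths can be scripted. The sketch below is one possible approach, not part of the project: prepare_volumes is a hypothetical helper that creates the six host directories if they are missing and returns the matching --volume arguments, using the same host-to-container mapping as the docker run command above.

```python
from pathlib import Path
from typing import List

# Host directory name -> container path, copied from the docker run command.
MOUNTS = [
    ("output", "/workspace/output"),
    ("logs", "/workspace/logs"),
    ("SoVITS_weights", "/workspace/SoVITS_weights"),
    ("GPT_SoVITS", "/workspace/GPT_SoVITS"),
    ("nltk_data", "/usr/local/nltk_data"),
    ("trained", "/workspace/trained"),
]


def prepare_volumes(project_root: Path) -> List[str]:
    """Create missing host directories and return docker --volume arguments."""
    args = []
    for name, container_path in MOUNTS:
        host_path = (project_root / name).resolve()
        host_path.mkdir(parents=True, exist_ok=True)  # no-op if it already exists
        args.append(f"--volume={host_path}:{container_path}")
    return args
```

The script only creates empty directories; the pretrained models, nltk data, and character models still have to be placed in them before the container can synthesize speech.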

Important: remove pyaudio from requirements.txt!

Credits

This fork is mainly based on the fast_inference_ branch of the GPT-SoVITS project and uses a lot of PR code contributed by ChasonJiang.

Special thanks to the following projects and contributors:

Theoretical

Pretrained Models

Text Frontend for Inference

WebUI Tools

Thanks to all contributors for their efforts.