English | 简体中文
This project is a simplified version of GPT-SoVITS, FishSpeech, and ChatTTS that lets users run model inference and training with plain Python code.
- Create a virtual environment

  ```shell
  conda create -n gpt_sovits python=3.8
  conda activate gpt_sovits
  ```
- Install torch

  ```shell
  pip install torch torchvision torchaudio
  ```
- Install ffmpeg

  ```shell
  conda install ffmpeg
  ```
- Clone the project and install its dependencies

  ```shell
  git clone https://github.com/HanxSmile/Simplify-GPT-SoVITS.git
  cd Simplify-GPT-SoVITS
  pip install .
  ```
- Verify the installation

  ```shell
  python -c "from gpt_sovits import Factory"
  ```
- Download the pretrained models (see the original author's GPT-SoVITS project)

  ```shell
  git lfs clone https://huggingface.co/lj1995/GPT-SoVITS
  ```
- Download and extract the Chinese g2p model

  ```shell
  wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip
  unzip G2PWModel_1.1.zip -d ./
  ```
- Edit the model config, filling in the paths of the models downloaded above at the corresponding places

  config/gpt_sovits.yaml:

  ```yaml
  model_cls: gpt_sovits
  hubert_model_name: GPT-SoVITS/chinese-hubert-base
  bert_model_name: GPT-SoVITS/chinese-roberta-wwm-ext-large
  t2s_model_name: GPT-SoVITS/gsv-v2final-pretrained/s1bert25hz-5kh-longer-epoch=12-step=369668.ckpt
  vits_model_name: GPT-SoVITS/gsv-v2final-pretrained/s2G2333k.pth
  cut_method: cut6
  text_converter:
    converter_cls: chinese_converter
    g2p_model_dir: G2PWModel_1.1
    g2p_tokenizer_dir: GPT-SoVITS/chinese-roberta-wwm-ext-large
  generate_cfg:
    placeholder: Null
  ```
  Fields that must be modified:

  | Field | Description |
  | --- | --- |
  | hubert_model_name | Path to the HuBERT model |
  | bert_model_name | Path to the BERT model |
  | t2s_model_name | Path to the AR model |
  | vits_model_name | Path to the VITS model |
  | text_converter.g2p_model_dir | Path to the g2p model |
  | text_converter.g2p_tokenizer_dir | Directory of the g2p tokenizer (same as bert_model_name) |

  Fields that may be modified:

  | Field | Description |
  | --- | --- |
  | cut_method | How long sentences are split (cut6 is recommended, i.e. split on 「,。?!...」) |
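As an illustration of what cut6-style splitting does (a minimal sketch mirroring the description above, not the package's actual implementation; it splits after the CJK marks ,。?! only):

```python
import re

def cut6_like(text):
    # Split AFTER each CJK punctuation mark, keeping the mark with its clause.
    pieces = re.split(r"(?<=[,。?!])", text)
    # Drop empty trailing fragments.
    return [p for p in pieces if p.strip()]

print(cut6_like("明月几时有,把酒问青天。不知天上宫阙,今夕是何年?"))
```

Splitting on clause-final punctuation keeps each synthesized chunk short, which is why the config recommends cut6 for long input text.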
- Collect reference audio files and their corresponding transcripts
- Few-shot model inference

  ```python
  from gpt_sovits import Factory
  from gpt_sovits.utils import save_audio
  import os
  import uuid

  cfg = Factory.read_config("config/gpt_sovits.yaml")
  model = Factory.build_model(cfg)
  inputs = {
      "prompt_audio": "examples/linghua_90.wav",
      "prompt_text": "藏明刀的刀工,也被算作是本領通神的神士相關人員,歸屬統籌文化、藝術、祭祀的射鳳形意派管理。",
      "text": "明月几时有,把酒问青天",
  }
  model = model.cuda()
  sr, audio_data = model.generate(inputs)
  name = uuid.uuid4().hex
  output_dir = os.getcwd()
  output_file = os.path.join(output_dir, name + ".wav")
  output_file = save_audio(audio_data, sr, output_file)
  print(output_file)
  ```
- Download the pretrained models (see the original author's FishSpeech project)

  ```shell
  git lfs clone https://huggingface.co/fishaudio/fish-speech-1.4
  ```
- Edit the model config, filling in the paths of the models downloaded above at the corresponding places

  config/fishspeech.yaml:

  ```yaml
  model_cls: fish_speech
  cut_method: cut6
  vqgan:
    model_cls: filefly_vqgan
    ckpt: fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth
    spec_transform:
      sample_rate: 44100
      n_mels: 160
      n_fft: 2048
      hop_length: 512
      win_length: 2048
    backbone:
      input_channels: 160
      depths: [ 3, 3, 9, 3 ]
      dims: [ 128, 256, 384, 512 ]
      drop_path_rate: 0.2
      kernel_size: 7
    head:
      hop_length: 512
      upsample_rates: [ 8, 8, 2, 2, 2 ]
      upsample_kernel_sizes: [ 16, 16, 4, 4, 4 ]
      resblock_kernel_sizes: [ 3, 7, 11 ]
      resblock_dilation_sizes: [ [ 1, 3, 5 ], [ 1, 3, 5 ], [ 1, 3, 5 ] ]
      num_mels: 512
      upsample_initial_channel: 512
      pre_conv_kernel_size: 13
      post_conv_kernel_size: 13
    quantizer:
      input_dim: 512
      n_groups: 8
      n_codebooks: 1
      levels: [ 8, 5, 5, 5 ]
      downsample_factor: [ 2, 2 ]
  text2semantic:
    model_cls: dual_ar_transformer
    tokenizer_name: fish-speech-1.4/
    ckpt: fish-speech-1.4/model.pth
    model:
      attention_qkv_bias: False
      codebook_size: 1024
      dim: 1024
      dropout: 0.1
      head_dim: 64
      initializer_range: 0.02
      intermediate_size: 4096
      max_seq_len: 4096
      n_fast_layer: 4
      n_head: 16
      n_layer: 24
      n_local_heads: 2
      norm_eps: 1e-6
      num_codebooks: 8
      rope_base: 1e6
      tie_word_embeddings: False
      use_gradient_checkpointing: True
      vocab_size: 32000
  text_converter:
    converter_cls: chinese_fs_converter
  ```
  Fields that must be modified:

  | Field | Description |
  | --- | --- |
  | vqgan.ckpt | Path to the VQGAN model |
  | text2semantic.ckpt | Path to the text2semantic model |
  | text2semantic.tokenizer_name | Directory of the tokenizer used by the text2semantic model |

  Fields that may be modified:

  | Field | Description |
  | --- | --- |
  | cut_method | How long sentences are split (cut6 is recommended, i.e. split on 「,。?!...」) |
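A quick sanity check on the config before building the model can catch missing paths early. This is a hypothetical helper, not part of the package; it assumes the YAML has been loaded into a nested dict and checks the required fields listed above:

```python
# Required dotted paths, taken from the table above.
REQUIRED = ["vqgan.ckpt", "text2semantic.ckpt", "text2semantic.tokenizer_name"]

def missing_fields(cfg, required=REQUIRED):
    """Return the dotted keys from `required` that are absent in `cfg`."""
    missing = []
    for dotted in required:
        node = cfg
        for key in dotted.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(dotted)
                break
            node = node[key]
    return missing

# Example: tokenizer_name was never filled in.
cfg = {
    "vqgan": {"ckpt": "fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"},
    "text2semantic": {"ckpt": "fish-speech-1.4/model.pth"},
}
print(missing_fields(cfg))  # ['text2semantic.tokenizer_name']
```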
- Collect reference audio files and their corresponding transcripts
- Few-shot model inference

  ```python
  from gpt_sovits import Factory
  from gpt_sovits.utils import save_audio
  import os
  import uuid

  cfg = Factory.read_config("config/fishspeech.yaml")
  model = Factory.build_model(cfg)
  inputs = {
      "prompt_audio": "examples/linghua_90.wav",
      "prompt_text": "藏明刀的刀工,也被算作是本領通神的神士相關人員,歸屬統籌文化、藝術、祭祀的射鳳形意派管理。",
      "text": "明月几时有,把酒问青天",
  }
  model = model.cuda()
  sr, audio_data = model.generate(inputs)
  name = uuid.uuid4().hex
  output_dir = os.getcwd()
  output_file = os.path.join(output_dir, name + ".wav")
  output_file = save_audio(audio_data, sr, output_file)
  print(output_file)
  ```
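The GPT-SoVITS and FishSpeech inference snippets are identical except for the config path passed to `Factory.read_config`, so switching backends amounts to choosing a file. A tiny lookup (hypothetical convenience code, not part of the package) makes that explicit:

```python
# Map backend names to the config files shipped under config/.
CONFIGS = {
    "gpt_sovits": "config/gpt_sovits.yaml",
    "fish_speech": "config/fishspeech.yaml",
}

def config_for(backend):
    """Return the config path for a backend, with a helpful error otherwise."""
    try:
        return CONFIGS[backend]
    except KeyError:
        raise ValueError(f"unknown backend {backend!r}, expected one of {sorted(CONFIGS)}")

print(config_for("fish_speech"))  # config/fishspeech.yaml
```

The path returned here would be fed straight into `Factory.read_config`; the rest of the inference code stays the same.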
step 1: Download the pretrained models (see above)
step 2: Prepare the config files, filling in the pretrained model paths at the corresponding places (see above), and put all config files under the project's config directory
step 3: From the project directory, run: `python webui.py`
- Model inference:
  - GPT-SoVITS
  - FishSpeech
  - ChatTTS
- Model training