GANP-TTS

This repository contains the implementation of the paper "GANP-TTS: A GAN-BASED PRE-GENERATED TTS MODEL WITH MULTI-LOSS FUNCTIONS FOR MORE NATURAL SYNTHESIZED SPEECH".

Audio Samples

Audio samples generated by this implementation can be found here.

Quickstart

Dependencies

You can install the Python dependencies with

pip3 install -r requirements.txt

Training

Datasets

The supported datasets are:

- [Biaobei](https://www.data-baker.com/open_source.html): a Mandarin TTS dataset of approximately 10,000 short audio clips from a single female speaker, about 12 hours in total.
- AISHELL-3: a Mandarin TTS dataset with 218 male and female speakers, roughly 85 hours in total.

We take AISHELL-3 as an example hereafter.

Preprocessing

First, run

python3 prepare_align.py config/preprocess.yaml

to prepare the data for alignment.

As described in the paper, Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences.

After that, run the preprocessing script with

python3 preprocess.py config/preprocess.yaml
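To illustrate what forced alignments are typically used for, here is a minimal sketch that converts phoneme intervals (in seconds) into per-phoneme frame counts. The function name `intervals_to_durations`, the interval format, and the default `sampling_rate` and `hop_length` values are assumptions for illustration; they are not taken from this repository's `preprocess.py` or config files.

```python
from typing import List, Tuple


def intervals_to_durations(
    intervals: List[Tuple[float, float, str]],
    sampling_rate: int = 22050,  # assumed value, check your preprocess config
    hop_length: int = 256,       # assumed value, check your preprocess config
) -> Tuple[List[str], List[int]]:
    """Convert (start_sec, end_sec, phone) intervals to frame durations.

    Each boundary is rounded to the nearest spectrogram frame, so the
    durations sum to the frame index of the last boundary.
    """
    phones, durations = [], []
    for start, end, phone in intervals:
        start_frame = int(round(start * sampling_rate / hop_length))
        end_frame = int(round(end * sampling_rate / hop_length))
        phones.append(phone)
        durations.append(end_frame - start_frame)
    return phones, durations


if __name__ == "__main__":
    # Hypothetical alignment for two phonemes.
    example = [(0.0, 0.1, "n"), (0.1, 0.25, "i3")]
    print(intervals_to_durations(example))
```

Rounding both boundaries of each interval, rather than rounding each duration separately, keeps the total number of frames consistent with the length of the utterance.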

Training

Train your model with

python3 train.py -p config/preprocess.yaml -m config/model.yaml -t config/train.yaml
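The training command expects three configuration files. As a small convenience, the following sketch checks that they exist before a long run is launched. The helper `missing_configs` is hypothetical and not part of this repository; only the three paths come from the command above.

```python
from pathlib import Path
from typing import Iterable, List

# Paths as they appear in the training command.
CONFIG_PATHS = (
    "config/preprocess.yaml",
    "config/model.yaml",
    "config/train.yaml",
)


def missing_configs(paths: Iterable[str] = CONFIG_PATHS, root: str = ".") -> List[str]:
    """Return the config paths that do not exist as files under `root`."""
    return [p for p in paths if not (Path(root) / p).is_file()]


if __name__ == "__main__":
    missing = missing_configs()
    if missing:
        print("Missing config files:", ", ".join(missing))
    else:
        print("All config files found.")
```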

Inference

Test your model with

python3 synthesize.py --text '大数据、云计算、物联网、人工智能等新一代信息技术的应用，给我们带来便利的同时，也带来了新的网络威胁。' --speaker_id 162 --mode single -p config/preprocess.yaml -m config/model.yaml -t config/train.yaml
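To synthesize several sentences in a row, the command above can be assembled programmatically. The sketch below uses only the flags shown in that command; the helper `build_synthesize_cmd` and the example sentences are hypothetical. Passing the arguments as a list avoids shell quoting problems with Chinese punctuation.

```python
import shlex
from typing import List


def build_synthesize_cmd(
    text: str,
    speaker_id: int,
    preprocess_cfg: str = "config/preprocess.yaml",
    model_cfg: str = "config/model.yaml",
    train_cfg: str = "config/train.yaml",
) -> List[str]:
    """Build the argument list for one single-sentence synthesis call."""
    return [
        "python3", "synthesize.py",
        "--text", text,
        "--speaker_id", str(speaker_id),
        "--mode", "single",
        "-p", preprocess_cfg,
        "-m", model_cfg,
        "-t", train_cfg,
    ]


if __name__ == "__main__":
    sentences = ["今天天气很好。", "欢迎使用语音合成系统。"]  # example inputs
    for sentence in sentences:
        cmd = build_synthesize_cmd(sentence, speaker_id=162)
        # Print the command; from the repository root it could instead be
        # executed with subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```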
