


Homework 2 (TTS)

TTS homework repository for the HSE DLA course. The goal of the project is to implement the FastSpeech 2 model and train it on the LJ-Speech dataset.


To install the necessary Python packages, run:

pip install -r requirements.txt


Download all needed resources (data, checkpoints & inference examples) with

python3 bin/download.py

If you use Yandex DataSphere, specify its config:

python3 bin/download.py -c datasphere

Alternatively, use the datasphere.ipynb notebook, which contains all the commands needed to reproduce the results of the project.

Generate pitch profiles required for training with

python3 bin/preprocess_pitch.py [-c datasphere]
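For readers unfamiliar with the term, a pitch profile is a frame-by-frame estimate of the fundamental frequency (F0) of an utterance. The sketch below only illustrates the idea with a simple autocorrelation estimator in NumPy; it is not the method used by bin/preprocess_pitch.py, which is not described here. The function names, frame sizes, and frequency bounds are all assumptions chosen for the example.

```python
import numpy as np


def estimate_f0(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the F0 of one frame from its autocorrelation peak (illustrative only)."""
    frame = frame - frame.mean()
    # Full autocorrelation; keep the non-negative lags only.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Search only the lags that correspond to the [fmin, fmax] range.
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(corr) - 1)
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max + 1]))
    return sample_rate / lag


def pitch_profile(wave, sample_rate, frame_length=1024, hop_length=256):
    """Return one F0 estimate per frame (hypothetical helper, assumed frame sizes)."""
    starts = range(0, len(wave) - frame_length + 1, hop_length)
    return np.array(
        [estimate_f0(wave[s:s + frame_length], sample_rate) for s in starts]
    )


# Synthetic check: one second of a 220 Hz sine sampled at 22050 Hz.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
wave = np.sin(2 * np.pi * 220.0 * t)
profile = pitch_profile(wave, sample_rate)
print(profile.min(), profile.max())
```

Real speech needs voiced/unvoiced handling and a more robust estimator, so treat this purely as a picture of what the preprocessing step produces.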


Once the resources are ready, start the training with

python3 train.py

or

python3 train.py -c datasphere

for the DataSphere g1.1 configuration.

Generating test audio


Run

python3 inference.py final_model [-c datasphere]

to generate audio files with the following configurations:

  • unmodified generated audio
  • audio with pitch, speed, or energy changed individually by +20% or -20%
  • audio with pitch, speed, and energy all changed together by +20% or -20%

The audio is stored in resources/final_model.
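The configurations listed above can be enumerated as in the sketch below. It assumes that each control is a multiplicative factor (1.0 for unchanged, 0.8 and 1.2 for -20% and +20%) and that "together" means all three controls share the same factor, which gives nine configurations in total. The names CONTROLS, FACTORS, and inference_configs are hypothetical and do not come from inference.py.

```python
from itertools import product

CONTROLS = ("pitch", "speed", "energy")  # assumed control names
FACTORS = (0.8, 1.2)                     # -20% and +20% as multipliers


def inference_configs():
    """List the control settings described above (hypothetical helper)."""
    # Baseline: every control left unchanged.
    configs = [dict.fromkeys(CONTROLS, 1.0)]
    # One control changed at a time.
    for name, factor in product(CONTROLS, FACTORS):
        cfg = dict.fromkeys(CONTROLS, 1.0)
        cfg[name] = factor
        configs.append(cfg)
    # All three controls changed together.
    for factor in FACTORS:
        configs.append(dict.fromkeys(CONTROLS, factor))
    return configs


configs = inference_configs()
for cfg in configs:
    print(cfg)
```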

General usage

Run python3 some_script.py -h for help, where some_script.py stands for any script in the repository.
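The -c and -h flags used throughout this README match what Python's argparse provides. The sketch below shows one way such a command-line interface could be declared; the long option name --config, the default value "default", and the function build_parser are assumptions for illustration and may differ from the repository's scripts.

```python
import argparse


def build_parser():
    """Build a parser with a -c option; -h is added by argparse automatically."""
    parser = argparse.ArgumentParser(
        description="Hypothetical script following this README's CLI conventions."
    )
    parser.add_argument(
        "-c",
        "--config",
        default="default",  # assumed name of the default config
        help="name of the config to use, e.g. datasphere",
    )
    return parser


# Parse an explicit argument list so the example runs without a shell.
args = build_parser().parse_args(["-c", "datasphere"])
print(args.config)
```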