Neural Vocoder

About • Installation • How To Use • Final results • Credits • License

About

This repository contains the implementation of HiFiGAN vocoder.

See the task assignment here.

See WandB report with implementation details and audio analysis.

Installation

Follow these steps to install the project:

(Optional) Create and activate new environment using conda.

# create env
conda create -n nv python=3.11

# activate env
conda activate nv

Install all required packages.
```
pip install -r requirements.txt
```
Download model checkpoint.
```
python download_weights.py
```

How To Use

Inference

If you only want to synthesize one text/phrase and save it, run the following command:
```
python synthesize.py 'text="YOUR_TEXT"' save_path=SAVE_PATH
```
where SAVE_PATH is a path to save synthesize audio. Please be careful in quotes.

If you want to synthesize audio from text files, your directory with text should has the following format:

NameOfTheDirectoryWithUtterances
└── transcriptions
     ├── UtteranceID1.txt
     ├── UtteranceID2.txt
     .
     .
     .
     └── UtteranceIDn.txt

Run the following command:

python synthesize.py dir_path=DIR_PATH save_path=SAVE_PATH

where DIR_PATH is directory with text and SAVE_PATH is a path to save synthesize audio.

Training

To reproduce this model, run the following command:

python train.py

It takes around 3 days to train model from scratch on A100 GPU.

Final results

Mihajlo Pupin was a founding member of National Advisory Committee for Aeronautics (NACA) on 3 March 1915, which later became NASA, and he participated in the founding of American Mathematical Society and American Physical Society.

id4.mp4
Leonard Bernstein was an American conductor, composer, pianist, music educator, author, and humanitarian. Considered to be one of the most important conductors of his time, he was the first American-born conductor to receive international acclaim.

id5.mp4
Lev Termen, better known as Leon Theremin was a Russian inventor, most famous for his invention of the theremin, one of the first electronic musical instruments and the first to be mass-produced.

id3.mp4
Deep Learning in Audio course at HSE University offers an exciting and challenging exploration of cutting-edge techniques in audio processing, from speech recognition to music analysis. With complex homeworks that push students to apply theory to real-world problems, it provides a hands-on, rigorous learning experience that is both demanding and rewarding.

id1.mp4
Dmitri Shostakovich was a Soviet-era Russian composer and pianist who became internationally known after the premiere of his First Symphony in 1926 and thereafter was regarded as a major composer.

id2.mp4

WV-MOS=3.47 using text and WV-MOS=2.15 using MelSpecs.

Credits

This repository is based on a PyTorch Project Template.

free001style/HiFiGAN