
HiFiGAN Implementation

Primary language: Python. License: MIT.

Neural Vocoder

About • Installation • How To Use • Final results • Credits • License

About

This repository contains an implementation of the HiFiGAN vocoder.

See the task assignment here.

See the WandB report for implementation details and audio analysis.

Installation

Follow these steps to install the project:

  1. (Optional) Create and activate a new environment using conda.

    # create env
    conda create -n nv python=3.11
    
    # activate env
    conda activate nv
  2. Install all required packages.

    pip install -r requirements.txt
  3. Download the model checkpoint.

    python download_weights.py
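
If you want to confirm that step 2 succeeded before downloading weights, a small check along the following lines may help. This is only a sketch and not part of the repository: the helper names `parse_requirement_names` and `missing_packages` are hypothetical, and the parser assumes `requirements.txt` contains plain `name` or `name==version` lines (option lines such as `-r other.txt` are skipped, and URL-style requirements are not handled).

```python
import re
from importlib import metadata


def parse_requirement_names(lines):
    """Extract bare package names from simple requirements.txt lines."""
    names = []
    for raw in lines:
        # Drop inline comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r" or "--index-url".
        if not line or line.startswith("-"):
            continue
        # The name is the leading run of characters before any version specifier.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

To use it, read the file and pass its lines through both helpers, for example `missing_packages(parse_requirement_names(open("requirements.txt")))`. An empty list suggests every listed package is installed.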

How To Use

Inference

  1. If you only want to synthesize one text/phrase and save it, run the following command:

    python synthesize.py 'text="YOUR_TEXT"' save_path=SAVE_PATH

    where SAVE_PATH is the path where the synthesized audio is saved. Mind the quoting: the outer single quotes keep the shell from stripping the inner double quotes around the text.

  2. If you want to synthesize audio from text files, the directory with the text should have the following format:

    NameOfTheDirectoryWithUtterances
    └── transcriptions
         ├── UtteranceID1.txt
         ├── UtteranceID2.txt
         .
         .
         .
         └── UtteranceIDn.txt
    

    Run the following command:

    python synthesize.py dir_path=DIR_PATH save_path=SAVE_PATH

    where DIR_PATH is the directory with the text files and SAVE_PATH is the path where the synthesized audio is saved.
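
The two inference modes above can also be prepared from Python. The following is a minimal sketch, not part of the repository: the helpers `write_transcriptions`, `build_text_command`, and `build_dir_command` are hypothetical names. It creates the `transcriptions` layout shown above and builds the argument lists for `synthesize.py`. Passing a list to `subprocess.run` bypasses the shell, so the outer single quotes from the command-line form are not needed.

```python
from pathlib import Path


def write_transcriptions(root, utterances):
    """Create root/transcriptions/<UtteranceID>.txt for each (id, text) pair."""
    transcription_dir = Path(root) / "transcriptions"
    transcription_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for utterance_id, text in utterances.items():
        path = transcription_dir / f"{utterance_id}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


def build_text_command(text, save_path):
    """Argument list for synthesizing a single phrase.

    Assumes the text itself contains no double quotes.
    """
    return ["python", "synthesize.py", f'text="{text}"', f"save_path={save_path}"]


def build_dir_command(dir_path, save_path):
    """Argument list for synthesizing every file in dir_path/transcriptions."""
    return ["python", "synthesize.py", f"dir_path={dir_path}", f"save_path={save_path}"]
```

For example, after `write_transcriptions("my_texts", {"UtteranceID1": "Hello world"})`, running `subprocess.run(build_dir_command("my_texts", "outputs"), check=True)` from the repository root would correspond to the second command above. The directory and output names here are placeholders.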

Training

To reproduce this model, run the following command:

python train.py

Training the model from scratch takes around 3 days on an A100 GPU.

Final results

  • Mihajlo Pupin was a founding member of National Advisory Committee for Aeronautics (NACA) on 3 March 1915, which later became NASA, and he participated in the founding of American Mathematical Society and American Physical Society.

    id4.mp4
  • Leonard Bernstein was an American conductor, composer, pianist, music educator, author, and humanitarian. Considered to be one of the most important conductors of his time, he was the first American-born conductor to receive international acclaim.

    id5.mp4
  • Lev Termen, better known as Leon Theremin was a Russian inventor, most famous for his invention of the theremin, one of the first electronic musical instruments and the first to be mass-produced.

    id3.mp4
  • Deep Learning in Audio course at HSE University offers an exciting and challenging exploration of cutting-edge techniques in audio processing, from speech recognition to music analysis. With complex homeworks that push students to apply theory to real-world problems, it provides a hands-on, rigorous learning experience that is both demanding and rewarding.

    id1.mp4
  • Dmitri Shostakovich was a Soviet-era Russian composer and pianist who became internationally known after the premiere of his First Symphony in 1926 and thereafter was regarded as a major composer.

    id2.mp4

WV-MOS is 3.47 for audio synthesized from text and 2.15 for audio synthesized from mel spectrograms.

Credits

This repository is based on a PyTorch Project Template.

License

This project is distributed under the MIT License.