Arbitrary Modification of Speech Characteristics in Segmental Duration

Benjamin Harrison (bharrison49@gatech.edu)
Sidong Guo (sguo93@gatech.edu)

Description

This is an implementation of an algorithm that lets users select arbitrary non-overlapping time regions of any spoken audio and speed up or slow down each region by a scaling factor of their choosing. Other speech characteristics, such as pitch and amplitude, are left unchanged.
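The segment selection described above can be sketched as a small bookkeeping step. This is a minimal sketch, not the code in TermProject.py. It assumes a few things not stated in this README:

- each edit is a hypothetical `(start_sec, end_sec, factor)` tuple;
- `factor` means output duration divided by input duration, so 2.0 makes a region twice as long (slower).

The sketch rejects overlapping edits and fills the gaps between them with factor 1.0, so the whole file is covered.

```python
from typing import List, Tuple

# Hypothetical edit format: (start_sec, end_sec, factor), where factor is
# ASSUMED to be output duration / input duration (2.0 = twice as long).
Segment = Tuple[float, float, float]


def build_plan(total_sec: float, edits: List[Segment]) -> List[Segment]:
    """Cover [0, total_sec] with segments; unedited gaps get factor 1.0."""
    plan: List[Segment] = []
    cursor = 0.0  # end of the last segment added to the plan
    for start, end, factor in sorted(edits):
        if not 0.0 <= start < end <= total_sec:
            raise ValueError(f"segment ({start}, {end}) is out of range")
        if start < cursor:
            raise ValueError(f"segment ({start}, {end}) overlaps an earlier one")
        if factor <= 0:
            raise ValueError("scaling factor must be positive")
        if start > cursor:
            plan.append((cursor, start, 1.0))  # untouched gap before this edit
        plan.append((start, end, factor))
        cursor = end
    if cursor < total_sec:
        plan.append((cursor, total_sec, 1.0))  # untouched tail
    return plan


def output_duration(plan: List[Segment]) -> float:
    """Length of the edited audio in seconds, under the assumed factor meaning."""
    return sum((end - start) * factor for start, end, factor in plan)


if __name__ == "__main__":
    plan = build_plan(5.0, [(3.0, 4.0, 0.5), (1.0, 2.0, 2.0)])
    print(plan)
    print(output_duration(plan))  # 5.5
```

Segments that only touch (one ends exactly where the next starts) are accepted, because they do not overlap.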

This implementation is based on ScalerGAN and Hi-Fi GAN.
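Assume the time scaling is done on mel spectrograms that Hi-Fi GAN then converts back to audio. In that case, a region chosen in seconds corresponds to a range of mel frames. The sketch below shows that conversion under stated assumptions:

- a 22050 Hz sample rate (LJ Speech);
- `hop_size` = 256, as in Hi-Fi GAN's published LJ Speech configs;
- `factor` means output duration divided by input duration.

TermProject.py may use different values. The function names here are hypothetical.

```python
# ASSUMED audio parameters: 22050 Hz is LJ Speech's sample rate and
# hop_size=256 matches Hi-Fi GAN's published LJ Speech configs.
SAMPLING_RATE = 22050
HOP_SIZE = 256


def sec_to_frame(t_sec: float) -> int:
    """Index of the mel frame nearest to time t_sec."""
    return int(round(t_sec * SAMPLING_RATE / HOP_SIZE))


def scaled_frame_counts(start_sec: float, end_sec: float, factor: float):
    """Return (input_frames, target_frames) for one region.

    factor is assumed to be output duration / input duration.
    """
    n_in = sec_to_frame(end_sec) - sec_to_frame(start_sec)
    n_out = int(round(n_in * factor)) if n_in > 0 else 0
    return n_in, n_out


if __name__ == "__main__":
    print(scaled_frame_counts(1.0, 2.0, 1.5))  # (86, 129)
```

With these parameters, one second is about 86 mel frames. Stretching a one-second region by 1.5 therefore asks the model for about 129 frames.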

Steps

1. Follow the instructions in ScalerGAN, with one exception: download the LJ Speech dataset and place it under scaler_gan/data/wavs.
2. Follow the instructions in Hi-Fi GAN, with one exception: the default generator model is generator t2_v2. To use a different one, change the --checkpoint_file argument in TermProject.py.
3. Arrange the directories in this hierarchy:
   --scaler_gan
   ----hifi_gan
   ------generated_files_from_mel
   ------test_mel_files
4. Place the audio files you want to time-scale under scaler_gan/data/Project.
5. From the scaler_gan (main) directory, run: python TermProject.py
6. A UI will appear. Use it to select the audio file to time-scale, choose arbitrary segments, and confirm each one with "commit changes".
7. Once all segments and their scaling factors have been chosen, click "Complete Edits". This creates a new wav file named "New_" + original wav filename, with each selected region time-scaled accordingly.
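The final "Complete Edits" step can be sketched as follows. Walk the regions in order, rescale only the edited ones, concatenate the results, and name the output "New_" + original filename. This is a sketch under assumptions, not the code in TermProject.py:

- `scale_fn` is a hypothetical stand-in for the ScalerGAN + Hi-Fi GAN step;
- the plan is a list of hypothetical `(start_sec, end_sec, factor)` tuples covering the whole file;
- the output is written next to the original file (the README does not say where it goes).

```python
import os
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[float, float, float]  # hypothetical (start_sec, end_sec, factor)


def stitch(audio: Sequence[float], sr: int, plan: List[Segment],
           scale_fn: Callable[[Sequence[float], float], Sequence[float]]) -> List[float]:
    """Concatenate regions in order; only regions with factor != 1.0 are rescaled.

    scale_fn is a HYPOTHETICAL stand-in for the ScalerGAN + Hi-Fi GAN step.
    """
    out: List[float] = []
    for start, end, factor in plan:
        chunk = audio[int(round(start * sr)):int(round(end * sr))]
        out.extend(chunk if factor == 1.0 else scale_fn(chunk, factor))
    return out


def output_path(original_path: str) -> str:
    """"New_" + original filename, ASSUMED to sit next to the original file."""
    folder, name = os.path.split(original_path)
    return os.path.join(folder, "New_" + name)


def repeat_samples(chunk: Sequence[float], factor: float) -> List[float]:
    """Dummy scale_fn for testing only: repeats samples, so it does NOT keep pitch."""
    return [s for s in chunk for _ in range(int(factor))]


if __name__ == "__main__":
    plan = [(0.0, 1.0, 1.0), (1.0, 2.0, 2.0), (2.0, 5.0, 1.0)]
    print(stitch(list(range(10)), 2, plan, repeat_samples))
    print(output_path("data/Project/speech.wav"))
```

The dummy `repeat_samples` only exercises the plumbing. A real `scale_fn` must preserve pitch. Joining independently processed regions can also leave audible discontinuities at the boundaries, which a short crossfade might reduce.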

SiDongG/Arbitrary-Modification-of-Speech-Segmental-Duration