Sign2Speech Work In Progress
For now, we have a very simple dataset of alphabet sounds in WAV that you can download here as a tarball or a zip file.
To generate the Mel spectrogram images, download the archive into the root of the project, set that folder as your current directory, and execute:
# Step 1: extract the WAV archive
tar xzf alphabet.tgz # Or if you're on Windows, unzip the ZIP file in the current folder
# Step 2: Generate the spectrograms
./scripts/gen_alphabet_spectrograms.py data/wav/ data/spec/
# Or something like this on Windows: python scripts/gen_alphabet_spectrograms.py data/wav/ data/spec/
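For the curious, the core of what a spectrogram script like this does can be sketched in plain Python: frame the waveform, compute a power spectrum per frame, and map frequencies onto the mel scale. This is only an illustrative sketch (the frame sizes are examples, and the actual script may well use librosa or an FFT instead of this naive DFT):

```python
import math

def hz_to_mel(f):
    """Standard HTK mel-scale mapping: mel = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def frame_signal(samples, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Naive DFT power spectrum; a real script would use an FFT."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = sum(-x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec
```

A mel spectrogram image is then just these per-frame power spectra pooled into mel-spaced frequency bands, one column per frame.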
Alternatively, you can download this archive, but I don't promise to keep it up to date!
You only need to download the latest dataset by Frank and place it in data/train_poses/.
Everything should be configured to work with the AutoEncoder and the Sign dataset, but you may
override the config via the command line, or add another config (see the
Hydra documentation).
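As an illustration of Hydra's command-line override syntax, overrides are dot-separated key=value pairs (the key names below are hypothetical, so check the project's config files for the real ones):

```shell
# Override individual values with key=value pairs:
python train.py trainer.max_epochs=50 dataset.root=data/train_poses

# Or select a whole alternative config group, if one is defined:
python train.py model=autoencoder
```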
To train, simply run the train.py
script. To test, run the test.py
script. Simple as that.