more-or-let
Feature experiments with an end-to-end convolutional phone recognizer
What is it?
A somewhat over-engineered Kaldi recipe for end-to-end phoneme recognition on TIMIT. The model is based on that of Zhang et al. 1, with a few minor modifications. The recipe tests a variety of time-frequency features, using both Kaldi's built-in features and some provided by pydrobert-speech 2.
Getting started
- Clone this repository into kaldi/egs/timit/.
- Symlink the steps and utils folders from ../../wsj/s5 into this directory.
- Take a look at run.sh and cmd.sh, modifying them as you see fit.
- Call
./run.sh
Steps 1-5 of run.sh set up the Python environment, which includes installing TensorFlow from source. I tend to do this manually and skip to step 6.
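The symlink step above can be sketched as a small shell helper. This is only an illustration: the function name link_kaldi_dirs, the KALDI_ROOT variable, and the clone directory name more-or-let are assumptions, not part of the recipe.

```shell
# Hypothetical helper (not part of this repository).
# Usage: link_kaldi_dirs RECIPE_DIR
# Creates steps/ and utils/ symlinks inside RECIPE_DIR that point at
# ../../wsj/s5, which resolves to kaldi/egs/wsj/s5 when RECIPE_DIR is a
# direct child of kaldi/egs/timit/.
link_kaldi_dirs() {
  local recipe_dir="$1"
  local name
  for name in steps utils; do
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing link to a directory as a plain file
    ln -sfn "../../wsj/s5/${name}" "${recipe_dir}/${name}"
  done
}

# Example call, assuming KALDI_ROOT points at your Kaldi checkout and the
# repository was cloned as more-or-let:
#   link_kaldi_dirs "$KALDI_ROOT/egs/timit/more-or-let"
#   cd "$KALDI_ROOT/egs/timit/more-or-let" && ./run.sh
```

The links are relative, so they keep working if the whole Kaldi tree is moved.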
License
This code falls under Apache License 2.0 (see LICENSE). run.sh, as well as the files in local, are inspired by, copied from, or heavily modified from Kaldi 3. Kaldi is licensed under Apache License 2.0. Its notice can be found in COPYING_kaldi.
How to cite
You're welcome to use this repository for publications. Please cite the original paper (Zhang et al. 1) first and foremost, noting the few modifications made here. If you're feeling generous, please cite our paper as well (Robertson et al. 4).