
more-or-let

Feature experiments with an end-to-end convolutional phone recognizer

What is it?

A somewhat over-engineered Kaldi recipe for end-to-end phoneme recognition on TIMIT. The model is based on that of Zhang et al. 1, with a few minor modifications. The recipe tests a variety of time-frequency features, using both Kaldi's built-in features and some provided by pydrobert-speech 2.

Getting started

  1. Clone this repository into kaldi/egs/timit/.
  2. Softlink the steps and utils folders from ../../wsj/s5 into this directory.
  3. Take a look at run.sh and cmd.sh, modifying them as you see fit.
  4. Call ./run.sh. Steps 1-5 of the run script involve setting up the python environment, including things like installing tensorflow from source. I tend to just do this manually and skip to step 6.
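
The steps above can be sketched as a shell session. The clone URL and directory names here are assumptions; substitute your own fork or checkout paths as needed.

```shell
# Sketch of the setup steps, assuming a standard Kaldi source checkout.
# The clone URL below is hypothetical -- use the actual repository address.
cd kaldi/egs/timit
git clone https://example.com/more-or-let.git
cd more-or-let

# Softlink the shared WSJ helper scripts into this directory.
ln -s ../../wsj/s5/steps .
ln -s ../../wsj/s5/utils .

# Inspect and edit run.sh and cmd.sh for your cluster/queue setup, then:
./run.sh
```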

License

This code falls under Apache License 2.0 (see LICENSE). run.sh, as well as the files in local, are either inspired by, copied from, or heavily modified from Kaldi 3. Kaldi is licensed under Apache License 2.0. Its notice can be found in COPYING_kaldi.

How to cite

You're welcome to use this repository for publications. Please cite the original paper (Zhang et al. 1) first and foremost (though note the few modifications). If you're feeling generous, please cite our paper as well (Robertson et al. 4).

References