The reference implementation for the paper
End-to-End Attention-based Large Vocabulary Speech Recognition. Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio.
(arxiv draft, submitted to ICASSP 2016).
- install all the dependencies (see the list below)
- set your environment variables by calling
source env.sh
Then, please proceed to exp/wsj
for instructions on how
to replicate our results on the Wall Street Journal (WSJ) dataset
(available at the Linguistic Data Consortium as LDC93S6B and LDC94S13B).
- Python packages: pykwalify, toposort, pyyaml, numpy, pandas, pyfst
- kaldi
- kaldi-python
If you already have the dataset in HDF5 format, the models can be trained without Kaldi and PyFst.
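Before training, it can help to confirm that the Python packages listed above are importable. The following is a minimal sketch, not part of the repository. The mapping from package names to module names (in particular `yaml` for pyyaml and `fst` for pyfst) is an assumption, and `find_missing` is a hypothetical helper name. Kaldi and kaldi-python are left out because their import names are not stated here.

```python
import importlib

# Assumed mapping from the package names listed above to importable
# module names; adjust if your installation differs.
IMPORT_NAMES = {
    "pykwalify": "pykwalify",
    "toposort": "toposort",
    "pyyaml": "yaml",
    "numpy": "numpy",
    "pandas": "pandas",
    "pyfst": "fst",
}


def find_missing(import_names):
    """Return the sorted package names whose module fails to import."""
    missing = []
    for package, module in sorted(import_names.items()):
        try:
            importlib.import_module(module)
        except Exception:
            # A broken installation may raise something other than
            # ImportError, so any failure counts as "missing" here.
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = find_missing(IMPORT_NAMES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All listed Python packages can be imported.")
```

If only training from an existing HDF5 dataset is planned, a missing `pyfst` entry in the output can presumably be ignored.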
The repository contains custom modified versions of Theano, Blocks, Fuel,
picklable-itertools, and Blocks-extras as
[subtrees](http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/).
To ensure that these specific versions are used, we recommend uninstalling
any regular installations of these packages, in addition to sourcing
env.sh.
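One way to check that the bundled versions are the ones actually picked up is to look at where each module is loaded from. The sketch below is an illustration under stated assumptions, not a script shipped with the repository. It assumes it is run from the repository root after sourcing env.sh and that the import names are `theano`, `blocks`, `fuel`, `picklable_itertools`, and `blocks_extras`. The helpers `is_under`, `module_location`, and `report` are hypothetical.

```python
import importlib
import os


def is_under(path, root):
    """Return True if `path` lies inside the directory tree rooted at `root`."""
    path = os.path.realpath(path)
    root = os.path.realpath(root)
    return path == root or path.startswith(root + os.sep)


def module_location(name):
    """Import `name` and return the file it was loaded from.

    Returns None if the import fails or the module has no file.
    """
    try:
        module = importlib.import_module(name)
    except Exception:
        # Treat any import failure as "not available".
        return None
    return getattr(module, "__file__", None)


def report(names, repo_root):
    """Map each module name to 'bundled', 'external', or 'missing'."""
    result = {}
    for name in names:
        location = module_location(name)
        if location is None:
            result[name] = "missing"
        elif is_under(location, repo_root):
            result[name] = "bundled"
        else:
            result[name] = "external"
    return result


if __name__ == "__main__":
    # Assumed import names for the packages kept as subtrees.
    names = ["theano", "blocks", "fuel", "picklable_itertools", "blocks_extras"]
    # Assumption: the current directory is the repository root.
    for name, status in sorted(report(names, os.getcwd()).items()):
        print("%-20s %s" % (name, status))
```

A status of `external` would suggest that a regular installation is shadowing the bundled copy and should be uninstalled.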
MIT