This repository contains code for our ICASSP 2017 paper, in which we explored combining acoustic and lexical features for punctuation prediction using a neural machine translation approach with a hierarchical encoder that maps frame level acoustic features into word level acoustic embeddings. Note that this repository is intended only for research purposes as we found that purely lexical neural machine translation based system trained on large amounts of text data performs better in production.
@inproceedings{klejch2017sequence,
title={Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features},
author={Klejch, Ond{\v{r}}ej and Bell, Peter and Renals, Steve},
booktitle={2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2017},
organization={IEEE}
}
This repository is using Theano and Blocks and is built on top of the Blocks NMT example. In order to train and evaluate the model you will need to perform the following steps:
- Prepare train, dev and test data directories in Kaldi format and obtain an phoneme level alignment in the ctm format using some pretrained ASR system.
- Decode dev and test data using the pretrained ASR system and generate corresponding phoneme level alignment.
- Update
config.py
with correct paths and model settings:KALDI_EXP_ROOT = "Set path to your KALDI exp root." config = {} config['vocabulary'] = "%s/data/local/dict/mgb.150k.wlist" % KALDI_EXP_ROOT config['lexicon'] = create_lexicon("%s/data/local/dict/lexicon.txt" % KALDI_EXP_ROOT) config['phones'] = create_phone_dictionary_from_lexicon( "%s/data/local/dict/nonsilence_phones.txt" % KALDI_EXP_ROOT, "%s/data/local/dict/silence_phones.txt" % KALDI_EXP_ROOT) config['phones_vocab_size'] = len(config['phones']) config['punctuation_marks'] = ["<FULL_STOP>", "<COMMA>", "<QUESTION_MARK>", "<EXCLAMATION_MARK>", "<DOTS>"] config['train_data_dir'] = "%s/data/train/" % KALDI_EXP_ROOT config['train_alignment_dir'] = "%s/exp/ali_train/" % KALDI_EXP_ROOT config['dev_data_dir'] = "%s/data/dev/" % KALDI_EXP_ROOT config['dev_alignment_dir'] = "%s/exp/ali_dev/" % KALDI_EXP_ROOT config['best_asr_data_dir'] = "%s/data/dev_asr/" % KALDI_EXP_ROOT config['best_asr_alignment_dir'] = "%s/exp/ali_dev_asr/" % KALDI_EXP_ROOT config['data_dir'] = "./data"
- Prepare data files using
python prepare_data.py
. - Train the system using
python __main__.py
. - Punctuate dev data by updating the
config
section intranslate.py
and runningpython translate.py
.