ASR Scripts

This project aims to simplify using Kaldi for speech recognition and alignment. It currently works with the ASpIRE pre-trained model, but the scripts can be extended to work with other pre-trained or custom-trained models.

Installation

Prerequisites

Compiled Kaldi instance (instructions)
ASpIRE chain pre-trained model (download, preparation)
To display the TextGrid alignment files, install Praat.
To generate the TextGrid alignment files, install the praatIO Python package.

Download scripts

$ git clone https://github.com/jailuthra/asr
Place the scripts in the kaldi/egs/aspire/s5 directory.

Input audio constraints

Mono PCM wave files, 16-bit sample size, 8 kHz sampling rate.
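
The constraints above can be checked before decoding. The following is a minimal sketch using Python's standard wave module; check_wav is a hypothetical helper and is not part of the scripts in this repository.

```python
import wave


def check_wav(path):
    """Return a list of constraint violations for one wav file (empty if it is OK).

    Hypothetical helper, not part of this repository. The wave module only
    opens PCM data, so other encodings are reported as unreadable.
    """
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            if wav.getnchannels() != 1:
                problems.append(f"expected mono, found {wav.getnchannels()} channels")
            if wav.getsampwidth() != 2:  # sample width is reported in bytes
                problems.append(f"expected 16-bit samples, found {8 * wav.getsampwidth()}-bit")
            if wav.getframerate() != 8000:
                problems.append(f"expected 8000 Hz, found {wav.getframerate()} Hz")
    except wave.Error as err:
        problems.append(f"not a readable PCM wave file: {err}")
    return problems
```

Calling check_wav on each file in the input directory and skipping (or resampling) any file with a non-empty result avoids feeding unsupported audio to the decoder.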

Scripts

aspire.py: Decodes and aligns the wav files using the pre-trained model; calls the other scripts
filegen.py: Generates the required speaker-id and utterance-id information files from the wav files
id2phone.py, id2word.py: Convert phone/word ids in the ctm output to actual phones/words
ctm2tg.py: Convert ctm output to Praat TextGrid files
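
To illustrate the id conversion step, here is a minimal sketch and not the actual code of id2phone.py or id2word.py. It assumes a Kaldi-style symbol table with one "<symbol> <id>" pair per line (as in words.txt or phones.txt) and ctm lines of the form "<utterance> <channel> <start> <duration> <id> [confidence]". The function names are hypothetical.

```python
def load_symbol_table(lines):
    """Build an id -> symbol mapping from '<symbol> <id>' lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        symbol, symbol_id = parts
        table[symbol_id] = symbol
    return table


def map_ctm_ids(ctm_lines, table):
    """Replace the id in the fifth ctm field with its symbol, if known."""
    converted = []
    for line in ctm_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete ctm entry
        fields[4] = table.get(fields[4], fields[4])  # keep unknown ids unchanged
        converted.append(" ".join(fields))
    return converted
```

For example, with a table built from the lines "hello 1" and "world 2", the ctm line "0001_0001 1 0.00 0.35 1" becomes "0001_0001 1 0.00 0.35 hello".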

Usage

Create a directory with all your wav files.
The file naming convention is <speaker_id>_<utterance_id>.wav, for example 0001_0001.wav and 0001_0002.wav.
Call the aspire script: ./aspire.py <wavdir> <outputdir>.
It will generate text transcriptions and alignment files in the output directory.
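
The naming convention matters because the speaker and utterance ids are derived from the file names. The sketch below shows one way to split a name and build lines in the style of Kaldi's utt2spk file ("<utterance-id> <speaker-id>"). It is an illustration only: the helper names are hypothetical, and treating the full file stem as the Kaldi utterance id is an assumption, not a description of what filegen.py writes.

```python
import os


def split_wav_name(filename):
    """Split '<speaker_id>_<utterance_id>.wav' into (speaker_id, utterance_id)."""
    stem, ext = os.path.splitext(os.path.basename(filename))
    if ext.lower() != ".wav":
        raise ValueError(f"not a wav file: {filename}")
    # Split on the first underscore; everything after it is the utterance id.
    speaker_id, sep, utterance_id = stem.partition("_")
    if not sep or not speaker_id or not utterance_id:
        raise ValueError(f"expected <speaker_id>_<utterance_id>.wav, got {filename}")
    return speaker_id, utterance_id


def utt2spk_lines(filenames):
    """Return sorted '<stem> <speaker_id>' lines, one per wav file.

    Assumption: the whole file stem serves as the utterance key, so that
    keys are prefixed by the speaker id.
    """
    lines = []
    for name in sorted(filenames):
        speaker_id, _ = split_wav_name(name)
        stem = os.path.splitext(os.path.basename(name))[0]
        lines.append(f"{stem} {speaker_id}")
    return lines
```

A file that does not follow the convention raises ValueError here, which makes naming mistakes visible before decoding starts.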