HTK is used to create the MFCC features. To compile it, run ./compile_htk.sh.
Run ./link_timit.sh to create a symbolic link to the TIMIT dataset under /results.
debug_mfcc contains files for comparing the MFCC features generated by HTK with those generated by python_speech_features. Only one utterance is used for this test; it was copied from TIMITcorpus/TIMIT/TRAIN/DR8/FBCG1/SX442.*, where * is one of PHN, TXT, WAV, WRD.
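The comparison itself can be as simple as truncating both feature matrices to the same number of frames and looking at the largest coefficient difference. This is a minimal sketch, not code from debug_mfcc; compare_features is a hypothetical helper name, and the two tools may legitimately emit slightly different frame counts for the same file:

```python
import numpy as np

def compare_features(htk_feats, psf_feats):
    """Return the max absolute difference between two MFCC matrices
    (frames x coefficients), truncated to their common frame count."""
    n = min(len(htk_feats), len(psf_feats))
    diff = np.abs(np.asarray(htk_feats[:n]) - np.asarray(psf_feats[:n]))
    return float(diff.max())

# Toy illustration with made-up matrices of different lengths:
a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([[1.0, 2.1], [3.0, 3.8]])
print(compare_features(a, b))
```

A small but nonzero maximum difference is expected even when both tools are configured identically, because of differing pre-emphasis, windowing, and lifter defaults.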
.wav.sox was generated with sox SX442.WAV SX442.wav (as in /home/zhihaol/807/scripts/convert_wav.sh on the CNBC cluster), and I renamed the output to .wav.sox.
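The sox step is needed because TIMIT ships NIST SPHERE files under a .WAV extension, which most audio libraries cannot open directly. A quick way to check which container a given file actually uses is to look at its magic bytes (wav_container is an illustrative helper, not part of the repo):

```python
def wav_container(path):
    """Return 'sphere', 'riff', or 'unknown' from the file's magic bytes.

    NIST SPHERE files (as shipped in TIMIT) start with b'NIST_1A';
    standard WAV files produced by sox start with b'RIFF'.
    """
    with open(path, "rb") as f:
        magic = f.read(8)
    if magic.startswith(b"NIST_1A"):
        return "sphere"
    if magic.startswith(b"RIFF"):
        return "riff"
    return "unknown"
```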
I will generate features according to Section 5.1 of Supervised Sequence Labelling with Recurrent Neural Networks; a preprint version is at http://www.cs.toronto.edu/~graves/preprint.pdf.
It's based on Section 3.1.5, "Step 5 - Coding the Data", of the official HTK book (htkbook-3.5.alpha-1.pdf).
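Part of that coding step is appending delta coefficients computed with the HTK book's regression formula, d_t = sum_{k=1..Theta} k*(c_{t+k} - c_{t-k}) / (2 * sum_{k=1..Theta} k^2). Below is a numpy sketch of that formula, assuming a window of Theta = 2 and edge-frame replication at the boundaries (HTK's own boundary handling; verify the DELTAWINDOW value against your HTK config):

```python
import numpy as np

def htk_deltas(feats, theta=2):
    """Delta coefficients via the HTK regression formula.

    feats: array of shape (frames, coefficients).
    Edge frames are replicated so every t has neighbours at t +/- theta.
    """
    feats = np.asarray(feats, dtype=float)
    padded = np.concatenate([feats[:1].repeat(theta, axis=0),
                             feats,
                             feats[-1:].repeat(theta, axis=0)])
    denom = 2 * sum(k * k for k in range(1, theta + 1))
    deltas = np.zeros_like(feats)
    n = len(feats)
    for k in range(1, theta + 1):
        # c_{t+k} - c_{t-k}, vectorised over all frames t at once.
        deltas += k * (padded[theta + k:theta + k + n]
                       - padded[theta - k:theta - k + n])
    return deltas / denom
```

On a linear ramp the interior deltas equal the slope, and on a constant signal they are zero, which is a handy sanity check.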
The resulting HTK feature files are read using https://github.com/cmusphinx/sphinxtrain/blob/master/python/cmusphinx/htkmfc.py.
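For reference, the file format that reader handles is simple: a 12-byte big-endian header (nSamples as int32, sampPeriod as int32 in 100 ns units, sampSize as int16 bytes per frame, parmKind as int16) followed by nSamples frames of big-endian float32 values. A minimal standalone reader for uncompressed files (compressed _C files need extra decoding, which htkmfc.py also implements) might look like:

```python
import struct
import numpy as np

def read_htk(path):
    """Read an uncompressed HTK parameter file.

    Returns (features, samp_period, parm_kind), where features has
    shape (nSamples, sampSize // 4) since each value is a 4-byte float.
    """
    with open(path, "rb") as f:
        n_samples, samp_period, samp_size, parm_kind = struct.unpack(
            ">iihh", f.read(12))
        n_coeffs = samp_size // 4
        data = np.frombuffer(f.read(n_samples * samp_size), dtype=">f4")
    return data.reshape(n_samples, n_coeffs), samp_period, parm_kind
```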
Run PYTHONPATH=$(pwd) python feature_generation/generate_mfcc_features.py from the project root to see the results. It generates /results/features/TIMIT_train.hdf5 and /results/features/TIMIT_test.hdf5, as well as a folder /results/features/TIMIT_train holding the training data for ctc_example.
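I haven't documented the exact on-disk layout of that folder here; as a hedged sketch only, a one-array-per-utterance layout of .npy files would look like the following (save_utterances and the naming scheme are assumptions for illustration, so check what ctc_example's data loader actually expects):

```python
import os
import numpy as np

def save_utterances(out_dir, utterances):
    """Write each (name, feature_matrix) pair to out_dir/<name>.npy.

    A hypothetical per-utterance layout; float32 keeps files half the
    size of the default float64 with no practical loss for MFCCs.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, feats in utterances:
        path = os.path.join(out_dir, name + ".npy")
        np.save(path, np.asarray(feats, dtype=np.float32))
        paths.append(path)
    return paths
```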
ctc_example is from https://github.com/dresen/tensorflow_CTC_example. I modified INPUT_PATH and TARGET_PATH and set the minibatch size to 10 in bdlstm_train.py, and modified load_batched_data in utils.py.
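The core job of a loader like load_batched_data is to group variable-length utterances into fixed-size minibatches, zero-padding each one to the longest sequence in its batch and recording the true lengths for the CTC loss. This is a numpy sketch of that idea under those assumptions, not the actual utils.py code, whose output format may differ:

```python
import numpy as np

def batch_pad(sequences, batch_size):
    """Group sequences (each frames x n_features) into minibatches.

    Returns a list of (padded, lengths) pairs: padded has shape
    (batch, max_len, n_features) with zeros past each true length.
    """
    batches = []
    for i in range(0, len(sequences), batch_size):
        chunk = [np.asarray(s) for s in sequences[i:i + batch_size]]
        lengths = np.array([len(s) for s in chunk])
        n_feat = chunk[0].shape[1]
        padded = np.zeros((len(chunk), lengths.max(), n_feat))
        for j, s in enumerate(chunk):
            padded[j, :len(s)] = s
        batches.append((padded, lengths))
    return batches
```

The lengths array is what gets fed to the sequence_length argument of the CTC loss so padding frames are ignored.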
Run PYTHONPATH=$(pwd) python ctc_example/bdlstm_train.py from the project root to see the results. After 10 epochs it should give an error rate of about 0.4.
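An "error rate" for CTC output is conventionally the Levenshtein edit distance between the decoded label sequence and the reference, normalised by the reference length; the exact metric bdlstm_train.py prints may be defined slightly differently, but a standard version is:

```python
def label_error_rate(ref, hyp):
    """Edit distance between label sequences, divided by len(ref).

    Classic dynamic-programming Levenshtein: d[i][j] is the minimum
    number of insertions, deletions, and substitutions needed to turn
    the first i reference labels into the first j hypothesis labels.
    """
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)
```

An error rate of 0.4 then means roughly 4 label edits per 10 reference labels, so there is clearly still headroom beyond 10 epochs.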
It's from http://file.ppwwyyxx.com/