For training neural networks for ASR. Contains a `Network`
class that wraps TensorFlow in an object-oriented interface. An example program is given below.
The class uses TensorFlow to create a neural network. Trained networks can be saved and restored.
Training data is assumed to be in an hdf5 file for efficient memory usage.
(For Kaldi) Given a file of utterance IDs with their feature vectors, the network's output can be written to a file that Kaldi can
process (specifically latgen-faster-mapped
from nnet1).
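As a rough illustration of the kind of file latgen-faster-mapped reads (this is not the repo's actual output code, and the helper name and values below are made up): Kaldi accepts a text archive of per-utterance matrices of frame-by-frame log-likelihoods, which could be written like this:

import numpy as np

def write_kaldi_text_matrix(out_file, utt_id, mat):
    # Kaldi text-archive matrix: "utt_id  [" then one row per line,
    # with " ]" closing the last row; readable with an "ark,t:" rspecifier.
    out_file.write('%s  [\n' % utt_id)
    for i, row in enumerate(mat):
        end = ' ]\n' if i == len(mat) - 1 else '\n'
        out_file.write('  ' + ' '.join('%.6f' % v for v in row) + end)

# Hypothetical usage: in practice the matrix would be the network's
# log-posteriors (minus log-priors) for every frame of the utterance.
with open('loglikes.ark', 'w') as ark:
    write_kaldi_text_matrix(ark, 'utt_0001', np.log(np.full((3, 4), 0.25)))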
Inside Python, use `help(Network)`
to bring up the help text showing the available functions. Summary of implementation:
Fixed:
ELU activation function, Adam optimizer, weight init using `tf.truncated_normal(stddev=1/(input*1.41))` (biases = 0), softmax output (a rough TensorFlow sketch of these fixed pieces is given after this summary).
Variable:
Constant or adaptive learning rate, dropout, L2 regularization.
A number of smaller features (useful utility functions etc.) are also included.
Splitting the network into separate streams that later rejoin is being worked on (it was mostly done, but has fallen by the wayside for the moment).
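As a sketch of the fixed pieces listed above (an illustration only, with assumed variable names, not the repo's internal code), one hidden layer plus the softmax output in TensorFlow 1.x might look roughly like:

import tensorflow as tf

n_in, n_hidden, n_out = 520, 1024, 2000

x = tf.placeholder(tf.float32, [None, n_in])
y = tf.placeholder(tf.float32, [None, n_out])
keep_prob = tf.placeholder(tf.float32)  # 'kp_prob' in the example below

# Truncated-normal weight init with stddev = 1/(fan_in*1.41), zero biases.
W1 = tf.Variable(tf.truncated_normal([n_in, n_hidden], stddev=1.0 / (n_in * 1.41)))
b1 = tf.Variable(tf.zeros([n_hidden]))
h1 = tf.nn.dropout(tf.nn.elu(tf.matmul(x, W1) + b1), keep_prob)  # ELU + dropout

W2 = tf.Variable(tf.truncated_normal([n_hidden, n_out], stddev=1.0 / (n_hidden * 1.41)))
b2 = tf.Variable(tf.zeros([n_out]))
logits = tf.matmul(h1, W2) + b2

# Softmax cross-entropy plus L2 regularization ('lam'), trained with Adam ('eta').
lam, eta = 0.1, 1e-4
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
cost += lam * (tf.nn.l2_loss(W1) + tf.nn.l2_loss(W2))
train_step = tf.train.AdamOptimizer(eta).minimize(cost)

The example program below shows the actual `Network` class doing all of this end to end: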
from NN import Network
data_train = '....hdf5'
data_val = '....hdf5'
pretrain_param_save_file = '...'
pretrain_params_dict = {'data_train': data_train, 'data_val': data_val,
                        'epochs': 20, 'batch_size': 1024,
                        'eta': 1e-4, 'kp_prob': 0.8,
                        'lam': 0.1, 'save_file': pretrain_param_save_file}
NN = Network([520, 1024, 1024, 1024, 1024, 1024, 1024, 2000], pretrain=True,
             pretrain_params_dict=pretrain_params_dict, fraction_of_gpu=0.5)
# Training cost, score and validation score output every epoch.
NN.train(data_train, 20, 1024, 1e-4, val_file=data_val, lam=0.1, kp_prob=0.8)
model_save_file = '...'
NN.save(model_save_file)
NN.stop()
I tried to understand RBMs and didn't, and I don't like using things I don't understand, so I thought I could try pretraining a network with other methods, such as those used in [1] or [2].
Results so far have been disappointing. A Kaldi network of the same size with DBN pretraining achieves an 11-12% WER; I'm still trying to get below 16%. I believe the underlying problem is that I have 10 hours of training data (Aurora4) while the linked papers have an order of magnitude more, and per [3] the more data you have the less pretraining matters. That makes it hard for me to replicate the papers' result of getting close to DBN-level performance with discriminative pretraining.
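For reference, here is a sketch of what that kind of discriminative (layer-wise supervised) pretraining looks like in general. This is my reading of the technique in the cited papers, not how the `Network` class's pretrain option is implemented; the function name is made up, `X` is a float feature matrix, `Y` is one-hot targets, and full-batch updates are used for brevity:

import numpy as np
import tensorflow as tf

def pretrain_discriminatively(X, Y, layer_sizes, epochs=5, eta=1e-4):
    # Greedy layer-wise discriminative pretraining (sketch of the idea only).
    # layer_sizes = [n_in, h1, ..., n_out].  Hidden layers are added one at a
    # time; after each addition the stack plus a fresh softmax is trained,
    # and the hidden-layer weights are carried over to the next round.
    hidden, n_in, n_out = [], layer_sizes[0], layer_sizes[-1]
    for depth in range(1, len(layer_sizes) - 1):
        tf.reset_default_graph()
        x = tf.placeholder(tf.float32, [None, n_in])
        y = tf.placeholder(tf.float32, [None, n_out])
        a, prev, params = x, n_in, []
        for i in range(depth):
            n_h = layer_sizes[i + 1]
            if i < len(hidden):   # re-create an already pretrained layer
                W, b = tf.Variable(hidden[i][0]), tf.Variable(hidden[i][1])
            else:                 # the newly added layer
                W = tf.Variable(tf.truncated_normal([prev, n_h], stddev=1.0 / (prev * 1.41)))
                b = tf.Variable(tf.zeros([n_h]))
            a = tf.nn.elu(tf.matmul(a, W) + b)
            params.append((W, b))
            prev = n_h
        W_out = tf.Variable(tf.truncated_normal([prev, n_out], stddev=1.0 / (prev * 1.41)))
        b_out = tf.Variable(tf.zeros([n_out]))
        cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
            labels=y, logits=tf.matmul(a, W_out) + b_out))
        step = tf.train.AdamOptimizer(eta).minimize(cost)
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            for _ in range(epochs):  # full-batch updates for brevity
                sess.run(step, feed_dict={x: X, y: Y})
            hidden = [sess.run([W, b]) for W, b in params]
    return hidden  # pretrained hidden-layer weights, ready for fine-tuning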
Training only works with data that is formatted correctly; the advantage is low RAM usage.
It's not hard to do, but to make it as simple as possible here's an example. It is assumed that the features and targets are numpy arrays in `f` and `t`.
import h5py
# This is where we want our data to end up.
data_fname = '...'
new_data_file = h5py.File(data_fname, 'w')
# Note data must use these keys/names.
new_data_file.create_dataset('feats', data=f)
new_data_file.create_dataset('targs', data=t)
new_data_file.close()
# That's it!
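The point of the hdf5 format is that the training data never has to be loaded into RAM all at once. As a rough illustration of why (this is not the `Network` class's actual reading code, and it reuses `data_fname` from the example above), h5py lets batches be sliced straight from disk:

import h5py

# Illustration only: h5py datasets are sliced lazily, so only one batch
# of features/targets is held in RAM at a time.
batch_size = 1024
with h5py.File(data_fname, 'r') as hf:
    feats, targs = hf['feats'], hf['targs']
    for start in range(0, feats.shape[0], batch_size):
        x_batch = feats[start:start + batch_size]
        t_batch = targs[start:start + batch_size]
        # feed x_batch / t_batch to the network here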
Python 2.7
TensorFlow, h5py, numpy, subprocess, re, datetime