This is the implementation of the paper "Gate Activation Signal Analysis for Gated Recurrent Neural Networks and Its Correlation with Phoneme Boundaries", presented at Interspeech 2017. The slides from the oral presentation at the conference are also included. The paper is also available on arXiv.
Our corpus is TIMIT. The implementation uses TensorFlow 1.1.0 (with CUDA 8.0) and Python 3.5.
- tensorflow 1.1.0 GPU-enabled version
- numpy
- sklearn
- Kaldi
- matplotlib (for plotting curves)
Prepare the data from your own copy of the TIMIT corpus using prepare_timit:
- Change directory to prepare_timit
- Specify timit_root, feat_loc and kaldi_root in
- Execute to extract MFCC features for the training and testing sets; the resulting scp files will be in timit_gas/feature_scp and timit_gas/bounds
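
To inspect the generated scp files, a small reader like the one below may be handy. This is only a sketch: it assumes the files follow the usual Kaldi scp layout (one `utterance_id location` pair per line), which this README does not confirm. The function name `parse_scp` and the example paths are hypothetical.

```python
def parse_scp(lines):
    """Map utterance IDs to feature locations.

    Assumes Kaldi-style scp lines: "<utterance_id> <location>".
    """
    table = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only, so the location
        # part is kept intact even if it contains spaces.
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError("malformed scp line: %r" % raw)
        utt_id, location = parts
        table[utt_id] = location
    return table


# Hypothetical entries, for illustration only.
example = [
    "utt_0001 /path/to/feats.ark:12",
    "utt_0002 /path/to/feats.ark:3456",
]
entries = parse_scp(example)
print(entries["utt_0001"])
```

For a real file, pass an open file handle instead of the list, since the function only needs an iterable of lines.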
Then execute ./ to run the code.

General settings
- rnn_type: specify whether to use gru or lstm
- dropout_keep: specify dropout rate
- gpu_device: specify gpu device id to be used
- tol_window: specify the tolerance window size (2 for 20 ms)

Training
- scp_file: training data scp
- dev_scp_file: development data scp
- max_len: the maximum sequence length used for unrolling the RNN. For TIMIT, the maximum length is 777 frames.
- min_epoch: the minimum number of epochs required for training

Testing
- test_scp_file: testing data scp
- test_bound_file: the boundaries (bounds) of the testing data
- max_len: the maximum sequence length used for unrolling the RNN. For TIMIT, the maximum length is 777 frames.
- enable_plt: if true, plot the GAS curves and the phoneme boundaries of a specified utterance.
- th: thresholds for segmentation
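
To make the role of tol_window concrete, here is a minimal sketch of tolerance-based boundary scoring. It is not taken from this repository: the function name `boundary_scores`, the greedy one-to-one matching, and the example boundaries are all assumptions for illustration. Boundaries are frame indices, and a predicted boundary counts as a hit if an unmatched reference boundary lies within tol_window frames.

```python
def boundary_scores(predicted, reference, tol_window=2):
    """Return (precision, recall, F1) for boundaries given as frame indices.

    Assumes reference boundaries are unique. Matching is greedy and
    one-to-one: each reference boundary can be claimed at most once.
    """
    matched = set()  # reference boundaries already claimed
    hits = 0
    for p in sorted(predicted):
        candidates = [r for r in reference
                      if r not in matched and abs(r - p) <= tol_window]
        if candidates:
            # Claim the closest unmatched reference boundary.
            best = min(candidates, key=lambda r: abs(r - p))
            matched.add(best)
            hits += 1
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Made-up boundaries: three of the four predictions fall within 2 frames.
print(boundary_scores([10, 25, 40, 70], [11, 24, 50, 69], tol_window=2))
```

Greedy matching keeps the sketch short; it is not guaranteed to find the best possible assignment when boundaries are densely packed.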
| tensor name | GRU | LSTM |
| --- | --- | --- |
| encoder_g1 | update gate | forget gate |
| encoder_g2 | reset gate | input gate |
| encoder_g3 | X | output gate |
The encoder_g3 tensor for GRU will output all zeros.
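
As a rough illustration of how a gate tensor could be turned into a per-frame signal and thresholded, consider the sketch below. It rests on assumptions this README does not state: that the gate values for one utterance are available as a frames-by-units array, that the signal is the mean over units, and that a candidate boundary is a frame whose signal differs from the previous frame by more than th. The function names and the toy values are hypothetical.

```python
def gate_activation_signal(gate):
    """Average a (frames x units) nested list of gate activations over units."""
    return [sum(frame) / len(frame) for frame in gate]


def candidate_boundaries(signal, th):
    """Frames whose signal differs from the previous frame by more than th."""
    return [t for t in range(1, len(signal))
            if abs(signal[t] - signal[t - 1]) > th]


# Toy gate activations: 5 frames, 2 units (made-up values).
gate = [[0.9, 0.8], [0.9, 0.9], [0.2, 0.3], [0.2, 0.2], [0.8, 0.9]]
signal = gate_activation_signal(gate)
print(candidate_boundaries(signal, th=0.3))
```

The same two functions work on any of the three tensors in the table; for GRU, an all-zero encoder_g3 would simply yield no candidates.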
Bibliographic information for this work:
author={Yu-Hsuan Wang and Cheng-Tao Chung and Hung-Yi Lee},
title={Gate Activation Signal Analysis for Gated Recurrent Neural Networks and its Correlation with Phoneme Boundaries},
booktitle={Proc. Interspeech 2017},