GAN_mapping_relationship
This is the implementation of our paper. In this paper, we propose an unsupervised phoneme recognition system that achieves 36% phoneme accuracy on TIMIT with oracle phone boundaries. The method uses a GAN-based model to perform unsupervised phoneme recognition.
How to use
Dependencies
- tensorflow 1.13
- kaldi
- librosa
Data preprocess
- Usage:
  - Modify path.sh with the path to your Kaldi installation.
  - Modify config.sh with your feature path and TIMIT path.
  - Run
    $ bash preprocess.sh
- Phoneme sequences can be downloaded from here; put fake.39 and oracle.39 in ./data.
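
Before training, it can help to confirm that both phoneme sequence files are in place. The following is a minimal sketch and not part of this repository: the helper name `missing_phoneme_files` is hypothetical, and only the file names fake.39 and oracle.39 and the ./data directory come from the steps above.

```python
from pathlib import Path

# File names taken from the preprocessing notes above.
REQUIRED_FILES = ("fake.39", "oracle.39")


def missing_phoneme_files(data_dir="./data"):
    """Return the required phoneme sequence files that are absent from data_dir.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_phoneme_files()
    if missing:
        print("Missing in ./data:", ", ".join(missing))
    else:
        print("Both phoneme sequence files are present.")
```

An empty list means both files were found.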
Train model
- Usage:
  - Modify the experiment and path settings in config.sh.
  - Modify the model's parameters in src/audio2vec.sh and src/mapping.sh.
  - Run
    $ bash run.sh
- This script contains the training flow of the whole system.
Hyperparameters in config.sh
- cluster_num: number of clusters.
- target_type: type of phoneme sequences (oracle/fake).
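
To illustrate what cluster_num controls, here is a toy sketch of clustering fixed-length vectors into cluster_num groups. This README does not say which clustering algorithm the repository uses, so plain k-means is an assumption here, and the function `kmeans` and the toy data are hypothetical.

```python
import random


def kmeans(points, cluster_num, iterations=20, seed=0):
    """Cluster fixed-length vectors into cluster_num groups with plain k-means.

    Assumption: k-means stands in for whatever clustering the repository uses.
    Returns (centroids, assignments).
    """
    rng = random.Random(seed)
    # Initialise the centroids with randomly chosen data points.
    centroids = [list(p) for p in rng.sample(points, cluster_num)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(cluster_num),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(cluster_num):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


# Toy 2-D "embeddings" forming two well-separated groups.
embeddings = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (5.0, 5.1), (5.1, 5.0), (4.9, 5.2)]
centroids, cluster_ids = kmeans(embeddings, cluster_num=2)
print(cluster_ids)
```

Each vector ends up with a cluster index between 0 and cluster_num - 1.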
Hyperparameters in src/audio2vec.sh
- mode: train or test mode (train/test); test mode performs the clustering step.
- lr: learning rate.
- max_length: maximum length of an acoustic token.
- hidden_units: hidden size of audio2vec.
- batch_size: batch size.
- epoch: number of training epochs.
- kl_saturate: parameter of KL annealing.
- kl_step: parameter of KL annealing; the number of steps over which the KL weight goes from 0 to 1.
- cuda_id: GPU IDs.
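
The role of kl_step can be sketched as a weight schedule. This is a minimal illustration that assumes a linear ramp; the actual schedule in src/ may differ, the function name `kl_weight` is hypothetical, and kl_saturate is not modelled because its meaning is not described here.

```python
def kl_weight(global_step, kl_step):
    """Anneal the KL weight from 0 to 1 over kl_step training steps.

    Assumption: a linear ramp. The schedule used in this repository may differ.
    """
    if kl_step <= 0:
        # No annealing requested: use the full KL weight from the start.
        return 1.0
    return min(1.0, max(0.0, global_step / kl_step))


# With kl_step = 1000 the weight grows linearly and is capped at 1.0.
for step in (0, 250, 500, 1000, 2000):
    print(step, kl_weight(step, kl_step=1000))
```

In a setup like this, the returned weight would typically multiply the KL term of the loss, so the term has little influence early in training.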
Hyperparameters in src/mapping.sh
- mode: train or test mode (train/test).
- generator_lr: learning rate of the generator.
- discriminator_lr: learning rate of the discriminator.
- max_length: maximum length of a phoneme sequence.
- step: number of training steps.
- discriminator_hidden_units: hidden size of the discriminator.
- discriminator_iterations: number of training iterations of the discriminator.
- batch_size: batch size.
- cuda_id: GPU IDs.
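
How step and discriminator_iterations might interact can be sketched as an alternating update loop. This assumes that each training step runs discriminator_iterations discriminator updates followed by one generator update, which is a common GAN schedule but is not confirmed by this README. The function `train_mapping` and both callbacks are hypothetical placeholders for the real TensorFlow update ops.

```python
def train_mapping(step, discriminator_iterations, update_discriminator, update_generator):
    """Alternate discriminator and generator updates.

    Assumption: every training step performs `discriminator_iterations`
    discriminator updates and then a single generator update.
    """
    for _ in range(step):
        for _ in range(discriminator_iterations):
            update_discriminator()
        update_generator()


# Count the updates with stub callbacks instead of real TensorFlow ops.
counts = {"discriminator": 0, "generator": 0}


def _count(name):
    def _update():
        counts[name] += 1
    return _update


train_mapping(
    step=10,
    discriminator_iterations=3,
    update_discriminator=_count("discriminator"),
    update_generator=_count("generator"),
)
print(counts)
```

With step=10 and discriminator_iterations=3, the stubs record 30 discriminator updates and 10 generator updates.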
Reference
Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings, Da-Rong Liu, Kuan-Yu Chen, et al.
Acknowledgement
Special thanks to Da-Rong Liu!