Speech2Vec Pre-Trained Vectors

This repository releases the speech embeddings learned by Speech2Vec, proposed by Chung and Glass (2018). Feel free to contact me with any questions.

Introduction

Speech2Vec is a recently proposed deep neural network architecture that represents variable-length speech segments as real-valued, fixed-dimensional speech embeddings capturing the semantics of the segments; it can be viewed as a speech version of Word2Vec. Its training borrows the skip-grams and CBOW methodologies from Word2Vec and is thus unsupervised, i.e., we do not need to know the word identity of a speech segment. Please refer to the original paper for more details.
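
To make the idea concrete, below is a minimal PyTorch sketch of the skip-gram variant, not the released training code: an RNN encoder maps a variable-length segment of acoustic features to a fixed-dimensional embedding, and an RNN decoder is trained to reconstruct the frames of a neighboring segment. The class name, feature dimensionality, and window handling are illustrative assumptions, not taken from the paper or this release.

```python
# Minimal sketch (illustrative only): skip-gram style Speech2Vec, where an
# RNN encoder turns a segment of MFCC frames into a fixed-dimensional
# embedding and an RNN decoder reconstructs a neighboring segment's frames.
import torch
import torch.nn as nn

class Speech2VecSkipGram(nn.Module):          # hypothetical class name
    def __init__(self, feat_dim=13, emb_dim=50):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, emb_dim, batch_first=True)
        self.decoder = nn.GRU(feat_dim, emb_dim, batch_first=True)
        self.out = nn.Linear(emb_dim, feat_dim)

    def embed(self, segment):                 # segment: (B, T, feat_dim)
        _, h = self.encoder(segment)          # h: (1, B, emb_dim)
        return h.squeeze(0)                   # fixed-dimensional embedding

    def forward(self, segment, neighbor):
        h = self.encoder(segment)[1]          # encode the current word's segment
        # Teacher-forced decoding of the neighboring segment's frames,
        # conditioned on the current segment's embedding.
        dec_in = torch.cat([torch.zeros_like(neighbor[:, :1]), neighbor[:, :-1]], dim=1)
        dec_out, _ = self.decoder(dec_in, h)
        pred = self.out(dec_out)
        return nn.functional.mse_loss(pred, neighbor)

# Toy usage: batches of (current segment, one neighboring segment).
model = Speech2VecSkipGram(emb_dim=50)
cur = torch.randn(8, 60, 13)                  # 8 segments, 60 frames, 13 MFCCs
nbr = torch.randn(8, 55, 13)
loss = model(cur, nbr)
loss.backward()
print(model.embed(cur).shape)                 # torch.Size([8, 50])
```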

In this repository, we release the speech embeddings of different dimensionalities learned by Speech2Vec using skip-grams as the training methodology. The model was trained on a corpus of about 500 hours of speech from LibriSpeech (the clean-360 and clean-100 subsets). We also include the word embeddings learned by skip-gram Word2Vec trained on the transcripts of the same speech corpus.
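
Assuming the released files are in the standard plain-text word2vec format (the file name below is a placeholder; use the file you downloaded), they can be loaded with gensim, for example:

```python
# Sketch of loading a released embedding file, assuming it is in the
# plain-text word2vec format; the file name is a placeholder.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("speech2vec_100.txt", binary=False)

# "apple" is just an example query word, assumed to be in the vocabulary.
print(vectors["apple"].shape)                 # (100,) -- one vector per word type
print(vectors.most_similar("apple", topn=5))
```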

Links

Dim   Speech2Vec   Word2Vec
50    file         file
100   file         file
200   file         file
300   file         file

The following figure shows the relationship between the dimensionality of the speech/word embeddings and the performance (the higher, the better) on a word similarity benchmark (MTurk-771), computed using this toolkit. Again, please refer to the original paper for task descriptions.
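
For reference, a word-similarity score of this kind is typically the Spearman correlation between human ratings and the cosine similarities of the corresponding embedding pairs. A minimal sketch is below, independent of the specific toolkit; the benchmark file name and its "word1 word2 rating" line format are assumptions, as is the `vectors` object loaded in the snippet above.

```python
# Sketch of a word-similarity evaluation: Spearman correlation between
# human ratings and cosine similarities of embedding pairs.
# Assumes a whitespace-separated file with lines "word1 word2 rating"
# (placeholder path/format) and gensim KeyedVectors loaded as above.
import numpy as np
from scipy.stats import spearmanr

def evaluate(vectors, pairs_path):
    human, model = [], []
    with open(pairs_path) as f:
        for line in f:
            w1, w2, rating = line.split()[:3]
            if w1 in vectors and w2 in vectors:
                human.append(float(rating))
                # Cosine similarity between the two word embeddings.
                v1, v2 = vectors[w1], vectors[w2]
                model.append(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    return spearmanr(human, model).correlation

# score = evaluate(vectors, "mturk771.csv")    # placeholder file name
```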

Citation

If you use the embeddings in your work, please consider citing:

@inproceedings{chung2018speech2vec,
  title     = {{Speech2Vec}: A sequence-to-sequence framework for learning word embeddings from speech},
  author    = {Chung, Yu-An and Glass, James},
  booktitle = {INTERSPEECH},
  year      = {2018}
}