This is a fork of the original fairseq repository (version 0.12.2) with added classes for training *mHuBERT-147: A Compact Multilingual HuBERT Model*.
Find details at: https://github.com/utter-project/fairseq/tree/main/examples/mHuBERT-147
- Pre-trained models with manifest files: https://huggingface.co/collections/utter-project/mhubert-147-models-665f1c1dea9a5601a1bfc905 (see the loading sketch after this list)
- Pre-processing and clustering scripts: https://github.com/utter-project/mHuBERT-147-scripts
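As a rough, unofficial usage sketch: a checkpoint downloaded from the collection above can be loaded through fairseq's standard checkpoint utilities, following the pattern of fairseq's own HuBERT feature-dumping example. The checkpoint filename and the chosen output layer below are assumptions, not values documented in this repository.

```python
# Unofficial sketch: load a downloaded mHuBERT-147 checkpoint and extract
# frame-level features. "mhubert-147.pt" is a hypothetical local filename;
# download the actual checkpoint from the HuggingFace collection above.
import torch
from fairseq import checkpoint_utils

models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ["mhubert-147.pt"]
)
model = models[0].eval()

# One second of 16 kHz dummy audio; replace with a real waveform.
wav = torch.zeros(1, 16000)

with torch.no_grad():
    # extract_features returns (features, padding_mask); output_layer=9
    # mirrors fairseq's HuBERT k-means example and is an assumption here.
    features, _ = model.extract_features(
        source=wav, padding_mask=None, mask=False, output_layer=9
    )
print(features.shape)  # (batch, frames, hidden_dim)
```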
If you use this work, please cite:

```bibtex
@inproceedings{boito2024mhubert,
  author={Marcely Zanon Boito and Vivek Iyer and Nikolaos Lagos and Laurent Besacier and Ioan Calapodescu},
  title={{mHuBERT-147: A Compact Multilingual HuBERT Model}},
  year={2024},
  booktitle={Interspeech 2024},
}
```
This is an output of the European Project UTTER (Unified Transcription and Translation for Extended Reality), funded by the European Union's Horizon Europe Research and Innovation programme under grant agreement number 101070631.
For more information, please visit https://he-utter.eu/