This repository contains SpeechBrain recipes to fine-tune Wav2Vec2 models on a phone classification task. The following factors were analysed:
- Fine-tuning Wav2Vec2,
- Pre-training datasets,
- Model size,
- Fine-tuning datasets.
Results of this work have been published at the Interspeech 2024 conference.
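For illustration, frame-level phone classification on top of Wav2Vec2 can be sketched as below. This is a minimal, hypothetical example using the HuggingFace `transformers` library with a randomly initialised model (so it runs without downloading weights); it is not the actual recipe code, which lives in the `recipes/` folder and uses SpeechBrain, and the phone inventory size is an assumed placeholder.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

N_PHONES = 40  # hypothetical phone inventory size, for illustration only

# Random initialisation keeps the sketch self-contained; a real experiment
# would load pre-trained weights (e.g. via Wav2Vec2Model.from_pretrained).
config = Wav2Vec2Config()
encoder = Wav2Vec2Model(config).eval()

# A linear head mapping each encoder frame to phone logits.
classifier = torch.nn.Linear(config.hidden_size, N_PHONES)

wav = torch.randn(1, 16000)  # 1 second of fake 16 kHz audio
with torch.no_grad():
    hidden = encoder(wav).last_hidden_state  # (batch, frames, hidden_size)
    logits = classifier(hidden)              # (batch, frames, N_PHONES)
```

Fine-tuning then amounts to training the head (and optionally the encoder) with a frame-level cross-entropy loss against phone labels.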
- The `recipes/` folder contains all SpeechBrain recipes.
- The results obtained are available in the `confusion-matrix/` folder.
For confidentiality reasons, the datasets are not included. This work relies on the C2SI, CommonPhone and BREF corpora.
If you use this work, please cite as:
```bibtex
@inproceedings{maisonneuve24,
  author    = {Malo Maisonneuve and Corinne Fredouille and Muriel Lalain and Alain Ghio and Virginie Woisard},
  title     = {{Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models}},
  year      = {2024},
  booktitle = {Proc. Interspeech 2024}
}
```