Fine-tuning wav2vec2-dutch-large and wav2vec2-large on the 3-hour training set of the CGN corpus.
Hyperparameters are tuned on the 1-hour development set, and models are evaluated on the 1-hour test set.
The fine-tuning and evaluation scripts are in:
./scripts
The fine-tuned models are available on HuggingFace (their vocab is similar to that of wav2vec2-dutch-large-ft-cgn).
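As a minimal sketch, a published checkpoint could be loaded from the Hub as below. The repo id `user/wav2vec2-dutch-large-ft-cgn` is a hypothetical placeholder, not a confirmed model path:

```python
# Hedged sketch: loading a fine-tuned wav2vec2 checkpoint from the HuggingFace Hub.
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "user/wav2vec2-dutch-large-ft-cgn"  # hypothetical repo id (placeholder)

processor = Wav2Vec2Processor.from_pretrained(model_id)  # tokenizer + feature extractor
model = Wav2Vec2ForCTC.from_pretrained(model_id)         # CTC head for ASR
```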
Initial experiments showed the best performance with the following settings (a configuration sketch follows this list):
- Effective batch size: 32 (per-device batch size 16 with 2 gradient accumulation steps)
- Warmup: min(500, 10% of total optimisation steps)
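One way these settings could be expressed with HuggingFace `TrainingArguments` is sketched below. The dataset size and output path are assumptions for illustration only; the actual values live in the scripts under `./scripts`:

```python
# Hedged sketch: a TrainingArguments configuration mirroring the settings above.
from transformers import TrainingArguments

num_train_samples = 2000  # assumption: utterance count of the 3-hour training set
per_device_batch = 16
grad_accum = 2            # effective batch size: 16 * 2 = 32
num_epochs = 200

# Warmup rule from above: min(500, 10% of total optimisation steps).
steps_per_epoch = num_train_samples // (per_device_batch * grad_accum)
total_steps = steps_per_epoch * num_epochs
warmup_steps = min(500, int(0.10 * total_steps))

training_args = TrainingArguments(
    output_dir="./wav2vec2-cgn-ft",  # hypothetical output path
    per_device_train_batch_size=per_device_batch,
    gradient_accumulation_steps=grad_accum,
    num_train_epochs=num_epochs,
    learning_rate=1e-4,              # best LR found on the development set
    warmup_steps=warmup_steps,
)
```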
We explored learning rates in the range [1e-6, 1e-4] and trained for 200 epochs. Some results on the development set are visualized below. Ideally, we want a single set of hyperparameters that works well for both models.
The checkpoints that performed best on the development set (using the same hyperparameters for both models) were then evaluated on the test set.
| Model | LR | WER | CER |
|---|---|---|---|
| wav2vec2-dutch-large-cgn | 1e-4 | 0.15 | 0.04 |
| wav2vec2-large-cgn | 1e-4 | 0.30 | 0.08 |
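The WER and CER columns could be reproduced with the `jiwer` package, as in the sketch below. The transcripts shown are hypothetical examples; the actual evaluation is done by the scripts in `./scripts`:

```python
# Hedged sketch: computing word and character error rates with jiwer.
import jiwer

references = ["dit is een voorbeeld"]   # hypothetical reference transcript
hypotheses = ["dit is en voorbeeld"]    # hypothetical model output

wer = jiwer.wer(references, hypotheses)  # word error rate
cer = jiwer.cer(references, hypotheses)  # character error rate
print(f"WER: {wer:.2f}, CER: {cer:.2f}")
```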