Lahjoita puhetta resources

A collection of resources related to the Lahjoita puhetta speech corpus, which is described in the paper "Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks".

The corpus is available on Kielipankki, the Language Bank of Finland.

Hybrid HMM/DNN ASR system

Hybrid HMM/DNN ASR system built with Kaldi, and language models:

Hybrid, semisupervised HMM/DNN ASR system

Semisupervised HMM/DNN ASR system built with Kaldi, using 100 h of transcribed and 1600 h of untranscribed data:

AED recipe

The SpeechBrain AED recipe can be found here: https://github.com/aalto-speech/speechbrain-lahjoita-puhetta-baseline

If you're interested in specific hyperparameter choices, the hyperparams/Full-B-50s.yaml hyperparameter file should be relatively easy to read, even if you're not familiar with SpeechBrain.
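For a quick overview of such a file without installing SpeechBrain, the following is a minimal sketch that lists the plain top-level `key: value` entries of a hyperparameter file. It assumes the file follows the usual HyperPyYAML conventions used by SpeechBrain, where extension tags such as `!ref` and `!new:` mark references and object definitions; those entries are skipped here. The function name `top_level_scalars` and the `SAMPLE` text are hypothetical and made up for illustration; they are not taken from Full-B-50s.yaml.

```python
import re

# Matches unindented "key: value" lines; indented (nested) lines do not match.
_TOP_LEVEL = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):\s*(.*)$")


def top_level_scalars(yaml_text):
    """Return {key: raw_value} for unindented `key: value` lines.

    Entries with an empty value (start of a nested block) or a value that
    begins with a YAML tag such as `!new:` or `!ref` are skipped.
    Values are returned as raw strings, not parsed into numbers.
    """
    result = {}
    for line in yaml_text.splitlines():
        match = _TOP_LEVEL.match(line)
        if not match:
            continue
        key, value = match.groups()
        value = value.split(" #", 1)[0].strip()  # drop a trailing comment
        if value and not value.startswith("!"):
            result[key] = value
    return result


# Hypothetical sample for illustration only, NOT the real recipe file.
SAMPLE = """\
seed: 1234
lr: 0.0001  # learning rate
output_folder: !ref results/<seed>
model: !new:torch.nn.Linear
    in_features: 80
    out_features: 10
"""

if __name__ == "__main__":
    for key, value in top_level_scalars(SAMPLE).items():
        print(f"{key} = {value}")
```

To apply it to the recipe, read the file's text (for example with `pathlib.Path("hyperparams/Full-B-50s.yaml").read_text()`, assuming the repository is checked out locally) and pass it to `top_level_scalars`. This is only a skimming aid: it does not resolve references or build objects the way the recipe's own loader does.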

Wav2Vec2 fine-tuned with CTC

Metadata classification