/open-biasing

Context-biasing test sets for popular open-source ASR datasets.

Primary language: Python. License: Apache-2.0.

open-biasing

open-biasing is a collection of context-biasing test sets for popular open-source ASR datasets. Each dataset comes with a context-biasing list and the ids of the utterances that contain these context phrases. These utterances are a subset of the original test set, so the original test set is split into two parts: the part with contexts, which is expected to improve significantly with a context-biasing algorithm, and the part without contexts, which is expected to be unaffected by context biasing.
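The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_by_context`, the in-memory dictionary of transcripts, and the toy utterance ids are all assumptions made for the example, not the file formats or scripts used in this repository.

```python
from typing import Dict, Iterable, Tuple


def split_by_context(
    transcripts: Dict[str, str],
    context_utt_ids: Iterable[str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a test set into (with_context, without_context) by utterance id.

    Hypothetical helper: `transcripts` maps utterance id -> reference text,
    `context_utt_ids` lists the ids of utterances that contain context phrases.
    """
    context_ids = set(context_utt_ids)
    with_context = {u: t for u, t in transcripts.items() if u in context_ids}
    without_context = {u: t for u, t in transcripts.items() if u not in context_ids}
    return with_context, without_context


# Toy data, invented for illustration only.
transcripts = {
    "utt1": "call doctor smith tomorrow",
    "utt2": "the weather is nice today",
    "utt3": "send the report to alice",
}
context_utt_ids = ["utt1", "utt3"]

with_ctx, without_ctx = split_by_context(transcripts, context_utt_ids)
print(sorted(with_ctx))     # ['utt1', 'utt3']
print(sorted(without_ctx))  # ['utt2']
```

Scoring the two parts separately shows both effects at once: the gain on the utterances that contain context phrases, and any regression on the utterances that do not.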

How we create these test sets

Benchmark

Aishell

LibriSpeech

WenetSpeech

GigaSpeech