/rtasr

🏆 Run benchmarks against the most common ASR tools on the market.

Primary language: Python. License: MIT.

Rate That ASR (RTASR)


Results

Important

Deepgram benchmark results have been updated with the latest Nova 2 model.

WER & WRR

wer = Word Error Rate, mer = Match Error Rate, wil = Word Information Lost, wrr = Word Recognition Rate
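To make these four metrics concrete, here is a minimal, standard-library-only sketch that derives them from a word-level edit-distance alignment. It is not rtasr's own implementation. The function names `align_counts` and `word_metrics` are hypothetical, the WER, MER and WIL formulas follow their usual textbook definitions, and WRR is assumed here to be `1 - WER`, which may differ from how rtasr computes it.

```python
def align_counts(reference: str, hypothesis: str) -> dict:
    """Count hits, substitutions, deletions and insertions between two
    whitespace-tokenized strings, using one optimal Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else 1)
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Walk back from the bottom-right corner and classify each step.
    counts = {"hits": 0, "sub": 0, "del": 0, "ins": 0}
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (0 if same else 1):
            counts["hits" if same else "sub"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["del"] += 1  # reference word missing from the hypothesis
            i -= 1
        else:
            counts["ins"] += 1  # extra word in the hypothesis
            j -= 1
    return counts


def word_metrics(reference: str, hypothesis: str) -> dict:
    """Compute wer, mer, wil and wrr from the alignment counts."""
    c = align_counts(reference, hypothesis)
    hits, sub, dele, ins = c["hits"], c["sub"], c["del"], c["ins"]
    n_ref = hits + sub + dele  # number of reference words
    n_hyp = hits + sub + ins   # number of hypothesis words
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    errors = sub + dele + ins
    wer = errors / n_ref
    mer = errors / (hits + sub + dele + ins)
    wip = (hits / n_ref) * (hits / n_hyp) if n_hyp else 0.0
    return {
        "wer": wer,
        "mer": mer,
        "wil": 1.0 - wip,
        "wrr": 1.0 - wer,  # assumption: WRR defined as 1 - WER
    }


metrics = word_metrics("the cat sat on the mat", "the cat sit on mat")
print({name: round(value, 3) for name, value in metrics.items()})
```

In this example the hypothesis has one substitution ("sat" became "sit") and one deletion (the second "the") against six reference words, so WER is 2/6.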

Figures: WER evaluation, WRR evaluation.

DER

der = Diarization Error Rate, miss = missed detection, confusion = incorrect detection, fa = false alarm
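As a rough illustration of how these components combine, DER is conventionally the sum of missed detection, false alarm and speaker confusion durations divided by the total duration of reference speech. The sketch below uses a hypothetical helper, `diarization_error_rate`, with made-up durations in seconds; it is not taken from rtasr.

```python
def diarization_error_rate(
    miss: float, false_alarm: float, confusion: float, total: float
) -> float:
    """Return DER given error durations and total reference speech duration.

    All arguments are durations in the same unit (seconds here).
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return (miss + false_alarm + confusion) / total


# Illustrative numbers only: 60 s of reference speech, 9 s of errors in total.
der = diarization_error_rate(miss=3.0, false_alarm=1.5, confusion=4.5, total=60.0)
print(f"DER = {der:.2%}")
```

Because false alarms are counted against reference speech time only, DER can exceed 100% when a system outputs a lot of speech where the reference has none.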

Note

Click on the images to get a bigger display.

Figures: DER graph evaluation, DER table evaluation.

Installation

Last stable version

pip install rtasr

From source

git clone https://github.com/Wordcab/rtasr
cd rtasr

pip install .

Commands

The CLI is available through the rtasr command.

rtasr --help

List datasets, metrics and providers

# List everything
rtasr list
# List only datasets
rtasr list -t datasets
# List only metrics
rtasr list -t metrics
# List only providers
rtasr list -t providers

Datasets download

To see the available datasets, run rtasr list -t datasets. Then download one by name:

rtasr download -d <dataset>

ASR Transcription

Providers

To see the implemented ASR providers, run rtasr list -t providers.

Run transcription

Run ASR transcription on a given dataset with a given provider.

rtasr transcription -d <dataset> -p <provider>

Multiple providers

You can specify as many providers as you want:

rtasr transcription -d <dataset> -p <provider1> <provider2> <provider3> ...

Choose dataset split

You can specify the dataset split to use:

rtasr transcription -d <dataset> -p <provider> -s <split>

If not specified, all the available splits will be used.

Caching

By default, the transcription results are cached in the ~/.cache/rtasr/transcription directory for each provider.

If you don't want to use the cache, use the --no-cache flag.

rtasr transcription -d <dataset> -p <provider> --no-cache

Note: the cache prevents the same file from being transcribed twice. With --no-cache, every file in the dataset is sent to the provider again, which can incur additional API costs. We aren't responsible for any extra costs.
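The sketch below illustrates the general idea behind a per-provider cache and why bypassing it repeats billable work. It is not rtasr's implementation: the function `transcribe_with_cache`, the `<cache_root>/<provider>/<file stem>.json` layout, and the fake provider are all hypothetical, and rtasr's actual cache format is not described here.

```python
import json
import tempfile
from pathlib import Path


def transcribe_with_cache(audio_files, provider, transcribe, cache_root, use_cache=True):
    """Transcribe each file, reusing a cached JSON result when allowed.

    `transcribe` is any callable taking a file path and returning a
    JSON-serializable result (it stands in for a paid API call).
    """
    provider_dir = Path(cache_root) / provider
    provider_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for audio in audio_files:
        cached = provider_dir / f"{Path(audio).stem}.json"  # hypothetical layout
        if use_cache and cached.exists():
            results[audio] = json.loads(cached.read_text())
            continue
        results[audio] = transcribe(audio)  # the potentially billable step
        cached.write_text(json.dumps(results[audio]))
    return results


calls = []


def fake_transcribe(path):
    """Stand-in provider that records every call it receives."""
    calls.append(path)
    return {"text": f"transcript of {path}"}


with tempfile.TemporaryDirectory() as root:
    files = ["a.wav", "b.wav"]
    transcribe_with_cache(files, "provider-a", fake_transcribe, root)
    transcribe_with_cache(files, "provider-a", fake_transcribe, root)  # served from cache
    calls_with_cache = len(calls)
    transcribe_with_cache(files, "provider-a", fake_transcribe, root, use_cache=False)
    calls_without_cache = len(calls) - calls_with_cache

print(calls_with_cache, calls_without_cache)
```

Running the same files twice with the cache enabled calls the provider once per file, while a single run with the cache disabled calls it for every file again.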

Debug mode

Use the --debug flag to run only one file per split for each provider.

rtasr transcription -d <dataset> -p <provider> --debug

Evaluation

The evaluation command allows you to run an evaluation on the transcription results.

If you don't specify the split, the evaluation will be run on the whole dataset.

Run DER evaluation

rtasr evaluation -m der -d <dataset> -s <split>

Run WER evaluation

rtasr evaluation -m wer -d <dataset> -s <split>

Plot results

To get the plots of the evaluation results, use the plot command.

If you don't specify the split, the plots will be generated for all the available splits.

Plot DER results

rtasr plot -m der -d <dataset> -s <split>

Plot WER results

rtasr plot -m wer -d <dataset> -s <split>
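The download, transcription, evaluation and plot commands shown so far chain naturally into one benchmark run. The sketch below builds those command lines from Python using only the flags documented above. The helper names `build_pipeline` and `run_pipeline` are hypothetical, `my-dataset`, `provider-a` and `provider-b` are placeholders for real names reported by rtasr list, and combining several providers with -s in one call is an assumption based on the separate examples.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_pipeline(
    dataset: str,
    providers: Sequence[str],
    metric: str = "wer",
    split: Optional[str] = None,
) -> List[List[str]]:
    """Return the rtasr commands for one full benchmark run, in order."""
    # Omitting -s makes rtasr use all available splits, per the docs above.
    split_args = ["-s", split] if split else []
    return [
        ["rtasr", "download", "-d", dataset],
        ["rtasr", "transcription", "-d", dataset, "-p", *providers, *split_args],
        ["rtasr", "evaluation", "-m", metric, "-d", dataset, *split_args],
        ["rtasr", "plot", "-m", metric, "-d", dataset, *split_args],
    ]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


commands = build_pipeline("my-dataset", ["provider-a", "provider-b"], metric="der", split="test")
run_pipeline(commands)  # dry run: only prints the commands
```

Keeping `dry_run=True` as the default is deliberate: the transcription step calls paid provider APIs, so it is worth reviewing the printed commands before executing them.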

Dataset length

To get the total length of a dataset, use the audio-length command. It reports the number of minutes of audio for each split of a dataset.

If you don't specify the split, the total length of the dataset will be returned for all the available splits.

rtasr audio-length -d <dataset> -s <split>

Contributing

Be sure to have hatch installed.

Quality

  • Run quality checks: hatch run quality:check
  • Run quality formatting: hatch run quality:format

Testing

  • Run tests: hatch run tests:run