This is a simple utility for quickly evaluating the results generated by any Speech-to-Text (STT) or Automatic Speech Recognition (ASR) system.
This utility calculates the following metrics:
- Word Error Rate (WER), the most common metric for measuring the performance of a speech recognition or machine translation system.
- Levenshtein Distance calculated at word level.
- Number of Word level insertions, deletions and mismatches between the original file and the generated file.
- Number of Phrase level insertions, deletions and mismatches between the original file and the generated file.
- Color-highlighted text comparison to visualize the differences.
- General Statistics about the original and generated files (bytes, characters, words, new lines etc.)
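The first three metrics can be illustrated with a short sketch. This is not the utility's actual source code: the function name `wordEditStats` is hypothetical, and it assumes WER is the word-level Levenshtein distance divided by the number of words in the original (reference) text, expressed as a percentage. What the list above calls "mismatches" are counted here as substitutions.

```javascript
// Word-level Levenshtein distance with a breakdown of edit types.
// Illustrative sketch only; names and details are assumptions.
function wordEditStats(originalText, generatedText) {
  const ref = originalText.split(/\s+/).filter(Boolean);
  const hyp = generatedText.split(/\s+/).filter(Boolean);

  // dp[i][j] = minimum number of edits needed to turn the first i
  // original words into the first j generated words.
  const dp = Array.from({ length: ref.length + 1 }, () =>
    new Array(hyp.length + 1).fill(0)
  );
  for (let i = 0; i <= ref.length; i++) dp[i][0] = i; // i deletions
  for (let j = 0; j <= hyp.length; j++) dp[0][j] = j; // j insertions

  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j - 1] + cost, // match or substitution
        dp[i - 1][j] + 1, // deletion (word missing from generated text)
        dp[i][j - 1] + 1 // insertion (extra word in generated text)
      );
    }
  }

  // Walk back through the table to count each type of edit.
  let i = ref.length;
  let j = hyp.length;
  let substitutions = 0;
  let deletions = 0;
  let insertions = 0;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && ref[i - 1] === hyp[j - 1] && dp[i][j] === dp[i - 1][j - 1]) {
      i--; j--; // words match, no edit
    } else if (i > 0 && j > 0 && dp[i][j] === dp[i - 1][j - 1] + 1) {
      substitutions++; i--; j--;
    } else if (i > 0 && dp[i][j] === dp[i - 1][j] + 1) {
      deletions++; i--;
    } else {
      insertions++; j--;
    }
  }

  const distance = dp[ref.length][hyp.length];
  // WER is undefined for an empty reference, so return NaN in that case.
  const wer = ref.length === 0 ? NaN : (distance / ref.length) * 100;
  return { distance, substitutions, deletions, insertions, wer };
}
```

For example, comparing "the quick brown fox" with "the quick red fox jumps" gives one substitution and one insertion, so the distance is 2 and the WER is 2 / 4 = 50%.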
The utility also pre-processes (normalizes) the text in the provided files using the following operations:
- Remove Speaker Name: Remove Speaker name at the beginning of the line.
- Remove Annotations: Remove any custom annotations added during transcriptions.
- Remove Whitespaces: Remove any extra white spaces.
- Remove Quotes: Remove any double quotes.
- Remove Dashes: Remove any dashes.
- Remove Punctuations: Remove any punctuation marks (.,?!).
- Lower Case: Convert the contents to lower case.
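The operations above can be sketched as a chain of regular-expression replacements. This is only an approximation of what the utility might do, not its real implementation: the function name `normalizeTranscript`, the order of the steps, and the exact patterns are all assumptions. In particular, it treats everything up to the first colon on a line as the speaker name, and it deletes dashes outright rather than replacing them with spaces.

```javascript
// Hypothetical normalization pipeline; patterns and ordering are assumptions.
function normalizeTranscript(text) {
  return text
    // Annotations: anything in square brackets, e.g. "[inaudible 00:12]".
    // Done first so a colon inside an annotation is not mistaken for a speaker.
    .replace(/\[[^\]]*\]/g, "")
    // Speaker name: text before the first colon at the start of each line.
    .replace(/^[^:\n]+:[ \t]*/gm, "")
    // Double quotes.
    .replace(/"/g, "")
    // Dashes (hyphens); removed outright, so "well-known" becomes "wellknown".
    .replace(/-/g, "")
    // Punctuation marks . , ? !
    .replace(/[.,?!]/g, "")
    // Lower case for case-insensitive comparison.
    .toLowerCase()
    // Extra whitespace: collapse runs (including newlines) and trim the ends.
    .replace(/\s+/g, " ")
    .trim();
}

console.log(
  normalizeTranscript("John Doe: Hello, I am [inaudible 00:12] because of few reasons.")
);
```

Collapsing whitespace last means that gaps left behind by removed annotations or dashes do not survive as double spaces.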
Make sure that you have Node.js v8 or later installed on your system.
npm install -g speech-recognition-evaluation
Verify the installation by running:
asr-eval
The simplest way to run your first evaluation is to pass the original and generated options to the asr-eval command.
Here, original is a plain text file containing the original transcript to be used as the reference; usually this is produced by a human transcriber.
And generated is a plain text file containing the transcript generated by the STT/ASR system.
asr-eval --original ./original-file.txt --generated ./generated-file.txt
This prints only the Word Error Rate (WER) between the provided files. The output should look like this:
Word Error Rate (WER): 13.61350109561817%
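If you want to use this result from a script, one option is to run the command from Node.js and parse the line shown above. The following is a minimal sketch, assuming asr-eval is installed globally and prints the WER line in exactly that format; the helper names `parseWer` and `evaluate` are hypothetical and not part of the utility.

```javascript
const { execFileSync } = require("child_process");

// Extract the numeric value from a line such as
// "Word Error Rate (WER): 13.61350109561817%".
// Returns null if no such line is found.
function parseWer(output) {
  const match = /Word Error Rate \(WER\):\s*([0-9.]+)%/.exec(output);
  return match ? Number(match[1]) : null;
}

// Run asr-eval on two files and return the WER as a number.
// Assumes the asr-eval command is available on the PATH; on Windows,
// passing { shell: true } may be needed to launch a globally installed command.
function evaluate(originalPath, generatedPath) {
  const output = execFileSync(
    "asr-eval",
    ["--original", originalPath, "--generated", generatedPath],
    { encoding: "utf8" }
  );
  return parseWer(output);
}

// Example usage (requires the two files to exist):
// console.log(evaluate("./original-file.txt", "./generated-file.txt"));
```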
To see all the available options, run:
asr-eval --help
This prints all the available usage options:
Synopsis
$ asr-eval --original file --generated file
$ asr-eval [options] --original file --generated file
$ asr-eval --help
Options
-o, --original file Original File to be used as reference. Usually, this should be the
transcribed file by a Human being.
-g, --generated file File with the output generated by Speech Recognition System.
-e, --wer Default: true. Print Word Error Rate (WER).
--distance Default: false. Print total word distance after comparison.
-e, --stats Default: false. Print statistics about original and generate files, before
and after pre-processing. Also prints statistics about word level and phrase
level differences.
--pairs Default: false. Print all the difference pairs with type of difference.
-c, --textcomparison Default: false. Print the text comparison between two files with
highlighting.
-s, --removespeakers Default: true. Remove the speaker at the start of each line in files before
calculations. The speaker should be separated by colon ":" i.e. speaker_name:
text For e.g. "John Doe: Hello, I am John." would get converted to simply
"Hello, I am John."
-a, --removeannotations Default: true. Remove any custom annotations in the transcript before
calculations. This is useful when removing custom annotations done by human
transcribers. Anything in square brackets [] are detected as annotations.
For e.g. "Hello, I am [inaudible 00:12] because of few reasons." would get
converted to "Hello, I am because of few reasons."
-w, --removewhitespaces Default: true. Remove any extra white spaces before calculations.
-q, --removequotes Default: true. Remove any double quotes '"' from the files before
calculations.
-d, --removedashes Default: true. Remove any dashes (hyphens) "-" from the files before
calculations.
-p, --removepunctuations Default: true. Remove any punctuations ".,?!" from the files before
calculations.
-l, --lowercase Default: true. Convert both files to lower case before calculations. This is
useful if evaluation needs to be done in case-insensitive way.
-h, --help Print this usage guide.