/speech-recognition-evaluation

Evaluate results from ASR/Speech-to-Text quickly

Primary LanguageJavaScriptApache License 2.0Apache-2.0

Automatic Speech Recognition (ASR) Evaluation

This is a simple utility to perform a quick evaluation on the results generated by any Speech to text (STT) or Automatic Speech Recognition (ASR) System.

This utility can calculate following metrics -

  • Word Error Rate (WER), which is a most common metric of measuring the performance of a Speech Recognition or Machine translation system
  • Levenshtein Distance calculated at word level.
  • Number of Word level insertions, deletions and mismatches between the original file and the generated file.
  • Number of Phrase level insertions, deletions and mismatches between the original file and the generated file.
  • Color Highlighted text Comparison to visualize the differences.
  • General Statistics about the original and generated files (bytes, characters, words, new lines etc.)

The utility also performs the pre-processing or normalization of the text in the provided files based on following operations -

  • Remove Speaker Name: Remove Speaker name at the beginning of the line.
  • Remove Annotations: Remove any custom annotations added during transcriptions.
  • Remove Whitespaces: Remove any extra white spaces.
  • Remove Quotes: Remove any double quotes
  • Remove Dashes: Remove any dashes
  • Remove Punctuations: Remove any punctuations (.,?!)
  • Convert contents to lower case

Pre-requisites

Make sure that you have NodeJS v8+ installed on your system.

Installation

npm install -g speech-recognition-evaluation

Verify installation by simply running:

asr-eval

Usage

Simplest way to run your first evaluation is by simply passing original and generated options to asr-eval command. Where, original is a plain text file containing original transcript to be used as reference; usually this is generated by human beings. And generated is a plain text file containing generated transcript by the STT/ASR system.

asr-eval --original ./original-file.txt --generated ./generated-file.txt

This would print simply the Word Error Rate (WER) between the provided files. This is how the output should look like:

Word Error Rate (WER): 13.61350109561817%

To find more information about all the available options:

asr-eval --help

All the available usage options would be printed:

Synopsis

  $ asr-eval --original file --generated file           
  $ asr-eval [options] --original file --generated file 
  $ asr-eval --help                                     

Options

  -o, --original file         Original File to be used as reference. Usually, this should be the            
                              transcribed file by a Human being.                                            
  -g, --generated file        File with the output generated by Speech Recognition System.                  
  -e, --wer                   Default: true. Print Word Error Rate (WER).                                   
  --distance                  Default: false. Print total word distance after comparison.                   
  -e, --stats                 Default: false. Print statistics about original and generate files, before    
                              and after pre-processing. Also prints statistics about word level and phrase  
                              level differences.                                                            
  --pairs                     Default: false. Print all the difference pairs with type of difference.       
  -c, --textcomparison        Default: false. Print the text comparison between two files with              
                              highlighting.                                                                 
  -s, --removespeakers        Default: true. Remove the speaker at the start of each line in files before   
                              calculations. The speaker should be separated by colon ":" i.e. speaker_name: 
                              text For e.g. "John Doe: Hello, I am John." would get converted to simply     
                              "Hello, I am John."                                                           
  -a, --removeannotations     Default: true. Remove any custom annotations in the transcript before         
                              calculations. This is useful when removing custom annotations done by human   
                              transcribers.  Anything in square brackets [] are detected as annotations.    
                              For e.g. "Hello, I am [inaudible 00:12] because of few reasons." would get    
                              converted to "Hello, I am because of few reasons."                            
  -w, --removewhitespaces     Default: true. Remove any extra white spaces before calculations.             
  -q, --removequotes          Default: true. Remove any double quotes '"' from the files before             
                              calculations.                                                                 
  -d, --removedashes          Default: true. Remove any dashes (hyphens) "-" from the files before          
                              calculations.                                                                 
  -p, --removepunctuations    Default: true. Remove any punctuations ".,?!" from the files before           
                              calculations.                                                                 
  -l, --lowercase             Default: true. Convert both files to lower case before calculations. This is  
                              useful if evaluation needs to be done in case-insensitive way.                
  -h, --help                  Print this usage guide.