NickRuiz/power-asr

Normalize the hypothesis based on the reference text


Normally in ASR evaluations, adjudication is used to avoid penalizing minor orthographic differences between the reference and the hypothesis. While this is often done with an external normalization file (like a GLM file in the NIST evaluations), we really want to "modify" the hypothesis to look like the reference for the sake of scoring and alignment. The original hypothesis should remain as-is, but the tool should be configurable to either penalize or permit minor orthographic variations. In all cases the alignments should be correct.

Examples:

  • Contractions: isn't => is not
  • Numbers: 50 => fifty
  • UK to US English: favour => favor
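
To make the idea concrete, here is a rough sketch of reference-driven normalization. It is not the code from the private repository: the lookup tables, `contains`, and `normalize_hypothesis` are all hypothetical names made up for illustration, and the tables are tiny stand-ins for real resources. A hypothesis token is rewritten only when its variant form actually appears in the reference, and the rewrite goes into a scoring copy so the original hypothesis is untouched. Only the directions shown in the examples above are handled (e.g. "isn't" to "is not", not the reverse).

```python
# Hypothetical lookup tables; a real system would load much larger ones.
CONTRACTIONS = {"isn't": ["is", "not"], "can't": ["can", "not"]}
NUMBERS = {"50": ["fifty"], "2": ["two"]}
UK_TO_US = {"favour": ["favor"], "colour": ["color"]}


def contains(seq, sub):
    """Return True if the token list `sub` occurs contiguously in `seq`."""
    n = len(sub)
    return any(seq[i:i + n] == sub for i in range(len(seq) - n + 1))


def normalize_hypothesis(ref_tokens, hyp_tokens,
                         tables=(CONTRACTIONS, NUMBERS, UK_TO_US)):
    """Return a scoring copy of hyp_tokens rewritten toward the reference.

    A token is rewritten only if it does not occur in the reference itself
    but its variant form does. The input lists are never modified.
    """
    ref = [t.lower() for t in ref_tokens]
    out = []
    for tok in hyp_tokens:
        key = tok.lower()
        replacement = None
        if key not in ref:
            for table in tables:
                variant = table.get(key)
                if variant and contains(ref, variant):
                    replacement = variant
                    break
        out.extend(replacement if replacement else [tok])
    return out


ref = "it is not fifty percent in my favor".split()
hyp = "it isn't 50 percent in my favour".split()
print(normalize_hypothesis(ref, hyp))
```

Passing `tables=()` turns normalization off, which would correspond to the "penalize" setting; passing only some of the tables permits only those kinds of variation. Mapping the normalized tokens back to the original hypothesis tokens for alignment output is not covered here.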

This enhancement exists in the private repository but needs to be refactored and moved to the public repo.

Pushed code that handles contractions and numbers. Not UK -> US English yet. Will integrate into power.py soon.