Normalize the hypothesis based on the reference text

Question


Opened this issue 6 years ago · 1 comment

Normally in ASR evaluations, adjudication is used to avoid penalizing minor orthographic differences between the reference and the hypothesis. While this is often done with an external normalization file (like a GLM file in the NIST evaluations), what we really want is to "modify" the hypothesis to look like the reference for the sake of scoring and alignment. The hypothesis output itself should remain as-is, but the tool should be configurable to either penalize or permit minor orthographic variations. In all cases the alignments should be correct.

Examples:

Contractions: isn't => is not
Numbers: 50 => fifty
UK to US English: favour => favor
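
To make the request concrete, here is a rough sketch of what reference-driven normalization could look like. It is not the code from the private repository: `VARIANTS`, `contains_run`, `normalize_hypothesis`, and the `permit_variants` flag are all hypothetical names, and the tiny table stands in for a real equivalence list. It only rewrites a single hypothesis token into one or more reference tokens, and it does not attempt the alignment step.

```python
# Hypothetical equivalence table; a real system would load a much larger one
# (for example from a GLM-style file).
VARIANTS = {
    "isn't": ["is", "not"],
    "50": ["fifty"],
    "favour": ["favor"],
}


def contains_run(tokens, run):
    """Return True if `run` occurs as a contiguous subsequence of `tokens`."""
    n = len(run)
    return any(tokens[i:i + n] == run for i in range(len(tokens) - n + 1))


def normalize_hypothesis(reference, hypothesis, permit_variants=True):
    """Return hypothesis tokens rewritten toward the reference for scoring.

    The original hypothesis string is left untouched; only the returned
    token list is meant to be passed on to alignment and scoring.
    """
    ref_tokens = reference.lower().split()
    hyp_tokens = hypothesis.lower().split()
    if not permit_variants:
        return hyp_tokens  # strict mode: variants are scored as errors

    normalized = []
    for token in hyp_tokens:
        replacement = VARIANTS.get(token)
        # Rewrite only when the reference uses the other form and does not
        # already contain the hypothesis token itself.
        if (replacement and token not in ref_tokens
                and contains_run(ref_tokens, replacement)):
            normalized.extend(replacement)
        else:
            normalized.append(token)
    return normalized


if __name__ == "__main__":
    ref = "it is not my favor"
    hyp = "it isn't my favour"
    print(normalize_hypothesis(ref, hyp))
    print(normalize_hypothesis(ref, hyp, permit_variants=False))
```

Because the rewrite is driven by the reference, a hypothesis token such as "50" is left alone when the reference also says "50". Handling the reverse direction (hypothesis "is not" against reference "isn't") would need a multi-token match and is left out of this sketch.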

This enhancement exists in the private repository but needs to be refactored and moved to the public repo.

Answer 1 · 2019-05-03T19:54:28.000Z

Pushed code that handles contractions and numbers. UK -> US English is not handled yet. Will integrate into power.py soon.