NickRuiz/power-asr

Digits not handled with phoneme pronunciation

Closed this issue · 2 comments

Currently, the lexicon-based pronouncer (built on CMUDict) cannot handle digits. As a result, digit words are simply converted to digit characters, which breaks the phonetic realignment.

I added a test case, test_power_ISS_to_DS, in tests/test_aligner.py that demonstrates the problem. The current alignment is:

REF:  A      50-year-old  business  man
HYP:  fifty  year old     business  man
Eval: S      S            C         C

The correct alignment should be:

REF:  A      50-year-old     business  man
HYP:         fifty year old  business  man
Eval: D      S               C         C

Added normalization code in 90d1add. Will update pronounce.py to use it for digit sequences.
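For anyone following along, here is a minimal sketch of what digit normalization could look like. It is not the code from 90d1add; the names `number_to_words` and `normalize_token` are hypothetical, and it only covers integers from 0 to 999 inside hyphenated tokens. The idea is to spell out digit pieces before lexicon lookup so that every piece is a plain word.

```python
import re
from typing import List

# Hypothetical helper tables; not taken from the power-asr repository.
_ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
_TENS = [
    "", "", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety",
]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 as space-separated words."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + (" " + _ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = _ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize_token(token: str) -> List[str]:
    """Split a token on hyphens and spell out pieces of one to three digits.

    Pieces that are not short digit sequences are passed through unchanged.
    """
    words: List[str] = []
    for piece in token.split("-"):
        if re.fullmatch(r"[0-9]{1,3}", piece):
            words.extend(number_to_words(int(piece)).split())
        elif piece:
            words.append(piece)
    return words
```

With this sketch, `normalize_token("50-year-old")` returns `["fifty", "year", "old"]`, so each piece is an ordinary word that a CMUDict-style lexicon can look up. Longer digit strings, ordinals, and decimals would need extra handling.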

Fixed. The test now passes.