Number phonetizer yields unwieldy rules
jorio commented
String a couple of numbers together and MAX_TRANSITIONS is exceeded instantly.
I've bumped MAX_TRANSITIONS to 64 for now, but the rules are needlessly complex.
For example:
51:
( ( s in | s in k ) | k in z [ swa ] | s in k an t [ swa ] )
[ ( ( in [ n ] | in [ n ] ) | y n [ swa ] | d i s | s an [ t ] |
m i l [ swa ] | on z [ swa ] ) ]
(This rule is also incorrect: "et" is missing, since 51 is read "cinquante et un".)
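Below is a minimal Python sketch of where "et" belongs when spelling 0–99. It assumes the phonetizer is fed a single plain French word sequence per number instead of per-digit alternations. `french_words` is a hypothetical helper and is not part of the existing code. "et" appears only in 21, 31, 41, 51, 61 and 71, never in 81 or 91.

```python
UNITS = ["zéro", "un", "deux", "trois", "quatre", "cinq", "six", "sept",
         "huit", "neuf", "dix", "onze", "douze", "treize", "quatorze",
         "quinze", "seize"]
TENS = {2: "vingt", 3: "trente", 4: "quarante", 5: "cinquante", 6: "soixante"}


def below_20(n):
    """Words for 0-19; 17-19 are built as dix + unit."""
    if n < 17:
        return [UNITS[n]]
    return ["dix", UNITS[n - 10]]


def french_words(n):
    """Hypothetical sketch: spell 0-99 as a list of French words."""
    if not 0 <= n <= 99:
        raise ValueError("sketch only handles 0-99")
    if n < 20:
        return below_20(n)
    tens, rest = divmod(n, 10)
    if tens in (7, 9):
        # 70-79 = soixante + 10-19; 90-99 = quatre-vingt + 10-19
        base = ["soixante"] if tens == 7 else ["quatre", "vingt"]
        rest += 10
        if tens == 7 and rest == 11:
            return base + ["et", "onze"]  # 71 takes "et", 91 does not
        return base + below_20(rest)
    if tens == 8:
        # 80 takes a plural "vingts"; 81-89 do not, and 81 has no "et"
        return ["quatre", "vingts"] if rest == 0 else ["quatre", "vingt"] + below_20(rest)
    words = [TENS[tens]]
    if rest == 1:
        words.append("et")  # 21, 31, 41, 51, 61
    if rest:
        words += below_20(rest)
    return words
```

For example, `french_words(51)` gives `["cinquante", "et", "un"]` and `french_words(10)` gives just `["dix"]`. Each word then maps to exactly one phoneme sequence, with no alternation needed.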
10:
[ ( ( in [ n ] | in [ n ] ) | y n [ swa ] | d i s | s an [ t ] |
m i l [ swa ] | on z [ swa ] ) ] [ z e R oh ]
jorio commented
We could either write a new number converter from scratch or use an existing solution such as RBNF (ICU's rule-based number format, driven by CLDR rule data):
http://unicode.org/repos/cldr/trunk/common/rbnf/fr.xml
http://userguide.icu-project.org/formatparse/numbers/rbnf-examples
http://icu-project.org/apiref/icu4j/com/ibm/icu/text/RuleBasedNumberFormat.html
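This minimal Python sketch shows how an RBNF-style approach yields one deterministic string per number. The rule syntax is a simplified, hypothetical cousin of ICU's. `>>` spells the remainder, and `>name>` spells it with another rule set. `[...]` is dropped when the remainder is zero, and `=name=` delegates the whole value. The divisor is the largest power of 10 not exceeding the rule's base. The rule data is a made-up subset covering 0–69 and is not the actual CLDR `fr.xml` content.

```python
import re

# Hypothetical rule data (NOT the real CLDR fr.xml rules): (base_value, template)
RULE_SETS = {
    "cardinal": [
        (0, "zéro"), (1, "un"), (2, "deux"), (3, "trois"), (4, "quatre"),
        (5, "cinq"), (6, "six"), (7, "sept"), (8, "huit"), (9, "neuf"),
        (10, "dix"), (11, "onze"), (12, "douze"), (13, "treize"),
        (14, "quatorze"), (15, "quinze"), (16, "seize"), (17, "dix->>"),
        (20, "vingt[->et>]"), (30, "trente[->et>]"), (40, "quarante[->et>]"),
        (50, "cinquante[->et>]"), (60, "soixante[->et>]"),
    ],
    # Remainder 1 after a tens word becomes "et-un"; 2+ are spelled normally.
    "et": [(1, "et-un"), (2, "=cardinal=")],
}

# [optional text] | >ruleset> (empty name = same set) | =ruleset=
TOKEN = re.compile(r"\[([^\]]*)\]|>(\w*)>|=(\w+)=")


def spell(n, rule_set="cardinal"):
    """Spell n (0-69 only) using the hypothetical rule sets above."""
    if not 0 <= n <= 69:
        raise ValueError("sketch rule data only covers 0-69")
    # Pick the rule with the largest base value <= n.
    base, template = max((r for r in RULE_SETS[rule_set] if r[0] <= n),
                         key=lambda r: r[0])
    divisor = 10 ** (len(str(base)) - 1)  # largest power of 10 <= base
    remainder = n % divisor

    def expand(text):
        def sub(match):
            optional, rem_set, whole_set = match.groups()
            if optional is not None:
                # Bracketed text is omitted when the remainder is zero.
                return expand(optional) if remainder else ""
            if whole_set is not None:
                return spell(n, whole_set)
            return spell(remainder, rem_set or rule_set)
        return TOKEN.sub(sub, text)

    return expand(template)
```

Here `spell(51)` returns `"cinquante-et-un"` and `spell(10)` returns `"dix"`. Because the number is resolved to a single spelling before phonetization, the grammar rule becomes a plain sequence rather than a union over every possible digit reading.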