Speech-Rule-Engine/speech-rule-engine

Space as thousands separator in numbers

limefrogyank opened this issue · 1 comments

I've done some testing and thought I would leave this issue here.

First, this is from the International System of Units spec concerning separating digits with spaces (emphasis mine):

"The practice of grouping digits in this way is a matter of choice; it is not always followed in certain specialized applications such as engineering drawings, financial statements and scripts to be read by a computer."

The International System of Units (PDF) (9th ed.). International Bureau of Weights and Measures. 2019. p. 150. ISBN 978-92-822-2272-0.

I take this to mean that the spacing has no meaning and is only to make it easier to read at a glance. Spoken numbers have natural separators like "thousand" and "million".

However, when the digit groups are separated with the Unicode thin space (U+2009), speech-rule-engine treats the space as a break in the number, so 12345 (displayed as 12 345) is read as "twelve three hundred and forty-five" instead of "twelve thousand three hundred and forty-five". Using literal commas in place of the spaces produces the correct reading, but commas are not correct according to the SI rules.

This happens when:

  • MathJax parses LaTeX: 12\,345
  • MathML is generated manually (\u2009 is the Unicode thin space):
    <mn>12\u2009345</mn>
  • Alternative MathML:
    <mrow>
        <mn>12</mn>
        <mo separator='true'>\u2009</mo>
        <mn>345</mn>
    </mrow>
  • Alternative MathML v2:
    <mrow>
        <mn>12</mn>
        <mspace width="thinspace" />
        <mn>345</mn>
    </mrow>

Of these, the plain <mn>12\u2009345</mn> form works best, because it does not generate any <mo> multiplication in MathSpeak. However, it is still not read correctly in ClearSpeak.

I'm not sure what else I can try, but ideally thin spaces between digit groups inside a plain <mn> element would be treated the same way commas are.
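
Until something like that exists, one possible stopgap is to rewrite the thin spaces to commas in a copy of the MathML that is used only for speech, since commas are reported above to read correctly. The following is a minimal sketch under that assumption. The helper name `commaGroupForSpeech` is hypothetical and not part of speech-rule-engine, and the regex only handles integer digit groups in an `<mn>` element (no decimals, no entity-encoded spaces):

```typescript
// Thin space (U+2009) used as the SI digit-group separator.
const THIN_SPACE = "\u2009";

// Matches an <mn> element whose content is digit groups separated by thin
// spaces, e.g. "12\u2009345" or "1\u2009234\u2009567".
// Assumption: integers only; anything else is left untouched.
const GROUPED_MN = new RegExp(
  `(<mn\\b[^>]*>)(\\d{1,3}(?:${THIN_SPACE}\\d{3})+)(</mn>)`,
  "g"
);

// Hypothetical helper: returns a copy of the MathML in which thin-space
// digit grouping is replaced by commas. Intended for the speech string
// only, not for the MathML that is displayed.
function commaGroupForSpeech(mathml: string): string {
  return mathml.replace(
    GROUPED_MN,
    (_match: string, open: string, digits: string, close: string) =>
      open + digits.split(THIN_SPACE).join(",") + close
  );
}
```

The result could then be passed to whatever speech call is already in use, while the original thin-space version is kept for rendering. This only sidesteps the problem on the caller's side; it does not change how the engine itself treats U+2009.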

As for a solution, I am generating the MathML directly and could easily add an explicit attribute (data-number-separator="\u2009"?). If there's a better way that is less of a band-aid, I'm happy to implement it on my side.
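
To make the attribute idea concrete, here is a rough sketch of what the generating side could look like. The `data-number-separator` attribute is only the proposal from this issue; nothing here assumes speech-rule-engine currently reads it. The function names `groupDigits` and `groupedMn` are hypothetical, and only non-negative integers given as digit strings are handled:

```typescript
// Thin space (U+2009), the separator proposed for the attribute value.
const THIN_SEP = "\u2009";

// Groups a string of digits in threes from the right, SI style.
// "12345" becomes "12<sep>345".
function groupDigits(digits: string, sep: string = THIN_SEP): string {
  if (!/^\d+$/.test(digits)) {
    throw new Error("expected a string of digits");
  }
  // Insert the separator at every position that is followed by a
  // whole number of three-digit groups up to the end of the string.
  return digits.replace(/\B(?=(\d{3})+$)/g, sep);
}

// Emits an <mn> element that carries the separator in the proposed
// (hypothetical) data-number-separator attribute.
function groupedMn(digits: string): string {
  return `<mn data-number-separator="${THIN_SEP}">${groupDigits(digits)}</mn>`;
}
```

With this, `groupedMn("12345")` yields an `<mn>` whose content is `12\u2009345` and whose attribute names the separator explicitly, so a consumer would not have to guess whether the space is grouping or something else.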