Create doc listing all the important characters with possible speech for each character.
NSoiffer opened this issue · 5 comments
This is the start of an agreed-upon resolution of #480.
For AT users, hearing nothing but possibly a hex number for a character is a really poor experience. We therefore agreed that the Math WG should come up with a large list that maps a very inclusive set of characters that might need to be spoken to some potential speech. This speech is not a concept name, so it does not belong in the concept lists. For example, ↻ (U+21BB) might have the listed potential speech "clockwise open circle arrow". Often, these names are based on the Unicode description.
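Here is a minimal sketch of that fallback behaviour. A lookup could try curated speech first, then the lowercased Unicode character name, and only then a bare hex code point, which is the poor experience described above. The `CURATED` table and `potential_speech` function are hypothetical; the table stands in for entries from the published list.

```python
import unicodedata

# Hypothetical curated entries standing in for the published speech list.
CURATED = {
    "\u2208": "an element of",  # ∈
}


def potential_speech(ch: str) -> str:
    """Return speech for a single character, most specific source first."""
    if ch in CURATED:
        return CURATED[ch]
    # Many list entries are derived from the Unicode character name.
    name = unicodedata.name(ch, None)
    if name is not None:
        return name.lower()
    # Last resort: the bare code point (e.g. Private Use Area characters have no name).
    return f"U+{ord(ch):04X}"


print(potential_speech("\u21bb"))  # clockwise open circle arrow
```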
The path forward is:
- Neil will verify that his MathCAT list of 4000+ characters includes all the Unicode characters with math properties.
- Neil will pass that list on to @davidcarlisle to add to his unicode.xml list (used for the XML Entities Recommendation).
- David will then produce a draft W3C note or some other document for reference by AT vendors.
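The first step's coverage check could look roughly like the sketch below. It rests on one approximation. Python's `unicodedata` module does not expose the derived Unicode Math property, so general category Sm (Symbol, math) stands in for it. The sketch also assumes the speech list is available as a set of characters. Both function names are hypothetical.

```python
from __future__ import annotations

import sys
import unicodedata


def math_symbol_chars() -> set[str]:
    """Approximate the math characters by general category Sm.

    The Unicode Math property also includes Other_Math characters (for example
    the Mathematical Alphanumeric Symbols), which unicodedata does not expose,
    so this set is only a lower bound.
    """
    return {
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Sm"
    }


def missing_from_list(listed: set[str]) -> list[str]:
    """Category Sm characters with no entry in the speech list, in code point order."""
    return sorted(math_symbol_chars() - listed, key=ord)
```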
The initial idea was to add these to unicode.xml, but there are several conditional templates that don't really fit the existing unicode.xml style. It also seems more in the spirit of the intent concept and property lists to manage the source data as YAML, which is also the format of the MathCAT list.
A slightly modified version of the MathCAT list (https://github.com/NSoiffer/MathCAT/blob/main/Rules/Languages/en/unicode-full.yaml) has been added to mathml-docs as https://github.com/w3c/mathml-docs/blob/main/_data/unicode-speech.yml with a GitHub Pages rendering at https://w3c.github.io/mathml-docs/unicode-speech/
There is some minor restructuring of the YAML to aid rendering in Jekyll, but the substantive changes are:
- All Private Use Area characters have been dropped.
- The pitch: field has been dropped (it only occurred twice after dropping the PUA characters).
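The two substantive changes can be sketched as a single filtering pass. The sketch assumes a hypothetical data shape: a dict keyed by character whose values are dicts of fields. That is not necessarily the actual schema of unicode-speech.yml. Private Use Area characters are detected by general category Co.

```python
from __future__ import annotations

import unicodedata
from typing import Any


def is_private_use(ch: str) -> bool:
    # Category "Co" covers U+E000..U+F8FF and the supplementary PUA planes 15 and 16.
    return unicodedata.category(ch) == "Co"


def clean_entries(
    entries: dict[str, dict[str, Any]],
    dropped_fields: tuple[str, ...] = ("pitch",),
) -> dict[str, dict[str, Any]]:
    """Drop PUA characters, then drop the named fields from each remaining entry."""
    return {
        ch: {key: value for key, value in fields.items() if key not in dropped_fields}
        for ch, fields in entries.items()
        if not is_private_use(ch)
    }
```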
Some of the nested conditions are not fully handled by the GitHub template; they show the raw object data as ... => ... instead. It may make more sense to simplify the conditions.
Possible changes:
- drop the CJK Compatibility block, which starts at https://w3c.github.io/mathml-docs/unicode-speech/#U3371
- re-order to be in Unicode order
- break up the math alphanumeric ranges so they do not include the "holes" left for characters that already exist in the base plane
- drop some more fields, such as audio:
- simplify some of the spell/translate markup
- simplify some of the pseudo-XPath conditions (especially when they query custom MathCAT elements rather than MathML)
- document the external parameters used, e.g. $SpeechStyle != 'ClearSpeak' (and probably don't use so many of them if they are closely tied to the MathCAT implementation)
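Two of these possible changes can be sketched briefly: re-ordering into Unicode order, and splitting the math alphanumeric ranges around their holes. The sketch assumes the holes are exactly the unassigned (category Cn) code points. For example, U+1D455 is reserved because italic small h is already encoded as U+210E PLANCK CONSTANT. Both function names are hypothetical.

```python
from __future__ import annotations

import unicodedata


def unicode_order(chars):
    """Sort characters by code point."""
    return sorted(chars, key=ord)


def assigned_runs(first: int, last: int) -> list[tuple[int, int]]:
    """Split first..last (inclusive) into runs of assigned code points, skipping holes."""
    runs = []
    start = None
    for cp in range(first, last + 1):
        assigned = unicodedata.category(chr(cp)) != "Cn"
        if assigned and start is None:
            start = cp  # a new run begins
        elif not assigned and start is not None:
            runs.append((start, cp - 1))  # a hole ends the current run
            start = None
    if start is not None:
        runs.append((start, last))
    return runs


for lo, hi in assigned_runs(0x1D400, 0x1D45F):
    print(f"U+{lo:04X}..U+{hi:04X}")  # U+1D400..U+1D454, then U+1D456..U+1D45F
```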
@NSoiffer I checked in a second version with much simpler handling of conditional texts. All nested tests, XPath and other tests are replaced by (arbitrarily named) states, so the YAML still records all possible suggested speech strings, but the detailed mechanism for choosing between them is left to implementations.
So
    - test:
        if: ancestor::m:modified-variable and preceding-sibling::*[1][self::m:mi]
        then:
          - t: bar
        else:
          - t: line
becomes
    - choose:
        - modified-variable: bar
        - default: line
and
    - test:
        if: $SpeechStyle != 'ClearSpeak' or $ClearSpeak_MultSymbolDot = 'Auto'
        then:
          - t: times
        else:
          - t: dot
becomes
    - choose:
        - dot-times: times
        - default: dot
and
    - test:
        if: $SpeechStyle != 'ClearSpeak'
        then:
          - t: an element of
        else_test:
          if: ../../self::m:set or ../../../self::m:set
          then_test:
            - if: $ClearSpeak_SetMemberSymbol = 'Auto' or $ClearSpeak_SetMemberSymbol = 'In'
              then:
                - t: in
            - else_if: $ClearSpeak_SetMemberSymbol = 'Member'
              then:
                - t: member of
            - else_if: $ClearSpeak_SetMemberSymbol = 'Element'
              then:
                - t: element of
            - else:
                - t: belonging to
          else_test:
            - if: $ClearSpeak_SetMemberSymbol = 'Auto' or $ClearSpeak_SetMemberSymbol = 'Member'
              then:
                - t: is a member of
            - else_if: $ClearSpeak_SetMemberSymbol = 'Element'
              then:
                - t: is an element of
            - else_if: $ClearSpeak_SetMemberSymbol = 'In'
              then:
                - t: is in
            - else:
                - t: belongs to
becomes
    - choose:
        - element-member-verbose: is a member of
        - element-member: member of
        - element-belonging: belonging to
        - element-belongs: belongs to
        - element-in-verbose: is in
        - element-in: in
        - element-verbose: is an element of
        - default: an element of
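This is one possible sketch of what "left to implementations" might mean, not the approach of any particular AT. An implementation computes its own set of active state names, then takes the first listed state that is active, falling back to default. The Python data layout, `resolve_choose`, and `dot_states` are assumptions. `dot_states` is a hypothetical mirror of the original $SpeechStyle / $ClearSpeak_MultSymbolDot test.

```python
from __future__ import annotations

from typing import Iterable, Mapping, Sequence

# choose lists as plain Python data: one single-key mapping per entry, in YAML order.
DOT = [{"dot-times": "times"}, {"default": "dot"}]
ELEMENT = [
    {"element-member-verbose": "is a member of"},
    {"element-member": "member of"},
    {"element-belonging": "belonging to"},
    {"element-belongs": "belongs to"},
    {"element-in-verbose": "is in"},
    {"element-in": "in"},
    {"element-verbose": "is an element of"},
    {"default": "an element of"},
]


def resolve_choose(choices: Sequence[Mapping[str, str]], active_states: Iterable[str]) -> str:
    """Return the speech of the first active state in list order, else the default."""
    active = set(active_states)
    default = None
    for entry in choices:
        ((state, speech),) = entry.items()  # each entry has exactly one key
        if state == "default":
            default = speech
        elif state in active:
            return speech
    if default is None:
        raise ValueError("choose list has no default entry")
    return default


def dot_states(speech_style: str, mult_symbol_dot: str) -> set[str]:
    """Hypothetical mapping of the original ClearSpeak test onto the dot-times state."""
    if speech_style != "ClearSpeak" or mult_symbol_dot == "Auto":
        return {"dot-times"}
    return set()


print(resolve_choose(DOT, dot_states("ClearSpeak", "Auto")))  # times
print(resolve_choose(ELEMENT, {"element-in"}))  # in
```

The first active state in list order wins, so entry order matters when several states apply at once.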
YAML: https://github.com/w3c/mathml-docs/blob/main/_data/unicode-speech2.yml
HTML rendering: https://w3c.github.io/mathml-docs/unicode-speech/index2.html
Both are currently named with a "2" suffix to allow side-by-side comparison in the gh-pages view.
I like those changes, and also using "map". It is much more readable. Let's discuss this with the rest of the group at the start of the meeting on Thursday.
The index2 version has now been implemented as https://w3c.github.io/mathml-docs/unicode-speech and the above URL is no longer active. The exact list of characters and their speech strings can still be edited, but the basic mechanism is in place with a public document, so closing here.