Speech-Rule-Engine/speech-rule-engine

[skeleton] empheq causing no top-level speech

pkra opened this issue · 7 comments

pkra commented

Here's an attempt at a minimal example:

\begin{empheq} [left = \empheqlbrace \,]{align} b \tag{1}\end{empheq}

This generates a skeleton that starts with

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block" data-semantic-structure="(12 2 (11 3 (9 (8 (1 0) 6))))">
  <mrow data-semantic-added="true">
    <mo data-semantic-type="punctuation" data-semantic-role="dummy" data-semantic-id="10" data-semantic-parent="11" data-semantic-added="true" data-semantic-operator="punctuated" data-semantic-speech="comma">⁣</mo>
  <mrow data-semantic-added="true" data-semantic-type="punctuated" data-semantic-role="text" data-semantic-id="11" data-semantic-children="3,9" data-semantic-content="10" data-semantic-parent="12" data-semantic-owns="3 9" data-semantic-speech="StartLayout 1st Row  with Label left parenthesis 1 right parenthesis EndLabel b EndLayout"/>
...

The root element is missing speech and children, making labeling and navigation impossible.

pkra commented

FWIW, the original example was

\begin{empheq} [left = \empheqlbrace \,]{align} &\dot{Z}(t) = J_{{2n}}g^{\operatorname *{DD}}_{\mathcal{H}}(U,Z,t), & \text{in }\mathcal{T}, \cssId{texmlid30}{\tag{4.15a}}\\ &\begin{aligned} \dot{U}(t) & = (I_{{2N}}-UU^\top )(J_{{2N}}G^{\,p^*}_{\mathcal{H}}(U,Z)Z^\top {-}\\ & \qquad G^{\,p^*}_{\mathcal{H}}(U,Z)Z^\top J_{{2n}}^\top ) S(Z)^{-1}, \end{aligned} & \text{in }\mathcal{T}, \cssId{texmlid31}{\tag{4.15b}}\\ &U(t_0)Z(t_0) = U^0 Z^0,& \tag{4.15c} \end{empheq}

Below is the result I get. Note that here the semantic root node (id=12) is not the topmost node in the tree but an mrow element further down. The skeleton, on the other hand, will always be in the root node of the expression. SRE has a getSemanticRoot method in the walker_util module.

<math xmlns="http://www.w3.org/1998/Math/MathML" data-latex="\begin{empheq} [left = \empheqlbrace \,]{align} b \tag{1}\end{empheq}" display="block" data-semantic-structure="(12 2 (11 3 (9 (8 (1 0) 6))))">
  <mrow data-semantic-added="true">
    <mo data-semantic-type="punctuation" data-semantic-role="dummy" data-semantic-id="10" data-semantic-parent="11" data-semantic-added="true" data-semantic-operator="punctuated" data-semantic-speech="comma">⁣</mo>
    <mrow data-semantic-added="true" data-semantic-type="punctuated" data-semantic-role="text" data-semantic-annotation="depth:2" data-semantic-id="11" data-semantic-children="3,9" data-semantic-content="10" data-semantic-parent="12" data-semantic-owns="3 9" data-semantic-speech="StartLayout 1st Row  with Label left parenthesis 1 right parenthesis EndLabel b EndLayout"/>
    <mrow data-semantic-added="true" data-semantic-type="punctuated" data-semantic-role="startpunct" data-semantic-annotation="Emph:left;Emph:top;depth:1" data-semantic-id="12" data-semantic-children="2,11" data-semantic-content="2" data-semantic-attributes="latex:\begin{empheq} [left = \empheqlbrace \,]{align} b \tag{1}\end{empheq}" data-semantic-owns="2 11" data-semantic-speech="left brace StartLayout 1st Row  with Label left parenthesis 1 right parenthesis EndLabel b EndLayout"/>
    <mtable displaystyle="true" columnalign="right right" columnspacing="0em " rowspacing="3pt" data-break-align="bottom" data-latex="\begin{align} b \tag{1}\end{empheq}" data-semantic-type="multiline" data-semantic-role="unknown" data-semantic-annotation="Emph:table;depth:3" data-semantic-id="9" data-semantic-children="8" data-semantic-parent="11" data-semantic-owns="8" data-semantic-speech="StartLayout 1st Row  with Label left parenthesis 1 right parenthesis EndLabel b EndLayout">
      <mlabeledtr data-semantic-type="line" data-semantic-role="multiline" data-semantic-annotation="depth:4" data-semantic-id="8" data-semantic-children="6" data-semantic-content="1" data-semantic-parent="9" data-semantic-owns="1 6" data-semantic-speech="with Label left parenthesis 1 right parenthesis EndLabel b" data-semantic-prefix="1st Row">
        <mtd id="mjx-eqn:1" data-semantic-type="cell" data-semantic-role="label" data-semantic-id="1" data-semantic-children="0" data-semantic-parent="8" data-semantic-owns="0" data-semantic-speech="left parenthesis 1 right parenthesis" data-semantic-prefix="1st Column">
          <mtext data-latex="\text{(1)}" data-semantic-type="text" data-semantic-role="annotation" data-semantic-font="normal" data-semantic-id="0" data-semantic-parent="1" data-semantic-attributes="latex:\text{(1)}" data-semantic-speech="left parenthesis 1 right parenthesis">(1)</mtext>
        </mtd>
        <mtd>
          <mpadded height="0" depth="0" voffset="height">
            <mpadded height="0" depth="0" voffset="-1height">
              <mo data-latex="\empheqlbrace" data-semantic-type="punctuation" data-semantic-role="openfence" data-semantic-annotation="Emph:left;depth:2" data-semantic-id="2" data-semantic-parent="12" data-semantic-attributes="latex:\empheqlbrace" data-semantic-operator="punctuated" data-semantic-speech="left brace">{</mo>
              <mtext data-semantic-type="text" data-semantic-role="space" data-semantic-annotation="Emph:left;clearspeak:unit;depth:3" data-semantic-id="3" data-semantic-parent="11" data-semantic-speech=""> </mtext>
              <mphantom>
                <mpadded width="0">
                  <mtable displaystyle="true" columnalign="right" columnspacing="" rowspacing="3pt" data-break-align="bottom">
                    <mlabeledtr>
                      <mtd>
                        <mtext data-latex="\text{(1)}">(1)</mtext>
                      </mtd>
                      <mtd>
                        <mi data-latex="\tag{1}">b</mi>
                      </mtd>
                    </mlabeledtr>
                  </mtable>
                </mpadded>
              </mphantom>
            </mpadded>
            <mphantom>
              <mpadded width="0">
                <mtable displaystyle="true" columnalign="right" columnspacing="" rowspacing="3pt" data-break-align="bottom" align="baseline 1">
                  <mlabeledtr>
                    <mtd>
                      <mtext data-latex="\text{(1)}">(1)</mtext>
                    </mtd>
                    <mtd>
                      <mi data-latex="\tag{1}">b</mi>
                    </mtd>
                  </mlabeledtr>
                </mtable>
              </mpadded>
            </mphantom>
          </mpadded>
        </mtd>
        <mtd>
          <mi data-latex="\tag{1}" data-semantic-type="identifier" data-semantic-role="latinletter" data-semantic-font="italic" data-semantic-annotation="clearspeak:simple;depth:5" data-semantic-id="6" data-semantic-parent="8" data-semantic-attributes="latex:\tag{1}" data-semantic-speech="b">b</mi>
        </mtd>
      </mlabeledtr>
    </mtable>
  </mrow>
</math>
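
To make the skeleton easier to compare against the DOM, here is a rough sketch of reading the data-semantic-structure string as a tree. It assumes, based only on the output above, that the first number inside each pair of parentheses is the parent id and the remaining entries are its children. `parseSkeleton` is a hypothetical helper written for this thread, not part of SRE.

```javascript
// Hypothetical helper (not an SRE API): parse a skeleton string such as
// "(12 2 (11 3 (9 (8 (1 0) 6))))" into nested { id, children } objects.
// Assumption: the first token after "(" is the parent id; a bare number is a leaf.
function parseSkeleton(skeleton) {
  const tokens = skeleton.match(/\(|\)|\d+/g) || [];
  let pos = 0;

  function parseNode() {
    const token = tokens[pos++];
    if (token !== '(') {
      // A bare number is a leaf node without children.
      return { id: token, children: [] };
    }
    // The first token inside the parentheses is the id of this subtree's root.
    const node = { id: tokens[pos++], children: [] };
    while (pos < tokens.length && tokens[pos] !== ')') {
      node.children.push(parseNode());
    }
    pos++; // consume the closing ")"
    return node;
  }

  return parseNode();
}

const tree = parseSkeleton('(12 2 (11 3 (9 (8 (1 0) 6))))');
console.log(tree.id, tree.children.map((child) => child.id));
```

Read this way, node 12 has children 2 and 11, which agrees with the data-semantic-children="2,11" attribute on the mrow with data-semantic-id="12" in the output above.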

The reason why the semantic tree is rather unshapely is the \, space, which SRE interprets as semantically relevant here. Normally, as in a\,b vs a\quad b, SRE would deem only the latter space semantically relevant. Not sure why it treats \, that way in this case.

Compare the tree for the above expression

https://speech-rule-engine.github.io/semantic-tree-visualiser/visualise.html?110001111100%5Cbegin%7Bempheq%7D%20%5Bleft%20%3D%20%5Cempheqlbrace%5C%2C%20%5D%7Balign%7D%20b%20%5Ctag%7B1%7D%5Cend%7Bempheq%7D

to the one we get for

\begin{empheq} [left = \empheqlbrace]{align} b \tag{1}\end{empheq}

or even

\begin{empheq} [left = \empheqlbrace\;]{align} b \tag{1}\end{empheq}

where SRE interprets a case statement.

https://speech-rule-engine.github.io/semantic-tree-visualiser/visualise.html?110001111100%5Cbegin%7Bempheq%7D%20%5Bleft%20%3D%20%5Cempheqlbrace%20%5D%7Balign%7D%20b%20%5Ctag%7B1%7D%5Cend%7Bempheq%7D

Thanks for looking into this! Now I realize that the top root also doesn't provide a data-semantic-owns attribute.

I've been naively querySelecting the first DOM node with data-semantic-speech and picked that as the root.

I'm guessing I should peek into the top-level data-semantic-structure to find the real root instead. Does that sound about right?

The way the SRE method works is to look for the node in the element that has an id but no parent. I believe I use an XPath expression for that.
But since you know that you have the structure attribute, which will always be on the expression's root, you can just look up the id of the semantic root node and query for it with

querySelector(`[data-semantic-id="${id}"]`);
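
Putting the two suggestions together, a downstream lookup might look roughly like the sketch below. It is not SRE's getSemanticRoot; `semanticRootId` and `findSemanticRoot` are hypothetical names. It assumes that the first number in data-semantic-structure is the semantic root id (true for the "(12 2 (11 ...))" skeleton above) and falls back to a CSS equivalent of the "id but no parent" check.

```javascript
// Hypothetical helpers, not SRE APIs.
// Assumption: the first number in the skeleton string is the semantic root id.
function semanticRootId(structure) {
  const match = /\d+/.exec(structure || '');
  return match ? match[0] : null;
}

// `mathElement` is the topmost element carrying data-semantic-structure,
// e.g. the <math> DOM node. Only getAttribute and querySelector are used.
function findSemanticRoot(mathElement) {
  const id = semanticRootId(mathElement.getAttribute('data-semantic-structure'));
  if (id !== null) {
    // In the well-behaved case the topmost element is itself the semantic root.
    if (mathElement.getAttribute('data-semantic-id') === id) {
      return mathElement;
    }
    const node = mathElement.querySelector(`[data-semantic-id="${id}"]`);
    if (node) {
      return node;
    }
  }
  // Fallback: the first descendant that has a semantic id but no semantic parent.
  return mathElement.querySelector('[data-semantic-id]:not([data-semantic-parent])');
}
```

For the empheq output in this thread, this should resolve to the mrow with data-semantic-id="12" rather than the first node that happens to carry data-semantic-speech.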

Thanks for clarifying. I'll adjust downstream. Thanks again for looking into this!

@zorkow not sure if I closed this prematurely. Obviously, I'd much rather have a natural DOM structure to follow - but given the current implementation of empheq, this seems like the worst example to complain about 😄