jgm/texmath

\mhchem conversion: MathML to OMML/MathML conversion adds extra width to the equation

frederik opened this issue · 4 comments

Hello,

I am currently trying to export \mhchem equations to DocX using Pandoc. Since \mhchem is not supported natively, I converted the TeX to MathML and then tried to convert the MathML to OMML for Word.

Using https://johnmacfarlane.net/texmath.html, I could see that the zero-width mpadded element is dropped even in the MathML-to-MathML conversion. This would explain the additional horizontal space in the screenshot.

[Screenshot: comparison]

Looking at the OMML that Word produces directly (I am attaching the DocX file), I can see that Word adds a <m:zeroWid m:val="1"/> element, whereas the <m:phant> element in the OMML produced by TeXMath has no zero-width setting. The phantom therefore still pushes the 2 down correctly, but it also introduces white space.
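For illustration only, here is a minimal Python sketch (standard-library `xml.etree.ElementTree`) of the OMML shape described above: an `m:phant` whose `m:phantPr` carries `m:zeroWid m:val="1"`. The helper name `omml_zero_width_phantom` is hypothetical, and the explicit `m:show m:val="0"` (hide the contents) is an assumption added so the result does not rely on Word's defaults; it is not taken from the attached file.

```python
import xml.etree.ElementTree as ET

# OMML namespace used by Word for the m:* elements
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
ET.register_namespace("m", M_NS)


def m(tag: str) -> str:
    """Return the Clark-notation name ({namespace}tag) for an OMML name."""
    return f"{{{M_NS}}}{tag}"


def omml_zero_width_phantom(text: str) -> ET.Element:
    """Hypothetical helper: an invisible, zero-width phantom around `text`."""
    phant = ET.Element(m("phant"))
    props = ET.SubElement(phant, m("phantPr"))
    # Assumption: hide the contents explicitly instead of relying on defaults.
    ET.SubElement(props, m("show"), {m("val"): "0"})
    # The property Word emits and TeXMath's output lacks.
    ET.SubElement(props, m("zeroWid"), {m("val"): "1"})
    # The phantom's argument: a single math run containing the text.
    base = ET.SubElement(phant, m("e"))
    run = ET.SubElement(base, m("r"))
    ET.SubElement(run, m("t")).text = text
    return phant


if __name__ == "__main__":
    print(ET.tostring(omml_zero_width_phantom("A"), encoding="unicode"))
```

With `m:zeroWid` set, the phantom "A" can still serve as the subscript base that positions the 2, without taking up horizontal space.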

Source Equation:

\ce{ H2O }

Resulting MathML (using MathJax):

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mrow data-mjx-texclass="ORD">
    <mrow data-mjx-texclass="ORD">
      <mi data-mjx-auto-op="false" mathvariant="normal">H</mi>
    </mrow>
    <msub>
      <mrow data-mjx-texclass="ORD">
        <mrow data-mjx-texclass="ORD">
          <mpadded width="0">
            <mphantom>
              <mi>A</mi>
            </mphantom>
          </mpadded>
        </mrow>
      </mrow>
      <mrow data-mjx-texclass="ORD">
        <mrow data-mjx-texclass="ORD">
          <mpadded height="0">
            <mn>2</mn>
          </mpadded>
        </mrow>
      </mrow>
    </msub>
    <mrow data-mjx-texclass="ORD">
      <mi data-mjx-auto-op="false" mathvariant="normal">O</mi>
    </mrow>
  </mrow>
</math>

DocX file as produced by Word:
HA2O.docx

I think that using MathML directly could open up a whole other world of equations for researchers, since we would not depend on every package being supported by Pandoc directly. I'd be happy to help and to provide further examples; unfortunately, I do not understand the code base well enough to provide a fix or an analysis.

Kind regards,
Frederik

jgm commented

Wow, this is quite elaborate MathML for something so simple! Why not

<math display="block" xmlns="http://www.w3.org/1998/Math/MathML">
  <mstyle mathvariant="normal">
    <msub>
      <mi>H</mi>
      <mn>2</mn>
    </msub>
    <mi>O</mi>
  </mstyle>
</math>

Our MathML reader does not yet implement mpadded, which is one problem.

The other is the treatment of mathvariant="normal": it should produce roman, not italics.
I'm not sure yet whether it's a MathML reader issue, an OOXML writer issue, or both.

PS. Have you tried my mhchem Lua filter? That may be a simpler approach.
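On the mathvariant="normal" point, a rough Python sketch of what "produce roman" could look like on the OMML side, assuming the writer maps MathML `mathvariant` onto the OMML run style `m:sty` (`p` = plain/upright, `b`, `i`, `bi`). The function name and the mapping table are hypothetical; this is not TeXMath's code, which is written in Haskell.

```python
import xml.etree.ElementTree as ET

# Namespaces: OMML (Word math) and MathML
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
MML_NS = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("m", M_NS)

# Assumed mapping from MathML mathvariant to OMML m:sty
# (p = plain/upright, b = bold, i = italic, bi = bold italic).
STY_FOR_VARIANT = {"normal": "p", "bold": "b", "italic": "i", "bold-italic": "bi"}


def mi_to_omml_run(mi: ET.Element) -> ET.Element:
    """Hypothetical helper: convert a MathML <mi> into an OMML <m:r> run."""
    run = ET.Element(f"{{{M_NS}}}r")
    sty = STY_FOR_VARIANT.get(mi.get("mathvariant", ""))
    if sty is not None:
        # Emit run properties only when a style was requested explicitly;
        # otherwise Word applies its default math styling.
        rpr = ET.SubElement(run, f"{{{M_NS}}}rPr")
        ET.SubElement(rpr, f"{{{M_NS}}}sty", {f"{{{M_NS}}}val": sty})
    ET.SubElement(run, f"{{{M_NS}}}t").text = (mi.text or "").strip()
    return run


if __name__ == "__main__":
    h = ET.fromstring(f'<mi xmlns="{MML_NS}" mathvariant="normal">H</mi>')
    print(ET.tostring(mi_to_omml_run(h), encoding="unicode"))
```

Leaving out `m:rPr` for a plain `<mi>` keeps the default behaviour unchanged, which sidesteps, rather than resolves, the question of whether mathvariant="normal" should differ from the default.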

frederik commented

Yes, I had a look, but it would add support for only one package, whereas supporting generic MathML would cover more use cases (and would give 100% compatibility between what users see as an SVG in the editor and what is exported with their paper, at least in theory).

MathML is supported by a number of discipline-specific editors. It might also let us take complex TeX that uses many packages and condense it into a publication format that can be used in PDF, EPUB, or on websites.

Back then I followed an mhchem issue and, for some reason, concluded that its support would also be only partial. I will take another look.

Unfortunately, the verbose MathML is simply how MathJax 3 produces it. Yours, of course, is a lot nicer.
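If the verbosity itself is a concern, one possible pre-processing step is to collapse MathJax's nested single-child `<mrow>` wrappers before handing the MathML to Pandoc. Below is a minimal Python sketch with hypothetical function names. It discards the attributes on the removed wrappers (including `data-mjx-texclass`, which may carry spacing hints), so the result can differ slightly from MathJax's rendering, and it does not address the mpadded or mathvariant issues.

```python
import xml.etree.ElementTree as ET

MML_NS = "http://www.w3.org/1998/Math/MathML"
MROW = f"{{{MML_NS}}}mrow"


def is_wrapper(node: ET.Element) -> bool:
    """True for an <mrow> holding exactly one element and no text of its own."""
    return (
        node.tag == MROW
        and len(node) == 1
        and not (node.text or "").strip()
        and not (node[0].tail or "").strip()
    )


def unwrap_single_child_mrows(elem: ET.Element) -> None:
    """Hypothetical pre-pass: replace single-child <mrow> wrappers, in place,
    by the element they wrap (attributes on the wrapper are discarded)."""
    for i, child in enumerate(list(elem)):
        # Collapse chains such as <mrow><mrow><mi>H</mi></mrow></mrow>.
        while is_wrapper(child):
            inner = child[0]
            inner.tail = child.tail  # keep the wrapper's trailing whitespace
            elem[i] = inner
            child = inner
        unwrap_single_child_mrows(child)


if __name__ == "__main__":
    src = (f'<math xmlns="{MML_NS}"><mrow>'
           '<mrow><mi>H</mi></mrow><mrow><mi>O</mi></mrow></mrow></math>')
    root = ET.fromstring(src)
    unwrap_single_child_mrows(root)
    print(ET.tostring(root, encoding="unicode"))
```

Because an `<mrow>` with a single child is replaced by exactly one element, the child counts of `msub` and similar fixed-arity elements are preserved.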

How do you feel about adding support for mpadded? Is this something you see being added here? I will definitely take another look at the Lua filter in any case.

jgm commented

mpadded support would probably require adding something new to the types for equations.
And then the trick would be figuring out equivalents in all the other formats. I'm really not sure what the TeX equivalent would be, for example. But suppose we did have a \padded command that did the same thing. You wouldn't WANT the MathML you give above to be converted to something like

H\phantom{\padded[0pt]{A}}_2 O

but rather to

H_2 O

Currently we handle mpadded by ignoring it and just processing what's inside. We could add a special case that checks to see if we have a width of 0 specified, in which case we could ignore the contents and insert a zero-width space, or nothing. That wouldn't be general support for mpadded, but it would handle this specific case.
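A minimal Python sketch of that special case, written as a pre-pass over the MathML tree rather than inside TeXMath's Haskell reader: every `<mpadded>` whose `width` is zero is dropped, or optionally replaced by an `<mtext>` holding a zero-width space (U+200B). The function name, the `use_zwsp` flag, and the set of accepted zero spellings (`0`, `0pt`, `0.0em`, `0%`, ...) are assumptions.

```python
import re
import xml.etree.ElementTree as ET

MML_NS = "http://www.w3.org/1998/Math/MathML"
MPADDED = f"{{{MML_NS}}}mpadded"
MTEXT = f"{{{MML_NS}}}mtext"

# Assumption: a zero length is digits that are all 0, an optional
# two-letter unit (pt, em, px, ...) or a percent sign.
ZERO_WIDTH = re.compile(r"0+(?:\.0+)?(?:[a-z]{2}|%)?")


def drop_zero_width_mpadded(elem: ET.Element, use_zwsp: bool = False) -> None:
    """Hypothetical pre-pass: remove <mpadded width="0"> subtrees in place,
    optionally leaving <mtext>U+200B</mtext> in their place."""
    # Walk children back to front so deletions don't shift pending indices.
    for i, child in reversed(list(enumerate(elem))):
        width = child.get("width")
        if child.tag == MPADDED and width is not None and ZERO_WIDTH.fullmatch(width.strip()):
            if use_zwsp:
                placeholder = ET.Element(MTEXT)
                placeholder.text = "\u200b"
                placeholder.tail = child.tail
                elem[i] = placeholder
            else:
                del elem[i]
        else:
            # Other elements (including mpadded with only height="0") are kept.
            drop_zero_width_mpadded(child, use_zwsp)
```

On the H2O example, dropping the subtree leaves an empty `<mrow>` as the `msub` base, which should amount to `H_2 O`. As described, the rule also discards zero-width mpadded content that is visible (an overlap rather than a spacer); a stricter variant could additionally require the children to be `<mphantom>`.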

jgm commented

The font issue is related to #149. MathML has no way to specify "upright" or "roman" font; it just has an option for "no special font adjustment." Maybe we should always use roman style for mathvariant="normal". But that would violate the documentation's expectation that <mi mathvariant="normal"> is the same as plain <mi> (i.e. the default).