jgm/texmath

please use █ instead of ■ when converting {aligned} into docx

ZhuangQu opened this issue · 6 comments

I use pandoc 3.1.1 on Windows 11. When converting

\begin{equation*}
    \begin{aligned}
        1= & 2 &  & 3 \\
        =  & 4 &  & 5 \\
    \end{aligned}
\end{equation*}

from LaTeX into docx, we get

■(1=&2&&3@=&4&&5) 

in Word. We can see that {aligned} is converted to ■, which is wrong. The correct output is █.
In UnicodeMath, ■ (U+25A0) represents a matrix, while █ (U+2588) represents an aligned structure.
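
To make the difference concrete, here is a minimal sketch of how the two UnicodeMath strings could be assembled from the same cells. The function name `to_unicodemath` and the `aligned` flag are made up for illustration; this is not how pandoc or texmath produces its output.

```python
MATRIX = "\u25A0"    # ■ BLACK SQUARE, the matrix prefix
EQARRAY = "\u2588"   # █ FULL BLOCK, the aligned-structure prefix


def to_unicodemath(rows, aligned=True):
    """Join rows of cells into a UnicodeMath linear-format string.

    Cells within a row are joined with '&' and rows are joined with '@'.
    (Hypothetical helper, for illustration only.)
    """
    prefix = EQARRAY if aligned else MATRIX
    body = "@".join("&".join(cells) for cells in rows)
    return f"{prefix}({body})"


# The cells of the {aligned} example above; "" is the empty cell between 2 and 3.
rows = [["1=", "2", "", "3"], ["=", "4", "", "5"]]
print(to_unicodemath(rows, aligned=False))  # ■(1=&2&&3@=&4&&5)
print(to_unicodemath(rows, aligned=True))   # █(1=&2&&3@=&4&&5)
```

Only the prefix character differs; the body of the expression is identical in both cases.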

jgm commented

Transferring to jgm/texmath which does our math conversion.

Note: we don't use UnicodeMath; we use Word's XML representation of math.
The above aligned environment is translated as

<m:oMathPara>
  <m:oMathParaPr>
    <m:jc m:val="center" />
  </m:oMathParaPr>
  <m:oMath>
    <m:m>
      <m:mPr>
        <m:baseJc m:val="center" />
        <m:plcHide m:val="1" />
        <m:mcs>
          <m:mc>
            <m:mcPr>
              <m:mcJc m:val="right" />
              <m:count m:val="1" />
            </m:mcPr>
          </m:mc>
          <m:mc>
            <m:mcPr>
              <m:mcJc m:val="left" />
              <m:count m:val="1" />
            </m:mcPr>
          </m:mc>
          <m:mc>
            <m:mcPr>
              <m:mcJc m:val="right" />
              <m:count m:val="1" />
            </m:mcPr>
          </m:mc>
          <m:mc>
            <m:mcPr>
              <m:mcJc m:val="left" />
              <m:count m:val="1" />
            </m:mcPr>
          </m:mc>
        </m:mcs>
      </m:mPr>
      <m:mr>
        <m:e>
          <m:r>
            <m:t>1</m:t>
          </m:r>
          <m:r>
            <m:rPr>
              <m:sty m:val="p" />
            </m:rPr>
            <m:t>=</m:t>
          </m:r>
        </m:e>
        <m:e>
          <m:r>
            <m:t>2</m:t>
          </m:r>
        </m:e>
        <m:e />
        <m:e>
          <m:r>
            <m:t>3</m:t>
          </m:r>
        </m:e>
      </m:mr>
      <m:mr>
        <m:e>
          <m:r>
            <m:rPr>
              <m:sty m:val="p" />
            </m:rPr>
            <m:t>=</m:t>
          </m:r>
        </m:e>
        <m:e>
          <m:r>
            <m:t>4</m:t>
          </m:r>
        </m:e>
        <m:e />
        <m:e>
          <m:r>
            <m:t>5</m:t>
          </m:r>
        </m:e>
      </m:mr>
    </m:m>
  </m:oMath>
</m:oMathPara>

Please suggest more appropriate OMML.

Sorry, I don't know what OMML is.
I only know that █ is correct and ■ is wrong.
Maybe you can convert the UnicodeMath to OMML in Word to see what more appropriate OMML looks like.

jgm commented

Experimenting with Word: using U+25A0, I get
[Screenshot: Word rendering of the U+25A0 input]

and XML

      <m:oMathPara>
        <m:oMathParaPr>
          <m:jc m:val="center" />
        </m:oMathParaPr>
        <m:oMath>
          <m:m>
            <m:mPr>
              <m:plcHide m:val="1" />
              <m:mcs>
                <m:mc>
                  <m:mcPr>
                    <m:count m:val="1" />
                    <m:mcJc m:val="right" />
                  </m:mcPr>
                </m:mc>
                <m:mc>
                  <m:mcPr>
                    <m:count m:val="1" />
                    <m:mcJc m:val="left" />
                  </m:mcPr>
                </m:mc>
                <m:mc>
                  <m:mcPr>
                    <m:count m:val="1" />
                    <m:mcJc m:val="right" />
                  </m:mcPr>
                </m:mc>
                <m:mc>
                  <m:mcPr>
                    <m:count m:val="1" />
                    <m:mcJc m:val="left" />
                  </m:mcPr>
                </m:mc>
              </m:mcs>
              <m:ctrlPr>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
              </m:ctrlPr>
            </m:mPr>
            <m:mr>
              <m:e>
                <m:r>
                  <w:rPr>
                    <w:rFonts w:ascii="Cambria Math"
                    w:hAnsi="Cambria Math" />
                  </w:rPr>
                  <m:t>1</m:t>
                </m:r>
                <m:r>
                  <m:rPr>
                    <m:sty m:val="p" />
                  </m:rPr>
                  <w:rPr>
                    <w:rFonts w:ascii="Cambria Math"
                    w:hAnsi="Cambria Math" />
                  </w:rPr>
                  <m:t>=</m:t>
                </m:r>
              </m:e>
              <m:e>
                <m:r>
                  <w:rPr>
                    <w:rFonts w:ascii="Cambria Math"
                    w:hAnsi="Cambria Math" />
                  </w:rPr>
                  <m:t>2</m:t>
                </m:r>
              </m:e>
              <m:e />
              <m:e>
                <m:r>
                  <w:rPr>
                    <w:rFonts w:ascii="Cambria Math"
                    w:hAnsi="Cambria Math" />
                  </w:rPr>
                  <m:t>3</m:t>
                </m:r>
              </m:e>
            </m:mr>
            <m:mr>
              <m:e>
                <m:r>
                  <m:rPr>
                    <m:sty m:val="p" />
                  </m:rPr>
                  <w:rPr>
                    <w:rFonts w:ascii="Cambria Math"
                    w:hAnsi="Cambria Math" />
                  </w:rPr>
                  <m:t>=</m:t>
                </m:r>
              </m:e>
              <m:e>
                <m:r>
                  <w:rPr>
                    <w:rFonts w:ascii="Cambria Math"
                    w:hAnsi="Cambria Math" />
                  </w:rPr>
                  <m:t>4</m:t>
                </m:r>
              </m:e>
              <m:e />
              <m:e>
                <m:r>
                  <w:rPr>
                    <w:rFonts w:ascii="Cambria Math"
                    w:hAnsi="Cambria Math" />
                  </w:rPr>
                  <m:t>5</m:t>
                </m:r>
              </m:e>
            </m:mr>
          </m:m>
        </m:oMath>
      </m:oMathPara>

while with U+2588, I get
[Screenshot: Word rendering of the U+2588 input]

and XML

      <m:oMathPara>
        <m:oMath>
          <m:eqArr>
            <m:eqArrPr>
              <m:ctrlPr>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" w:cs="Arial" />
                  <w:color w:val="24292F" />
                  <w:sz w:val="21" />
                  <w:szCs w:val="21" />
                  <w:shd w:val="clear" w:color="auto"
                  w:fill="FFFFFF" />
                </w:rPr>
              </m:ctrlPr>
            </m:eqArrPr>
            <m:e>
              <m:r>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
                <m:t>1</m:t>
              </m:r>
              <m:r>
                <m:rPr>
                  <m:sty m:val="p" />
                </m:rPr>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
                <m:t>=</m:t>
              </m:r>
              <m:r>
                <m:rPr>
                  <m:sty m:val="p" />
                </m:rPr>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
                <m:t>&amp;</m:t>
              </m:r>
              <m:r>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
                <m:t>2</m:t>
              </m:r>
              <m:r>
                <m:rPr>
                  <m:sty m:val="p" />
                </m:rPr>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
                <m:t>&amp;</m:t>
              </m:r>
              <m:r>
                <m:rPr>
                  <m:sty m:val="p" />
                </m:rPr>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
                <m:t>&amp;</m:t>
              </m:r>
              <m:r>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
                <m:t>3</m:t>
              </m:r>
              <m:ctrlPr>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
              </m:ctrlPr>
            </m:e>
            <m:e>
              <m:r>
                <m:rPr>
                  <m:sty m:val="p" />
                </m:rPr>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
                <m:t>=&amp;</m:t>
              </m:r>
              <m:r>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
                <m:t>4</m:t>
              </m:r>
              <m:r>
                <m:rPr>
                  <m:sty m:val="p" />
                </m:rPr>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
                <m:t>&amp;&amp;</m:t>
              </m:r>
              <m:r>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
                <m:t>5</m:t>
              </m:r>
              <m:ctrlPr>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
              </m:ctrlPr>
            </m:e>
          </m:eqArr>
        </m:oMath>
      </m:oMathPara>
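
Stripped of the font properties, the `m:eqArr` structure above is one `m:e` per row, with the `&` markers kept as literal text. A minimal Python sketch of building that skeleton follows. The helper names `m` and `eq_array` are hypothetical, and putting a whole row into a single run is a simplification of what Word emitted above, not something verified in Word.

```python
import xml.etree.ElementTree as ET

# Office Math namespace, bound to the "m" prefix as in the XML above.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
ET.register_namespace("m", M_NS)


def m(tag):
    """Return the namespace-qualified name for an m: element."""
    return f"{{{M_NS}}}{tag}"


def eq_array(rows):
    """Build a bare m:eqArr with one m:e per row.

    Each row is a string such as "1=&2&&3"; the '&' characters stay
    in the text (assumption: one run per row is enough for a sketch).
    """
    arr = ET.Element(m("eqArr"))
    for row in rows:
        e = ET.SubElement(arr, m("e"))
        r = ET.SubElement(e, m("r"))
        t = ET.SubElement(r, m("t"))
        t.text = row
    return arr


arr = eq_array(["1=&2&&3", "=&4&&5"])
# ElementTree escapes '&' as '&amp;' when serializing.
print(ET.tostring(arr, encoding="unicode"))
```

Unlike `m:m`, there is no column specification (`m:mcs`) here: the alignment is carried entirely by the `&` markers inside each row.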

The first (current behavior) is actually closer in appearance to what pdflatex gives us, which is
[Screenshot: pdflatex rendering of the {aligned} example]

No, the second is closer!
Your 2 and 3 are crowded together because no space was added between them. Please try:

█(1=&2&  &3@=&4&  &5) 

I maintain that ■ corresponds to {matrix} and █ corresponds to {aligned}, because of what & means in each.
With both ■ in docx and {matrix} in LaTeX, every & starts a new column.
With both █ in docx and {aligned} in LaTeX, odd-numbered & are alignment points and even-numbered & are spacing points.
Did you notice that in your first case the space between 2 and 3 is too wide?
That is because the second & is treated as a new empty column, not as a spacing point.
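
The odd/even rule described here can be sketched in a few lines of Python. The function name `classify_ampersands` and the labels `"align"` and `"space"` are invented for this illustration.

```python
def classify_ampersands(row):
    """Label each '&' in one row of an aligned structure.

    Following the rule described above: the 1st, 3rd, ... '&' are
    alignment points, and the 2nd, 4th, ... '&' are spacing points.
    """
    labels = []
    count = 0
    for ch in row:
        if ch == "&":
            count += 1
            labels.append("align" if count % 2 == 1 else "space")
    return labels


print(classify_ampersands("1=&2&&3"))  # ['align', 'space', 'align']
print(classify_ampersands("=&4&&5"))   # ['align', 'space', 'align']
```

Under the matrix reading, the same three & would simply split each row into four columns, one of them empty.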

I understand that format conversion is not always perfect or exact. If the cost of the change is too high, please close this issue.

jgm commented

I'll keep this open. It would not be a small change, because currently we don't have an AST element for aligned environments that is separate from that for matrices -- we use the same form for both. That's not ideal.
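
As a rough illustration of what a separate representation could look like, here is a sketch in Python. texmath itself is written in Haskell, and every name below (`Matrix`, `Aligned`, `omml_element_name`) is hypothetical, not taken from its code base.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Matrix:
    """A matrix-like array: each '&' separates columns."""
    column_alignments: List[str]   # e.g. ["right", "left", "right", "left"]
    rows: List[List[str]]          # one list of cell contents per row


@dataclass
class Aligned:
    """An aligned environment: '&' marks alignment and spacing points."""
    rows: List[List[str]]          # segments between '&' markers, per row


def omml_element_name(node):
    """Pick the OMML container a writer might use for each node type."""
    if isinstance(node, Aligned):
        return "m:eqArr"
    if isinstance(node, Matrix):
        return "m:m"
    raise TypeError(f"unsupported node: {type(node).__name__}")


cells = [["1=", "2", "", "3"], ["=", "4", "", "5"]]
print(omml_element_name(Matrix(["right", "left", "right", "left"], cells)))  # m:m
print(omml_element_name(Aligned(cells)))                                     # m:eqArr
```

With two distinct node types, a docx writer could choose between `m:m` and `m:eqArr` instead of emitting a matrix for both.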