please use █ instead of ■ when converting {aligned} into docx
ZhuangQu opened this issue · 6 comments
I use pandoc 3.1.1 on Windows 11. When converting
\begin{equation*}
\begin{aligned}
1= & 2 & & 3 \\
= & 4 & & 5 \\
\end{aligned}
\end{equation*}
from LaTeX into docx, we get
■(1=&2&&3@=&4&&5)
in Word. We can see that you convert {aligned} to ■, which is wrong. The correct output is █.
In UnicodeMath, ■ (U+25A0) represents a matrix, and █ (U+2588) represents an aligned structure.
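To make the two linear forms concrete, here is a minimal Python sketch that builds the UnicodeMath string from rows of cells. It only illustrates the notation. The function name `to_unicodemath` and the row representation are my own assumptions, not anything pandoc or Word exposes.

```python
MATRIX = "\u25A0"   # ■, UnicodeMath matrix
ALIGNED = "\u2588"  # █, UnicodeMath aligned structure

def to_unicodemath(rows, opener):
    """Join cells with '&' and rows with '@', wrapped as opener(...).

    Hypothetical helper for illustration only.
    """
    body = "@".join("&".join(cells) for cells in rows)
    return f"{opener}({body})"

# The example from this issue, already split at '&' and '\\'.
rows = [["1=", "2", "", "3"], ["=", "4", "", "5"]]
print(to_unicodemath(rows, MATRIX))   # what Word currently shows
print(to_unicodemath(rows, ALIGNED))  # what I would expect
```

The two strings differ only in the leading character, but Word interprets the `&` inside them differently.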
Transferring to jgm/texmath which does our math conversion.
Note: we don't use UnicodeMath; we use Word's XML representation of math.
The above aligned environment is translated as
<m:oMathPara>
<m:oMathParaPr>
<m:jc m:val="center" />
</m:oMathParaPr>
<m:oMath>
<m:m>
<m:mPr>
<m:baseJc m:val="center" />
<m:plcHide m:val="1" />
<m:mcs>
<m:mc>
<m:mcPr>
<m:mcJc m:val="right" />
<m:count m:val="1" />
</m:mcPr>
</m:mc>
<m:mc>
<m:mcPr>
<m:mcJc m:val="left" />
<m:count m:val="1" />
</m:mcPr>
</m:mc>
<m:mc>
<m:mcPr>
<m:mcJc m:val="right" />
<m:count m:val="1" />
</m:mcPr>
</m:mc>
<m:mc>
<m:mcPr>
<m:mcJc m:val="left" />
<m:count m:val="1" />
</m:mcPr>
</m:mc>
</m:mcs>
</m:mPr>
<m:mr>
<m:e>
<m:r>
<m:t>1</m:t>
</m:r>
<m:r>
<m:rPr>
<m:sty m:val="p" />
</m:rPr>
<m:t>=</m:t>
</m:r>
</m:e>
<m:e>
<m:r>
<m:t>2</m:t>
</m:r>
</m:e>
<m:e />
<m:e>
<m:r>
<m:t>3</m:t>
</m:r>
</m:e>
</m:mr>
<m:mr>
<m:e>
<m:r>
<m:rPr>
<m:sty m:val="p" />
</m:rPr>
<m:t>=</m:t>
</m:r>
</m:e>
<m:e>
<m:r>
<m:t>4</m:t>
</m:r>
</m:e>
<m:e />
<m:e>
<m:r>
<m:t>5</m:t>
</m:r>
</m:e>
</m:mr>
</m:m>
</m:oMath>
</m:oMathPara>
Please suggest more appropriate OMML.
Sorry, I don't know what OMML is.
I only know that █ is correct and ■ is wrong.
Maybe you can convert UnicodeMath to OMML to get more appropriate OMML.
Experimenting with Word: using U+25A0, I get [screenshot not preserved] and XML
<m:oMathPara>
<m:oMathParaPr>
<m:jc m:val="center" />
</m:oMathParaPr>
<m:oMath>
<m:m>
<m:mPr>
<m:plcHide m:val="1" />
<m:mcs>
<m:mc>
<m:mcPr>
<m:count m:val="1" />
<m:mcJc m:val="right" />
</m:mcPr>
</m:mc>
<m:mc>
<m:mcPr>
<m:count m:val="1" />
<m:mcJc m:val="left" />
</m:mcPr>
</m:mc>
<m:mc>
<m:mcPr>
<m:count m:val="1" />
<m:mcJc m:val="right" />
</m:mcPr>
</m:mc>
<m:mc>
<m:mcPr>
<m:count m:val="1" />
<m:mcJc m:val="left" />
</m:mcPr>
</m:mc>
</m:mcs>
<m:ctrlPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
</m:ctrlPr>
</m:mPr>
<m:mr>
<m:e>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>1</m:t>
</m:r>
<m:r>
<m:rPr>
<m:sty m:val="p" />
</m:rPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>=</m:t>
</m:r>
</m:e>
<m:e>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>2</m:t>
</m:r>
</m:e>
<m:e />
<m:e>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>3</m:t>
</m:r>
</m:e>
</m:mr>
<m:mr>
<m:e>
<m:r>
<m:rPr>
<m:sty m:val="p" />
</m:rPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>=</m:t>
</m:r>
</m:e>
<m:e>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>4</m:t>
</m:r>
</m:e>
<m:e />
<m:e>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>5</m:t>
</m:r>
</m:e>
</m:mr>
</m:m>
</m:oMath>
</m:oMathPara>
Using U+2588, I get [screenshot not preserved] and XML
<m:oMathPara>
<m:oMath>
<m:eqArr>
<m:eqArrPr>
<m:ctrlPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" w:cs="Arial" />
<w:color w:val="24292F" />
<w:sz w:val="21" />
<w:szCs w:val="21" />
<w:shd w:val="clear" w:color="auto"
w:fill="FFFFFF" />
</w:rPr>
</m:ctrlPr>
</m:eqArrPr>
<m:e>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>1</m:t>
</m:r>
<m:r>
<m:rPr>
<m:sty m:val="p" />
</m:rPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>=</m:t>
</m:r>
<m:r>
<m:rPr>
<m:sty m:val="p" />
</m:rPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>&amp;</m:t>
</m:r>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>2</m:t>
</m:r>
<m:r>
<m:rPr>
<m:sty m:val="p" />
</m:rPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>&amp;</m:t>
</m:r>
<m:r>
<m:rPr>
<m:sty m:val="p" />
</m:rPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>&amp;</m:t>
</m:r>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>3</m:t>
</m:r>
<m:ctrlPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
</m:ctrlPr>
</m:e>
<m:e>
<m:r>
<m:rPr>
<m:sty m:val="p" />
</m:rPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>=&amp;</m:t>
</m:r>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>4</m:t>
</m:r>
<m:r>
<m:rPr>
<m:sty m:val="p" />
</m:rPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>&amp;&amp;</m:t>
</m:r>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
<m:t>5</m:t>
</m:r>
<m:ctrlPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math"
w:hAnsi="Cambria Math" />
</w:rPr>
</m:ctrlPr>
</m:e>
</m:eqArr>
</m:oMath>
</m:oMathPara>
The first (current behavior) is actually closer in appearance to what pdflatex gives us, which is [screenshot not preserved]
No, the second is closer!
Your 2 and 3 are crowded together because there are no spaces added. Please try:
█(1=&2& &3@=&4& &5)
I advocate that ■ corresponds to {matrix} and █ corresponds to {aligned}, because of the meaning of &.
In both ■ in docx and {matrix} in LaTeX, & means a column.
In both █ in docx and {aligned} in LaTeX, an odd & means an aligning-point and an even & means a padding-point.
Do you find that in your first case, the space between 2 and 3 is too wide? That is because the 2nd & is treated as a new empty column, not an aligning-point.
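The odd/even rule can be sketched in a few lines of Python. This is only an illustration of the rule as I described it. The function name `classify_ampersands` is hypothetical, and the sketch ignores escaped `\&` in LaTeX.

```python
def classify_ampersands(row):
    """Label each '&' in one aligned row: odd ones align, even ones pad.

    Hypothetical helper; does not handle escaped '\\&'.
    """
    labels = []
    count = 0
    for ch in row:
        if ch == "&":
            count += 1
            labels.append("align" if count % 2 == 1 else "pad")
    return labels

# First row of the example in this issue.
print(classify_ampersands("1= & 2 & & 3"))
```

For the first row this yields one aligning-point, one padding-point, and a second aligning-point, whereas a matrix reading would treat all three as column separators.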
I understand that format conversion is not always perfect and exact. If the cost of modification is too high, please close this issue.
I'll keep this open. It would not be a small change, because currently we don't have an AST element for aligned environments that is separate from that for matrices -- we use the same form for both. That's not ideal.
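As a rough illustration of what a separate aligned element might emit, here is a minimal Python sketch that builds an m:eqArr with literal & markers, loosely mirroring the Word output for U+2588 above. This is not texmath code. The function name `aligned_to_eqarr` is hypothetical, each row is placed in a single run as a simplification, and whether Word renders the result identically is untested.

```python
import xml.etree.ElementTree as ET

# OMML namespace used by the m: prefix in the XML above.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
ET.register_namespace("m", M_NS)

def m(tag):
    """Qualify a tag name with the OMML namespace."""
    return f"{{{M_NS}}}{tag}"

def aligned_to_eqarr(rows):
    """Build an m:eqArr with one m:e per row (hypothetical helper).

    rows: list of rows, each a list of cell strings split at '&'.
    Cells are rejoined with '&' so the markers stay in the text run.
    """
    eq_arr = ET.Element(m("eqArr"))
    for cells in rows:
        e = ET.SubElement(eq_arr, m("e"))
        r = ET.SubElement(e, m("r"))
        t = ET.SubElement(r, m("t"))
        t.text = "&".join(cells)
    return eq_arr

rows = [["1=", "2", "", "3"], ["=", "4", "", "5"]]
xml_text = ET.tostring(aligned_to_eqarr(rows), encoding="unicode")
print(xml_text)
```

Note that the serializer writes each & as `&amp;`, which is what well-formed XML requires inside m:t.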