Be able to fully parse docutils' mathematics.txt
infinity0 opened this issue · 7 comments
In a discussion of different LaTeX-to-MathML converters, I found out about the following test file: https://docutils.sourceforge.io/docs/ref/rst/mathematics.txt
Pandoc can parse a lot of it, but gives the following errors:
$ pandoc --mathml -f rst mathematics.txt 2>&1 >/dev/null | grep "unexpected" | sort -u
unexpected "\\"
unexpected control sequence \arrowvert
unexpected control sequence \Arrowvert
unexpected control sequence \Bigl
unexpected control sequence \bracevert
unexpected control sequence \cfrac
unexpected control sequence \circledS
unexpected control sequence \diagdown
unexpected control sequence \diagup
unexpected control sequence \gggtr
unexpected control sequence \idotsint
unexpected control sequence \injlim
unexpected control sequence \intop
unexpected control sequence \llless
unexpected control sequence \mspace
unexpected control sequence \negmedspace
unexpected control sequence \negthickspace
unexpected control sequence \ngeqq
unexpected control sequence \nleqq
unexpected control sequence \nshortmid
unexpected control sequence \nshortparallel
unexpected control sequence \nsubseteqq
unexpected control sequence \nsupseteqq
unexpected control sequence \ointop
unexpected control sequence \projlim
unexpected control sequence \shortmid
unexpected control sequence \shortparallel
unexpected control sequence \smallint
unexpected control sequence \surd
unexpected control sequence \thickapprox
unexpected control sequence \thicksim
unexpected control sequence \underleftrightarrow
unexpected control sequence \varinjlim
unexpected control sequence \varliminf
unexpected control sequence \varlimsup
unexpected control sequence \varprojlim
unexpected "x"
Many of these should just be a matter of updating unimathsymbols.txt; some other things are a bit more complex, such as the control space "\ ".
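For anyone who wants to reproduce the list without the grep/sort pipeline, here is a minimal Python sketch that pulls the unique control sequences out of captured stderr text. The function name `unsupported_macros` and the sample text are illustrative assumptions, and the regular expression only assumes the "unexpected control sequence" wording seen in the output above.

```python
import re

# Matches the parser's error line, e.g. "unexpected control sequence \cfrac".
CONTROL_SEQ = re.compile(r"unexpected control sequence (\\[A-Za-z]+)")


def unsupported_macros(stderr_text: str) -> list[str]:
    """Return the sorted, de-duplicated control sequences reported as unexpected."""
    return sorted(set(CONTROL_SEQ.findall(stderr_text)))


# Hypothetical sample, shaped like the warnings in this thread.
sample = """\
[WARNING] Could not convert TeX math \\arrowvert, rendering as TeX:
unexpected control sequence \\arrowvert
unexpected control sequence \\cfrac
unexpected control sequence \\cfrac
unexpected "x"
"""

print(unsupported_macros(sample))
```

Note that Python sorts by code point, so uppercase names such as \Arrowvert would come before lowercase ones, which may differ from the locale-aware order of `sort -u`. Errors that are not control sequences (such as unexpected "x") are deliberately ignored here.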
What does your pandoc --version say? Are you using the latest version? I just tried it and got the following, which seems different; for example, \mspace is handled fine.
[WARNING] Could not convert TeX math \underleftrightarrow{gbi}, rendering as TeX:
underleftrightarrow{gbi}
^
unexpected control sequence \underleftrightarrow
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \arrowvert, rendering as TeX:
\arrowvert
^
unexpected control sequence \arrowvert
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \Arrowvert, rendering as TeX:
\Arrowvert
^
unexpected control sequence \Arrowvert
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \bracevert, rendering as TeX:
\bracevert
^
unexpected control sequence \bracevert
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \projlim, rendering as TeX:
\projlim
^
unexpected control sequence \projlim
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \injlim, rendering as TeX:
\injlim
^
unexpected control sequence \injlim
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \varlimsup, rendering as TeX:
\varlimsup
^
unexpected control sequence \varlimsup
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \varliminf, rendering as TeX:
\varliminf
^
unexpected control sequence \varliminf
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \varprojlim, rendering as TeX:
\varprojlim
^
unexpected control sequence \varprojlim
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \varinjlim, rendering as TeX:
\varinjlim
^
unexpected control sequence \varinjlim
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \circledS, rendering as TeX:
\circledS
^
unexpected control sequence \circledS
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \surd, rendering as TeX:
\surd
^
unexpected control sequence \surd
expecting "%", "\\label", "\\tag", "\\nonumber", whitespace, "[", "!", "'", "''", "'''", "''''", "*", "+", ",", "-", ".", "/", ":", ":=", ";", "<", "=", ">", "?", "@", "~", "\\" or "{"
[WARNING] Could not convert TeX math \diagdown, rendering as TeX:
\diagdown
^
unexpected control sequence \diagdown
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \diagup, rendering as TeX:
\diagup
^
unexpected control sequence \diagup
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \ngeqq, rendering as TeX:
\ngeqq
^
unexpected control sequence \ngeqq
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \nleqq, rendering as TeX:
\nleqq
^
unexpected control sequence \nleqq
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \thickapprox, rendering as TeX:
\thickapprox
^
unexpected control sequence \thickapprox
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \thicksim, rendering as TeX:
\thicksim
^
unexpected control sequence \thicksim
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \llless, rendering as TeX:
\llless
^
unexpected control sequence \llless
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \gggtr, rendering as TeX:
\gggtr
^
unexpected control sequence \gggtr
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \shortmid, rendering as TeX:
\shortmid
^
unexpected control sequence \shortmid
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \shortparallel, rendering as TeX:
\shortparallel
^
unexpected control sequence \shortparallel
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \nshortmid, rendering as TeX:
\nshortmid
^
unexpected control sequence \nshortmid
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \nshortparallel, rendering as TeX:
\nshortparallel
^
unexpected control sequence \nshortparallel
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \nsubseteqq, rendering as TeX:
\nsubseteqq
^
unexpected control sequence \nsubseteqq
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \nsupseteqq, rendering as TeX:
\nsupseteqq
^
unexpected control sequence \nsupseteqq
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \smallint, rendering as TeX:
\smallint
^
unexpected control sequence \smallint
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math 3\negmedspace 4, rendering as TeX:
3\negmedspace 4
^
unexpected control sequence \negmedspace
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math 3\negthickspace 4, rendering as TeX:
3\negthickspace 4
^
unexpected control sequence \negthickspace
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math 3\hspace{1ex}4, rendering as TeX:
3\hspace{1ex}4
^
unexpected "x"
expecting "em"
[WARNING] Could not convert TeX math \frac{\pi}{4} = 1 + \cfrac{1^2}{
2 + \cfrac{3^2}{
2 + \cfrac{5^2}{
2 + \cfrac{7^2}{2 + \cdots}
}}}
\qquad \text{vs.}\qquad
\frac{\pi}{4} = 1 + \frac{1^2}{
2 + \frac{3^2}{
2 + \frac{5^2}{
2 + \frac{7^2}{2 + \cdots}
}}}, rendering as TeX:
pi}{4} = 1 + \cfrac{1^2}{
^
unexpected control sequence \cfrac
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \cfrac[l]{x}{x-1} \quad
\cfrac{x}{x-1} \quad
\cfrac[r]{x}{x-1}, rendering as TeX:
\cfrac[l]{x}{x-1} \quad
^
unexpected control sequence \cfrac
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \displaystyle
\Bigl(b\Bigr)
\Bigl(\frac{c}
{d}\Bigr), rendering as TeX:
\Bigl(b\Bigr)
^
unexpected control sequence \Bigl
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \left[\sum_i a_i\left\lvert\sum_j x_{ij}\right\rvert^p\right]^{1/p}
\text{ versus }
\biggl[\sum_i a_i\Bigl\lvert\sum_j x_{ij}\Bigr\rvert^p\biggr]^{1/p}, rendering as TeX:
ggl[\sum_i a_i\Bigl\lvert\sum_j x_{ij}\B
^
unexpected control sequence \Bigl
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \Bigl(\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\Bigr), rendering as TeX:
\Bigl(\begin{smallmatrix} a & b \\ c & d
^
unexpected control sequence \Bigl
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \intop_0^1, rendering as TeX:
\intop_0^1
^
unexpected control sequence \intop
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \ointop_c, rendering as TeX:
\ointop_c
^
unexpected control sequence \ointop
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \intop_0^1 \quad \ointop_c
\quad \text{vs.} \quad
\int^1_0 \quad \oint_c, rendering as TeX:
\intop_0^1 \quad \ointop_c
^
unexpected control sequence \intop
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \begin{aligned}
\left( 3 \right)
\left( f(x) \right)
\left( \bar x \right)
\left( \overline x \right)
\left( n_i \right) &= () \\
\left( \underline x \right) &= \bigl(\text{big}\bigr)\\
\left( 3^2 \right)
\left( \sqrt{3} \right)
\left( \sqrt{3^2} \right)
\left( \sum \right)
\left( \bigotimes \right)
\left( \prod \right) &= \Bigl(\text{Big}\Bigr)\\
\left( \frac{3 }{2} \right)
\left( \frac{3^2}{2^4} \right)
\binom{3 }{2}
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\left( \frac{1}{\sqrt 2} \right)
\left( \int \right)
\left( \int_0 \right)
\left( \int^1 \right)
\left( \int_0^1 \right) &= \biggl(\text{bigg}\biggr)\\
\left( \frac{\sqrt 2}{2} \right)
\left( \sum_0 \right)
\left( \sum^1 \right)
\left( \sum_0^1 \right)
\left( \frac{\frac1x}{\frac{1}{n}}\right) &= \Biggl(\text{Bigg}\Biggr)\\
\left( \intop_0 \right)
\left( \intop^1 \right)
\left( \intop_0^1 \right)
\end{aligned}, rendering as TeX:
\right) &= \Bigl(\text{Big}\Bigr
^
unexpected "\\"
expecting "&", "\\\\", white space or "\\end"
[WARNING] Could not convert TeX math \Bigl(\text{Big}\Bigr), rendering as TeX:
\Bigl(\text{Big}\Bigr)
^
unexpected control sequence \Bigl
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \left.\lgroup b\right\rgroup\ \bigl\lgroup b\Bigr\rgroup\ \biggl\lgroup b\Biggr\rgroup
\quad
\left.\lmoustache b\right\rmoustache\ \bigl\lmoustache b\Bigr\rmoustache\ \biggl\lmoustache b\Biggr\rmoustache
\quad
\left./b\right\backslash\ \bigl/b\Bigr\backslash\ \biggl/b\Biggr\backslash, rendering as TeX:
roup b\right\rgroup\ \bigl\lgroup b\Bigr
^
unexpected "\\"
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \left.|b\right\|\ \bigl|b\Bigr\|\ \biggl|b\Biggr\|
\quad
\left.\vert b\right\Vert\ \bigl\vert b\Bigr\Vert\ \biggl\vert b\Biggr\Vert
\quad
\left.\arrowvert b\right\Arrowvert\ \bigl\arrowvert b\Bigr\Arrowvert\ \biggl\arrowvert b\Biggr\Arrowvert
\quad
\left.\bracevert b\right\bracevert\ \bigl\bracevert b\Bigr\bracevert\ \biggl\bracevert b\Biggr\bracevert
\quad
\left.\vert b\right\Vert\ \bigl\vert b\Bigr\Vert\ \biggl\vert b\Biggr\Vert, rendering as TeX:
\left.\arrowvert b\right\Arrowvert\ \big
^
unexpected control sequence \arrowvert
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \int\ \iint\ \iiint\ \iiiint\ \idotsint\ \oint\ \smallint\
\sum\ \prod\ \coprod\ \bigwedge\ \bigvee\ \bigcap\ \bigcup\
\biguplus\ \bigsqcup\ \bigodot\ \bigoplus\ \bigotimes, rendering as TeX:
\iiiint\ \idotsint\ \oint\ \smallint\
^
unexpected control sequence \idotsint
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \int_1 f\ \intop_1 f\ \iint_1 f\ \smallint_1 f\ \sum_1\
\prod_1\ \bigwedge_1\ \bigcap_1\ \biguplus_1\ \bigodot_1\ \int^N\
\intop^N\ \iiiint^N\ \oint^N\ \smallint^N\ \sum^N\ \coprod^N\
\bigvee^N\ \bigcup^N\ \bigsqcup^N\ \bigotimes^N, rendering as TeX:
\int_1 f\ \intop_1 f\ \iint_1 f\ \smalli
^
unexpected control sequence \intop
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \int_1^N\ \intop_1^N\ \iint_1^N\ \iiint_1^N\ \iiiint_1^N\
\idotsint_1^N\ \oint_1^N\ \smallint_1^N\ \sum_1^N\ \prod_1^N\
\coprod_1^N\ \bigwedge_1^N\ \bigvee_1^N\ \bigcap_1^N\ \bigcup_1^N
\ \biguplus_1^N\ \bigsqcup_1^N\ \bigodot_1^N\ \bigoplus_1^N\
\bigotimes_1^N, rendering as TeX:
\int_1^N\ \intop_1^N\ \iint_1^N\ \iiint_
^
unexpected control sequence \intop
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \text{\c{c} \'e \`e \"e \^e \~n \r{u} \v{z} \textcircled{c}}, rendering as TeX:
'e \`e \"e \^e \~n \r{u} \v{z} \textcirc
^
unexpected "\\"
expecting text, "}", "{", "$", "$$", "\\(" or "\\["
Here's my pared-down list:
\Arrowvert
\Bigl
\arrowvert
\bracevert
\cfrac
\circledS
\diagdown
\diagup
\gggtr
\idotsint
\injlim
\intop
\llless
\negmedspace
\negthickspace
\ngeqq
\nleqq
\nshortmid
\nshortparallel
\nsubseteqq
\nsupseteqq
\ointop
\projlim
\shortmid
\shortparallel
\smallint
\surd
\thickapprox
\thicksim
\underleftrightarrow
\varinjlim
\varliminf
\varlimsup
\varprojlim
Some of these are supported (e.g. \surd), so we need to look at the details. Others aren't in the symbol list at all.
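One way to check which macros are absent from the symbol list is sketched below. It assumes the database is a text file with "#" comment lines and "^"-separated fields whose third field is the LaTeX macro; that layout is an assumption to verify against the actual unimathsymbols.txt. The helper names and the sample rows are made up for illustration and are not copied from the real file.

```python
def latex_macros(db_text: str) -> set[str]:
    """Collect the LaTeX macro column (third '^'-separated field, by assumption)."""
    macros = set()
    for line in db_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("^")
        # A row whose character is itself "^" might need special handling.
        if len(fields) > 2 and fields[2].startswith("\\"):
            macros.add(fields[2])
    return macros


def missing_from_db(wanted: list[str], db_text: str) -> list[str]:
    """Return the wanted macros that have no row in the database, in input order."""
    known = latex_macros(db_text)
    return [m for m in wanted if m not in known]


# Illustrative rows in the assumed layout (not taken from the real database).
sample_db = "\n".join([
    "# code point^char^LaTeX^unicode-math^class^category^requirements^comments",
    "02194^↔^\\leftrightarrow^\\leftrightarrow^R^mathrel^^",
    "0221E^∞^\\infty^\\infty^N^mathord^^",
])

print(missing_from_db(["\\leftrightarrow", "\\cfrac", "\\Bigl"], sample_db))
```

A macro that is missing here is not necessarily a plain symbol: commands such as \cfrac or \Bigl take arguments or modify delimiters, so they would need parser support rather than a database row.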
I'm using the Debian pandoc package, which is a little bit behind this repo:
$ pandoc --version
pandoc 2.17.1.1
Compiled with pandoc-types 1.22.2, texmath 0.12.4, skylighting 0.12.3.1,
citeproc 0.6.0.1, ipynb 0.2
[..]
Clarification: texmath can handle \surd{3}{4} but not plain \surd.
Forwarding some extra information from Günter Milde, the docutils developer who also originally created unimathsymbols.txt:
The database and related work is available under https://milde.users.sourceforge.net/LUCR/Math/ The latest revision is used in latex2mathml but not published yet.
The "unimathsymbols" database only contains LaTeX math macros that map directly to Unicode code points. (\underleftrightarrow is implemented using ↔ (\leftrightarrow) in a <munder> element.)
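To make the <munder> approach concrete, here is a small Python sketch that builds such an element with the standard library. It is not the code docutils or texmath uses; the function name, the single <mi> base, and the stretchy attribute on the arrow are assumptions for illustration (a real converter would likely wrap a multi-letter base in an <mrow>).

```python
import xml.etree.ElementTree as ET


def underleftrightarrow(base_text: str) -> str:
    """Build <munder> with the base on top and U+2194 (left right arrow) underneath."""
    munder = ET.Element("munder")
    base = ET.SubElement(munder, "mi")  # simplified: whole base in one <mi>
    base.text = base_text
    arrow = ET.SubElement(munder, "mo", {"stretchy": "true"})
    arrow.text = "\u2194"  # the character that \leftrightarrow maps to
    return ET.tostring(munder, encoding="unicode")


print(underleftrightarrow("gbi"))
```

The point of the sketch is that the macro is a construction around an existing symbol, which is why it has no row of its own in a database of direct macro-to-code-point mappings.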
We in fact generate our list of Unicode-to-TeX mappings from Milde's 2011 database.
If there's a new revision out, we could use that, but I couldn't find anything more recent than the 2011 one...
Cf. #241; this issue might subsume that one?