Escape sequences are handled incorrectly in LaTeX
eric-wieser opened this issue · 9 comments
The docstring here, which is
/-- Given modules `M`, `M₂` over a commutative ring, together with submodules `p ⊆ M`, `q ⊆ M₂`,
the set of maps $\{f ∈ Hom(M, M₂) | f(p) ⊆ q \}$ is a submodule of `Hom(M, M₂)`. -/
renders as
Given modules `M`, `M₂` over a commutative ring, together with submodules `p ⊆ M`, `q ⊆ M₂`,
the set of maps $f ∈ Hom(M, M₂) | f(p) ⊆ q$ is a submodule of `Hom(M, M₂)`.
instead of the expected (note the {}s):
Given modules `M`, `M₂` over a commutative ring, together with submodules `p ⊆ M`, `q ⊆ M₂`,
the set of maps $\{f ∈ Hom(M, M₂) | f(p) ⊆ q \}$ is a submodule of `Hom(M, M₂)`.
Interestingly, GitHub seems to agree with this interpretation:
Given modules `M`, `M₂` over a commutative ring, together with submodules `p ⊆ M`, `q ⊆ M₂`,
the set of maps ${f ∈ Hom(M, M₂) | f(p) ⊆ q }$ is a submodule of `Hom(M, M₂)`.
I think this is caused by the markdown parser not really supporting `$$` notation; it should not be processing markdown escape sequences inside embedded LaTeX code, just as it already does not process markdown escape sequences in `code` blocks.
(I think this has been reported on Zulip before, but I couldn't find a tracking issue)
just as it already does not process markdown escape sequences in `code` blocks.
This is a natural feature of a markdown processor, I think. But markdown processors in general (in our case the C library `cmark`) do not understand LaTeX. So I think before feeding the docstring to the markdown processor, we need to filter all LaTeX formulas.
EDIT: I have tested that `cmark` does indeed convert `$\{ \}$` to `${ }$`.
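The conversion follows from the CommonMark rule that a backslash before any ASCII punctuation character is an escape. Here is a minimal Python sketch of just that rule; the helper name `strip_backslash_escapes` is made up for illustration and is not part of cmark:

```python
import re
import string

# CommonMark treats a backslash followed by any ASCII punctuation character
# as an escape for that character. string.punctuation is exactly that set.
_ESCAPE = re.compile(r"\\([" + re.escape(string.punctuation) + r"])")

def strip_backslash_escapes(text: str) -> str:
    """Drop the backslash of every markdown backslash escape (illustrative only)."""
    return _ESCAPE.sub(r"\1", text)

print(strip_backslash_escapes(r"$\{ \}$"))
```

Commands such as `\frac` pass through unchanged because `f` is not punctuation, so only sequences like `\{`, `\}`, `\_` or `\\` are affected.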
PS: `M₂` is not a valid LaTeX formula; it should be `M_2`.
I think I can try to look at this issue.
PS: `M₂` is not a valid LaTeX formula; it should be `M_2`.
Mathjax is quite happy to render
So I think before feeding the docstring to the markdown processor, we need to filter all LaTeX formulas
This is not a sensible approach; the markdown parser needs to natively know where a formula starts and ends; if you do what you suggest, then things like
This is a `$` with not latex `$`
will render incorrectly. If you teach your filtering about code syntax, then now you have a hacky parser wrapped around a real one, and will almost certainly create more bugs.
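To make the concern concrete, here is a sketch of a naive pre-filter that treats everything between two `$` as a formula. It is hypothetical and not code from doc-gen or cmark; the names `NAIVE_MATH` and `naive_find_math` are invented for illustration.

```python
import re

# Naive assumption: any text between two dollar signs is a LaTeX formula.
NAIVE_MATH = re.compile(r"\$(.+?)\$")

def naive_find_math(text: str) -> list[str]:
    """Return what the naive filter believes are formula bodies."""
    return NAIVE_MATH.findall(text)

docstring = "This is a `$` with not latex `$`"
print(naive_find_math(docstring))
```

On this input the filter reports the text between the two code spans as a formula, even though the line contains no LaTeX at all.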
If you teach your filtering about code syntax, then now you have a hacky parser wrapped around a real one, and will almost certainly create more bugs.
Unfortunately this is inevitable if we want a quick fix. The proper fix is perhaps to wait for https://github.com/leanprover/verso
So I think before feeding the docstring to the markdown processor, we need to filter all LaTeX formulas
This is not a sensible approach; the markdown parser needs to natively know where a formula starts and ends; if you do what you suggest, then things like
This is a `$` with not latex `$`
will render incorrectly. If you teach your filtering about code syntax, then now you have a hacky parser wrapped around a real one, and will almost certainly create more bugs.
If I understand the suggestion correctly, this is actually what we did in doc-gen3 to get LaTeX working there, but it was indeed a bit of a hack: leanprover-community/doc-gen#110
(Basically, we ported the code that math.stackexchange and mathoverflow use for filtering LaTeX formulas out of markdown, hoping that (1) if there were any bugs in that parser, they would already have been found by someone using those sites, and (2) we wouldn't introduce any new bugs porting this incredibly delicate JS code to Python.)
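The general shape of such a filter, as a rough sketch only: this is not the doc-gen3 or Stack Exchange code, and the names `protect_math`, `restore_math` and the `@@MATH0@@` placeholder format are invented for illustration. The idea is to stash each formula behind a placeholder the markdown processor will not touch, render, then put the formulas back.

```python
import re

# One pass over the text: code spans are matched first so that any `$`
# inside them is left alone; display math is tried before inline math.
TOKEN = re.compile(
    r"(?P<code>(`+).+?(?<!`)\2(?!`))"   # `code` or ``code`` spans
    r"|(?P<math>\$\$.+?\$\$|\$.+?\$)",  # $$display$$ or $inline$
    re.DOTALL,
)
PLACEHOLDER = re.compile(r"@@MATH(\d+)@@")

def protect_math(text):
    """Replace each formula with a placeholder; return (new_text, stash)."""
    stash = []

    def swap(match):
        if match.group("code") is not None:
            return match.group(0)          # keep code spans untouched
        stash.append(match.group("math"))
        return f"@@MATH{len(stash) - 1}@@"

    return TOKEN.sub(swap, text), stash

def restore_math(rendered, stash):
    """Put the stashed formulas back after markdown rendering."""
    # A function replacement avoids re.sub interpreting backslashes in the formula.
    return PLACEHOLDER.sub(lambda m: stash[int(m.group(1))], rendered)

source = r"the set $\{f | f(p) ⊆ q \}$, but not `$` this `$`"
protected, stash = protect_math(source)
# `protected` is what would be handed to the markdown processor here.
restored = restore_math(protected, stash)
```

A real filter has to cope with many more cases (escaped dollar signs, fenced code blocks, HTML-escaping the restored formulas), which is where the delicacy comes from.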
edit: I think everything I wrote in this Zulip post still stands (including my offer to help if someone wants to take this approach)
I am investigating other solutions. So far I have found another markdown parser written in C, https://github.com/mity/md4c, which claims to support LaTeX formulas, tables, etc. (Currently we are using https://github.com/commonmark/cmark.) Hopefully this will be fixed after switching markdown processors.
Yes, if switching the processor works that would be easiest! By the way, if you need some examples to test LaTeX embedded in markdown, I put together this markdown file (see also the result rendered by doc-gen3).
edit: Interestingly, our test file reveals various issues(?) in GitHub's handling of LaTeX in markdown.