michal-h21/tex4ebook

Issue with foo_\text{bar} in equation environment

rasenmaeher92 opened this issue · 6 comments

First off, thanks for this great software! It really works like a charm!

I am currently trying to convert some arXiv PDFs (or, more precisely, their sources) into the epub3 format and have been quite lucky so far. One of the first issues, however, is with this document and its source code (download link to tar.gz file).

After some testing I boiled one of the issues down to this MWE:

% debug.tex
\documentclass[letterpaper]{article}
\usepackage{amsmath}
\begin{document}
\begin{equation}
    works_{just} = fine   \\
    also \text{works} = great\\
    does_\text{not} = work
\end{equation}
\end{document}

Compiling this with pdflatex works, but tex4ebook debug.tex -f epub3 mathml fails.

The error log is (sorry for the German parts at the end, but the interesting parts should be in English)

[STATUS]  tex4ebook: Conversion started
[STATUS]  tex4ebook: Input file: debug.tex
[WARNING] tocid: char-def module not found
[WARNING] tocid: cannot fix section id's
This is pdfTeX, Version 3.141592653-2.6-1.40.23 (MiKTeX 21.8)
entering extended mode
[ERROR]   htlatex: Compilation errors in the htlatex run
[ERROR]   htlatex: Filename     Line    Message
[ERROR]   htlatex: ?    855      Argument of \n:text@: has an extra }.
[ERROR]   htlatex: ?    7        Paragraph ended before \n:text@: was complete.
[ERROR]   htlatex: ?    7        Missing $ inserted.
[ERROR]   htlatex: ?    7        Missing } inserted.
[ERROR]   htlatex: ?    7        Missing } inserted.
[ERROR]   htlatex: ?    7        Display math should end with $$.
[ERROR]   htlatex: ?    7        LaTeX Error: Bad math environment delimiter.
[ERROR]   htlatex: ?    8        You can't use `\eqno' in horizontal mode.
[ERROR]   htlatex: ?    8        Missing $ inserted.
[ERROR]   htlatex: ?    8        Display math should end with $$.
This is pdfTeX, Version 3.141592653-2.6-1.40.23 (MiKTeX 21.8)
entering extended mode
[ERROR]   htlatex: Compilation errors in the htlatex run
[ERROR]   htlatex: Filename     Line    Message
[ERROR]   htlatex: ?    855      Argument of \n:text@: has an extra }.
[ERROR]   htlatex: ?    7        Paragraph ended before \n:text@: was complete.
[ERROR]   htlatex: ?    7        Missing $ inserted.
[ERROR]   htlatex: ?    7        Missing } inserted.
[ERROR]   htlatex: ?    7        Missing } inserted.
[ERROR]   htlatex: ?    7        Display math should end with $$.
[ERROR]   htlatex: ?    7        LaTeX Error: Bad math environment delimiter.
[ERROR]   htlatex: ?    8        You can't use `\eqno' in horizontal mode.
[ERROR]   htlatex: ?    8        Missing $ inserted.
[ERROR]   htlatex: ?    8        Display math should end with $$.
This is pdfTeX, Version 3.141592653-2.6-1.40.23 (MiKTeX 21.8)
entering extended mode
[ERROR]   htlatex: Compilation errors in the htlatex run
[ERROR]   htlatex: Filename     Line    Message
[ERROR]   htlatex: ?    855      Argument of \n:text@: has an extra }.
[ERROR]   htlatex: ?    7        Paragraph ended before \n:text@: was complete.
[ERROR]   htlatex: ?    7        Missing $ inserted.
[ERROR]   htlatex: ?    7        Missing } inserted.
[ERROR]   htlatex: ?    7        Missing } inserted.
[ERROR]   htlatex: ?    7        Display math should end with $$.
[ERROR]   htlatex: ?    7        LaTeX Error: Bad math environment delimiter.
[ERROR]   htlatex: ?    8        You can't use `\eqno' in horizontal mode.
[ERROR]   htlatex: ?    8        Missing $ inserted.
[ERROR]   htlatex: ?    8        Display math should end with $$.
[WARNING] domfilter: DOM parsing of debug.xhtml failed:
[WARNING] domfilter: ...am Files/MiKTeX 2.9/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Unbalanced Tag (/math) [char=1026]

        1 Datei(en) kopiert.
        1 Datei(en) kopiert.
        1 Datei(en) kopiert.
        1 Datei(en) kopiert.
Der Befehl "tidy" ist entweder falsch geschrieben oder
konnte nicht gefunden werden.
[WARNING] exec_epub: tidy command seems missing, you should install it in order
  to make valid epub file
        1 Datei(en) kopiert.
[STATUS]  tex4ebook: Conversion finished

I also ran the above command without the mathml option, but the result is about the same.

From testing around with this MWE, the issue seems to be the \text directly after an underscore _. According to the MathJax documentation \text is allowed, but unfortunately I am not enough of an expert in tex4ebook, TeX4ht or htlatex to get to the core of this issue (the -a debug flag didn't yield any more information).
I would be very happy about any pointers towards a configuration that might fix this issue, or advice on whether I should instead look into automatically sanitizing the sources (maybe with TexSoup, as I can modify the sources but didn't write them).

PS: Is this the correct place to ask about such a problem? I have a couple of other questions regarding duplicate captions and missing indices (similar to #33). What would be the best place to ask them (e.g. is there an active mailing list, forum, discussion board, etc.)?

I have not tested this, but what happens if you write it like this:

{does}_{\text{this}} = work?

It indeed works! Thank you! Is this something tex4ebook could automatically catch or would I need to take care of it?

I think you would need to take care of it, because that is kind of wrong in LaTeX... you should always atomize (put between {}) the contents of a subscript, especially if you have macros inside, or it's very fragile.

But only the subscripts maybe need it... so, this should also work, and maybe it's less work for you?

does_{\text{this}} = work?
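
For sources you did not write, this bracing could in principle be applied automatically before running tex4ebook. The following is only a minimal sketch in Python, using a plain regular expression instead of TexSoup. The function name brace_text_scripts is hypothetical, and the sketch assumes that the \text argument contains no nested braces; it does not skip comments or verbatim material, so check the result before relying on it.

```python
import re

# Assumption: only \text is handled; extend the alternation for other
# one-argument macros if your sources need it.
# (?<!\\)   : do not touch an escaped underscore such as \_
# ([_^])    : a subscript or superscript character
# \s*       : optional whitespace before the macro
# (\\text\{[^{}]*\}) : \text{...} whose argument has no nested braces
_UNBRACED_SCRIPT = re.compile(r"(?<!\\)([_^])\s*(\\text\{[^{}]*\})")


def brace_text_scripts(tex: str) -> str:
    """Wrap an unbraced \\text{...} sub/superscript in a group."""
    # Already grouped scripts such as _{\text{x}} are not matched,
    # because the script character is followed by "{" there.
    return _UNBRACED_SCRIPT.sub(r"\1{\2}", tex)


if __name__ == "__main__":
    print(brace_text_scripts(r"does_\text{not} = work"))
```

Applied to the failing line of the MWE, this turns does_\text{not} into does_{\text{not}} and leaves the two working lines unchanged.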

Thanks for your kind words, I am glad that you are using TeX4ebook.
This is a known issue. It is best to always add grouping when you work with sub- and superscripts. I've tried to execute the Lua code from the previous link on the TeX sources of the arXiv project, and found that it doesn't work well with underscores in filenames. I've updated the TeX files anyway and was able to run make4ht on them.

The compilation ran with only one error, but it was quite an important one, because it resulted in one huge paragraph and no included images. I've found that the issue was caused by the graphbox package. It seems that it isn't supported by TeX4ht. I can try to make a support file in the future, but for now, it is easiest to just not use it with TeX4ht.

The updated TeX files and the configuration file for TeX4ht are attached in the zip file.

I could compile it using:

 make4ht -m draft -c config -f html5-common_domfilters iccv-fix "mathml"

I haven't tried it with tex4ebook, because it is quite late already and I wanted to check first whether it works with normal HTML. I will do more tests tomorrow. There were some issues with MathML post-processing, which is the reason for disabling the common_domfilters extension.

Regarding further questions, you can post them here if you prefer GitHub. Other good places are the TeX4ht tag on TeX.sx, the mailing list, or the issue tracker.

But only the subscripts maybe need it... so, this should also work, and maybe it's less work for you?
does_{\text{this}} = work?

Yes, that also works!

@michal-h21 thanks for your long and detailed answer, that does indeed answer some questions I had.