alvinwan/TexSoup

Issue parsing nested equation / scalebox / cases + need for more detailed error messages

tvercaut opened this issue · 1 comment

I am trying to extract the title, author and abstract from a number of LaTeX files. TexSoup has already proven very useful for this purpose.

While doing so, however, I ran into a problem parsing a complex nested expression involving an equation, a scalebox and a cases environment. Below is a small test case that reproduces it:

#!/usr/bin/env python3

from TexSoup import TexSoup

tex_doc = r"""
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\begin{equation}
\scalebox{2.0}{$x = 
\begin{cases}
1, & \text{if } y=1 \\
0, & \text{otherwise}
\end{cases}$}
\end{equation}
\end{document}
"""

soup = TexSoup(tex_doc)
print(list(soup))

It took me a while to find the offending code, as the error message only said:

EOFError: Expecting $. Reached end of file.

For my use case, the following things would have been very useful:

  • allow TexSoup to ignore parsing errors and continue (I would assume the title, abstract and authors have already been parsed correctly by the time this error is encountered)
  • provide a more detailed error message including, for example, the location of the start of the offending expression
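Until something like the first point exists, one possible workaround for this particular use case is a fallback that does not need a full parse. The sketch below is only an illustration and is not part of TexSoup: the helper names `brace_argument`, `environment_body` and `extract_metadata` are made up here, and it assumes the metadata is written as plain `\title{...}`, `\author{...}` and a `\begin{abstract}...\end{abstract}` environment, with no optional arguments or custom macros.

```python
import re


def brace_argument(tex, command):
    """Return the first {...} argument of \\command, or None if not found."""
    match = re.search(r"\\" + re.escape(command) + r"\s*\{", tex)
    if match is None:
        return None
    depth = 1
    pos = start = match.end()
    while pos < len(tex):
        ch = tex[pos]
        if ch == "\\":
            pos += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return tex[start:pos].strip()
        pos += 1
    return None  # braces never balanced


def environment_body(tex, name):
    """Return the text between \\begin{name} and the first \\end{name}."""
    escaped = re.escape(name)
    pattern = r"\\begin\{" + escaped + r"\}(.*?)\\end\{" + escaped + r"\}"
    match = re.search(pattern, tex, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def extract_metadata(tex):
    """Best-effort metadata lookup that ignores the rest of the document."""
    return {
        "title": brace_argument(tex, "title"),
        "author": brace_argument(tex, "author"),
        "abstract": environment_body(tex, "abstract"),
    }


# Possible use as a fallback around the real parser:
#
#     try:
#         soup = TexSoup(tex_doc)
#     except EOFError:
#         metadata = extract_metadata(tex_doc)
```

Because it only looks at the three pieces of interest, a parse failure elsewhere in the document (such as the equation above) does not affect it. It is of course far less general than a real parser.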

Thanks for the feedback! I'm adding a fault-tolerant flag, to be merged soon. I have also amended several of the most common parse errors to be more informative (including line number and offset).
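For readers wondering what a line number and offset in such a message could look like, here is a minimal sketch. It is not TexSoup's actual implementation: the functions `locate` and `describe_unclosed` and the message wording are assumptions made for illustration. It simply converts a character position in the source into a 1-based line and column.

```python
def locate(text, offset):
    """Convert a 0-based character offset into a 1-based (line, column) pair."""
    line = text.count("\n", 0, offset) + 1
    last_newline = text.rfind("\n", 0, offset)
    # rfind returns -1 on the first line, which gives column = offset + 1
    column = offset - last_newline
    return line, column


def describe_unclosed(text, offset, token="$"):
    """Build an error message pointing at where an unclosed token was opened."""
    line, column = locate(text, offset)
    lines = text.splitlines()
    snippet = lines[line - 1] if line - 1 < len(lines) else ""
    return (
        f"Expecting {token}. Reached end of file. "
        f"Opened at line {line}, column {column}: {snippet!r}"
    )
```

With the offset of the opening `$` recorded at tokenizing time, the message can point straight at the start of the expression instead of only reporting the end of the file.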