Parses valid LaTeX and provides a variety of BeautifulSoup-esque methods and Pythonic idioms for iterating and searching the parse tree. Unlike BeautifulSoup however, TexSoup is modeled after an interpreter, providing a set of Pythonic structures for processing environments, commands, and arguments.
created by Alvin Wan
Just install via pip.
pip install texsoup
There is one main utility, TexSoup
, which translates any LaTeX string or
iterator into a soupified object.
You have two options. Either give (1) a file buffer (open('file.tex')
) or (2) a string.
from TexSoup import TexSoup
soup = TexSoup("""
\begin{document}
\section{Hello \textit{world}.}
\subsection{Watermelon}
(n.) A sacred fruit. Also known as:
\begin{itemize}
\item red lemon
\item life
\end{itemize}
Here is the prevalence of each synonym.
\begin{tabular}{c c}
red lemon & uncommon \\
life & common
\end{tabular}
\end{document}
""")
With the soupified LaTeX, you can now search and traverse the document tree. The below is a demonstration of basic functions that TexSoup provides.
>>> soup.section
\section{Hello \textit{world}.}
>>> soup.section.name
'section'
>>> soup.section.string
'Hello \\textit{world}.'
>>> soup.section.parent.name
'document'
>>> soup.tabular
\begin{tabular}{c c}
red lemon & uncommon \\
life & common
\end{tabular}
>>> soup.tabular.args[0]
'c c'
>>> soup.item
\item red lemon
>>> list(soup.find_all('item'))
[\item red lemon, \item life]
For (slightly) more advanced searches, include arguments. For example, to
search for all references to a particular label, search for ref{<label>}
. This
way you can count the number of times a particular label is referenced.
>>> soup = TexSoup("""
... \section{Heading}\label{Section:Heading}
...
... Some text about the \ref{Section:Heading} heading goes here. Yet another
... sentence about the \ref{Section:Heading} heading.
... """)
>>> soup.count('\ref{Section:Heading}')
2
Additionally, modify the TeX parse tree in place, to generate new LaTeX.
>>> soup = TexSoup("""\textbf{'Hello'}\textit{'Y'}O\textit{'U'}""")
>>> soup.textbf.delete()
>>> 'Hello' not in repr(soup)
True
>>> soup.textit.replace('S')
>>> soup.textit.replace('U', 'P')
>>> str(soup)
SOUP
There is one main utility, read
, which translates any LaTeX string or iterator
into a Python abstraction.
>>> from TexSoup import read
>>> expr = read('\section{textbf}')
>>> expr
TexCmd('section', [RArg('textbf')])
>>> print(expr)
\section{textbf}
TexSoup has a variety of practical applications, whether it be minor conveniences or more powerful LaTeX extensions. The below exhibits a few of these use cases, from simple reference counts to integration with computer algebra systems (coming soon).
See the examples/
folder for example scripts and usages for TexSoup.
See slightly more complex uses for TexSoup.
- LaTex2Python converts LaTeX into a document tree, organizing content by either a default or custom hierarchy.