Tex2py converts LaTeX into a Python parse tree, using TexSoup. This allows you to navigate LaTeX files as trees, using either the default or a custom hierarchy. See md2py for a Markdown parse tree.
created by Alvin Wan
Install via pip.
pip install tex2py
LaTeX2Python offers only one function, tex2py, which generates a Python parse tree from LaTeX. This object is a navigable, "Tree of Contents" abstraction for the LaTeX file.
Take, for example, the following LaTeX file, chikin.tex:
\documentclass[a4paper]{article}
\begin{document}
\section{Chikin Tales}
\subsection{Chikin Fly}
Chickens don't fly. They do only the following:
\begin{itemize}
\item waddle
\item plop
\end{itemize}
\section{Chikin Scream}
\subsection{Plopping}
Plopping involves three steps:
\begin{enumerate}
\item squawk
\item plop
\item repeat, unless ordered to squat
\end{enumerate}
\subsection{I Scream}
\end{document}
Akin to a navigation bar, the TreeOfContents object allows you to expand a LaTeX file one level at a time. Running tex2py on the above LaTeX file will generate a tree, abstracting the below structure.
            <Document>
           /          \
   Chikin Tales    Chikin Scream
      /              /       \
 Chikin Fly     Plopping   I Scream
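The nesting above can be illustrated with a small standard-library sketch. This is not how tex2py or TexSoup parse LaTeX; it is a toy model that assumes headings are the only structure of interest. The names `Node`, `build_tree`, and `HIERARCHY` are hypothetical and exist only for this example.

```python
import re
from dataclasses import dataclass, field

# Assumed heading order, outermost first (illustrative, not tex2py's own list).
HIERARCHY = ("section", "subsection", "subsubsection")


@dataclass
class Node:
    name: str                       # heading command, e.g. "section"
    title: str                      # text inside the braces
    branches: list = field(default_factory=list)


def build_tree(tex, hierarchy=HIERARCHY):
    """Nest headings by their depth in `hierarchy` using a stack."""
    pattern = re.compile(r"\\(" + "|".join(hierarchy) + r")\{([^}]*)\}")
    root = Node("document", "<Document>")
    stack = [(-1, root)]            # (depth, node) path from root to current heading
    for name, title in pattern.findall(tex):
        depth = hierarchy.index(name)
        # Drop headings at the same or a deeper level; they are finished.
        while stack[-1][0] >= depth:
            stack.pop()
        node = Node(name, title)
        stack[-1][1].branches.append(node)
        stack.append((depth, node))
    return root


TEX = r"""
\section{Chikin Tales}
\subsection{Chikin Fly}
\section{Chikin Scream}
\subsection{Plopping}
\subsection{I Scream}
"""

root = build_tree(TEX)
for section in root.branches:
    print(section.title, [sub.title for sub in section.branches])
# Chikin Tales ['Chikin Fly']
# Chikin Scream ['Plopping', 'I Scream']
```

Passing a different `hierarchy` tuple changes which commands count as headings and how they nest, which is the general idea behind a custom hierarchy.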
At the global level, we can access the title of the first section.
>>> from tex2py import tex2py
>>> with open('chikin.tex') as f: data = f.read()
>>> toc = tex2py(data)
>>> toc.section
Chikin Tales
>>> str(toc.section)
'Chikin Tales'
Notice that at this level, there are no subsections.
>>> list(toc.subsections)
[]
The first section has a single subsection beneath it, reachable through either the plural or the singular attribute.
>>> list(toc.section.subsections)
[Chikin Fly]
>>> toc.section.subsection
Chikin Fly
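The singular and plural lookups used above can be mimicked with `__getattr__`. The sketch below is a toy stand-in, not the TreeOfContents implementation: the class `Tree` and its `valid_tags` tuple are assumptions made for illustration. A singular tag name returns the first direct child with that tag, and the plural form returns an iterator over all of them.

```python
class Tree:
    """Toy stand-in for a tree-of-contents node (hypothetical)."""

    valid_tags = ("section", "subsection")

    def __init__(self, name, title, branches=()):
        self.name = name
        self.title = title
        self.branches = list(branches)

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails.
        if attr in self.valid_tags:
            # Singular: first direct child with that tag, or None.
            return next((b for b in self.branches if b.name == attr), None)
        if attr.endswith("s") and attr[:-1] in self.valid_tags:
            # Plural: lazy iterator over every direct child with that tag.
            return (b for b in self.branches if b.name == attr[:-1])
        raise AttributeError(attr)

    def __repr__(self):
        return self.title


toc = Tree("document", "<Document>", [
    Tree("section", "Chikin Tales", [Tree("subsection", "Chikin Fly")]),
    Tree("section", "Chikin Scream", [
        Tree("subsection", "Plopping"),
        Tree("subsection", "I Scream"),
    ]),
])

print(toc.section)                     # Chikin Tales
print(list(toc.subsections))           # []
print(list(toc.section.subsections))   # [Chikin Fly]
```

Only direct children are searched, which is why the root reports no subsections even though they exist deeper in the tree.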
The TreeOfContents class also has a few more conveniences defined. Among them is support for indexing. To access the ith child of an <element>, use <element>[i] instead of <element>.branches[i]. See below for example usage.
>>> toc.section.branches[0] == toc.section[0] == toc.section.subsection
True
>>> toc.branches[1] == toc[1]
True
>>> toc[1]
Chikin Scream
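Indexing of this kind usually comes down to a `__getitem__` method that forwards to the list of children. A minimal sketch, assuming a hypothetical `Tree` class with a `branches` list (this is not tex2py's source):

```python
class Tree:
    """Toy node whose index operator delegates to its children (hypothetical)."""

    def __init__(self, title, branches=()):
        self.title = title
        self.branches = list(branches)

    def __getitem__(self, i):
        # tree[i] is shorthand for tree.branches[i]; slices work too.
        return self.branches[i]

    def __repr__(self):
        return self.title


doc = Tree("<Document>", [
    Tree("Chikin Tales", [Tree("Chikin Fly")]),
    Tree("Chikin Scream", [Tree("Plopping"), Tree("I Scream")]),
])

print(doc[1])                       # Chikin Scream
print(doc[1][0])                    # Plopping
print(doc[0] is doc.branches[0])    # True
```

Because the lookup is delegated to a list, negative indices and slices behave as they do for any Python list.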
You can now print the document tree. There is some weirdness with branches beyond titles, so the following shows titles only:
          ┌Chikin Tales┐
          │            └Chikin Fly
[document]┤
          │             ┌Plopping
          └Chikin Scream┤
                        │
                        │
                        └I Scream
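For comparison, a title-only tree can also be rendered top-down with a short recursive function. This sketch does not reproduce the sideways layout shown above and is unrelated to whatever tex2py uses for printing. The `render` function and the `(title, children)` tuple format are assumptions for this example.

```python
def render(node, prefix="", is_last=True, is_root=True):
    """Return the lines of a top-down drawing of a (title, children) tree."""
    title, children = node
    if is_root:
        lines = [title]
        child_prefix = ""
    else:
        lines = [prefix + ("└── " if is_last else "├── ") + title]
        # Keep a vertical bar running while later siblings remain.
        child_prefix = prefix + ("    " if is_last else "│   ")
    for i, child in enumerate(children):
        lines.extend(render(child, child_prefix, i == len(children) - 1, False))
    return lines


tree = ("[document]", [
    ("Chikin Tales", [("Chikin Fly", [])]),
    ("Chikin Scream", [("Plopping", []), ("I Scream", [])]),
])

print("\n".join(render(tree)))
# [document]
# ├── Chikin Tales
# │   └── Chikin Fly
# └── Chikin Scream
#     ├── Plopping
#     └── I Scream
```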