Converts LaTeX documents into beautiful responsive web pages.
The easiest way to run Engrafo is by using the Docker image. To convert input/main.tex into output/index.html, run:
$ docker run \
-v "$(pwd)":/workdir -w /workdir \
arxivvanity/engrafo engrafo -o output/ input/main.tex
For full usage, run:
$ docker run arxivvanity/engrafo engrafo --help
We couldn't find a good LaTeX-to-HTML converter. But instead of building one from scratch, we picked some components that each solved part of the problem and adapted them to our needs.
The downside of this approach is a fair amount of shoe-horning, but the upside is (probably) less work, and it means we can contribute our improvements to each component back to the open-source academic community.
Here is how it works:
- Pandoc does most of the heavy lifting, using our own fork. It parses the LaTeX and outputs the basic HTML.
- During the Pandoc conversion, a Pandoc filter (in pandocfilter/) converts tikzpicture environments to SVG, inserts labels, inserts hyperlinks, and does various other things.
- After the Pandoc conversion, we apply Distill's template to style the output, make it responsive, create footnotes, and create hover boxes.
- Some post-processors (in src/postprocessors/) do layout, additional styling, math rendering, bibliography rendering, and various other things. Pandoc can only output a particular subset of HTML from its AST, so the post-processors also rename and move around some elements.
The line between the Pandoc filter and the post-processing is pretty fuzzy at the moment. If you're trying to find some logic as to where a bit of processing lives, there probably isn't any. The intention is that we do as much as possible in Pandoc, then use the post-processor to rejig anything that Pandoc can't easily do.
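Conceptually, the post-processing stage is a sequence of functions, each receiving the output of the previous one. The following is a hypothetical sketch only: the two post-processors, the `p.caption` to `figcaption` rename, and the "layout" class name are invented for illustration, and the sketch works on plain HTML strings for simplicity rather than reflecting how src/postprocessors/ is actually implemented.

```javascript
// Hypothetical post-processing chain; not Engrafo's src/postprocessors/ code.

// Example of renaming an element Pandoc emits into one a template might expect.
function renameFigureCaptions(html) {
  return html.replace(/<p class="caption">(.*?)<\/p>/g, "<figcaption>$1</figcaption>");
}

// Example of a layout step: wrap the whole body in a container (class name made up).
function wrapLayout(html) {
  return `<div class="layout">${html}</div>`;
}

// Apply post-processors in order, feeding each one's output into the next.
function runPostprocessors(html, postprocessors) {
  return postprocessors.reduce((current, process) => process(current), html);
}

const pandocOutput = '<p>Hello</p><p class="caption">Figure 1</p>';
const result = runPostprocessors(pandocOutput, [renameFigureCaptions, wrapLayout]);
console.log(result);
// <div class="layout"><p>Hello</p><figcaption>Figure 1</figcaption></div>
```

Keeping each step as a small, ordered function makes it easy to see which step is responsible for a given change in the output, which matters when the boundary with the Pandoc filter is fuzzy.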
In development, you can build an image locally and use a shortcut script to run the image:
$ script/build
$ script/engrafo -o output/ input/main.tex
You can also run a server that lets you view papers from arXiv in a browser. Start it by running:
$ script/server
And it will be available at http://localhost:8010/.
Engrafo uses a custom version of Pandoc. If you are working on Pandoc locally, you can continuously build the pandoc binary and inject it into the Engrafo image.
In your local Pandoc directory, run:
$ ./docker-watch-build.sh
In another shell, in the Engrafo directory, run:
$ PANDOC_DIR=/path/to/local/pandoc/dir script/server
Now, whenever you make a change to a Pandoc source file, the binary will be rebuilt and become visible in the Engrafo container.
Run the main test suite:
$ script/test
You can run entire suites:
$ script/test integration-tests/images.test.js
Or individual tests by matching a string:
$ script/test -t "titles and headings"
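Assuming script/test forwards -t to Jest's --testNamePattern option, the string is treated as a regular expression and matched against test names, so any test whose name contains the phrase will run. The following is a simplified sketch of that selection; apart from the two names taken from this README, the test names are hypothetical, and real Jest also matches against the names of enclosing describe() blocks.

```javascript
// Simplified sketch of selecting tests by a name pattern (not Jest's source).
function selectTests(testNames, pattern) {
  const re = new RegExp(pattern); // the pattern is a regular expression
  return testNames.filter(name => re.test(name));
}

const names = [
  "titles and headings",
  "bold text renders correctly",
  "images with captions", // hypothetical test name
];

console.log(selectTests(names, "titles and headings")); // [ 'titles and headings' ]
console.log(selectTests(names, "^b")); // [ 'bold text renders correctly' ]
```

Because it is a regular expression, a short fragment such as "bold" is usually enough, but characters like ( or . in a test name need escaping to be matched literally.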
There is also a test suite for the Pandoc filter:
$ script/test-pandocfilter
The integration tests in integration-tests/ render small LaTeX files and ensure they produce a particular HTML output.
The integration tests use Jest's snapshotting feature.
Each test renders a LaTeX file and ensures it matches a snapshot. If it does not match, Jest prints a pretty diff and gives you the option to automatically fix the test.
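The snapshot cycle can be summarised in a few lines. This is a minimal sketch in the spirit of Jest's toMatchSnapshot(), not Jest's implementation: snapshots are kept in an in-memory Map instead of integration-tests/__snapshots__/*.snap files, and the HTML strings are illustrative, not real Engrafo output.

```javascript
// Minimal sketch of snapshot testing (not Jest's implementation).
const snapshots = new Map(); // Jest stores these in __snapshots__/*.snap files

// Compare rendered output with the stored snapshot for `name`.
// With update = true (like passing -u), or when no snapshot exists yet,
// the current output is recorded as the new expected value.
function matchSnapshot(name, rendered, { update = false } = {}) {
  if (update || !snapshots.has(name)) {
    snapshots.set(name, rendered);
    return { pass: true, written: true };
  }
  const expected = snapshots.get(name);
  return { pass: expected === rendered, written: false, expected, received: rendered };
}

// First run: no snapshot yet, so the output is recorded.
console.log(matchSnapshot("bold", "<p>I am <strong>bold</strong>!</p>").written); // true
// Later the output changes: the comparison fails.
console.log(matchSnapshot("bold", "<p>I am <b>bold</b>!</p>").pass); // false
// The change was intended: re-run with update to overwrite the snapshot.
console.log(matchSnapshot("bold", "<p>I am <b>bold</b>!</p>", { update: true }).pass); // true
```

The same three situations (new snapshot, unexpected change, intentional change) come up in the workflow below.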
First, write a test case describing in plain text what you are testing. For example, in integration-tests/formatting.test.js:
test("bold text renders correctly", done => {
  utils.expectBodyToMatchSnapshot("formatting/bold.tex", done);
});
Then, write integration-tests/formatting/bold.tex:
\begin{document}
I am \textbf{bold}!
\end{document}
Now, run the test, passing the -u option to write out a snapshot of what is rendered:
$ script/test -t "bold text renders correctly" -u
Check that the output looks correct in integration-tests/__snapshots__/formatting.test.js.snap. You can re-run that command without the -u option to ensure the test passes.
The test will fail if the output changes in the future. If the change is expected, you can simply re-run the test with -u to overwrite the snapshot and fix the test.