Arxiv Vanity renders papers from Arxiv as responsive web pages so you don't have to squint at a PDF.
It turns this sort of thing:
Into this:
This is the web interface for viewing papers. The actual LaTeX to HTML conversion (the interesting bit) is done by Engrafo.
Arxiv Vanity downloads LaTeX source from Arxiv and renders it as HTML using the Engrafo LaTeX to HTML converter.
The web app runs render jobs on Hyper.sh as Docker containers, and they report their status directly back to the web app with a webhook. This approach has two neat properties:
- It scales with demand: each render runs in its own container, so there is no fixed pool of workers to size
- There is no worker process or message queue
The process looks a bit like this:
In detail:
- Details about the paper are fetched from the Arxiv API. Metadata is stored in a Postgres database using Django's ORM, and the paper's LaTeX source is stored on S3.
- Engrafo is run on Hyper.sh to convert the LaTeX source to HTML. It fetches the source and stores the result on S3. The container ID is stored in the Postgres database so the status of the rendering job can be queried.
- When the rendering job is finished, the Hyper.sh container makes an HTTP request to the web app to mark it as rendered.
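The steps above can be sketched roughly as follows. This is a minimal, in-memory illustration, not the app's actual code: the names `Render`, `RenderState`, `RenderTracker`, and `handle_webhook` are hypothetical, a plain dictionary stands in for the Postgres table, and a caller-supplied function stands in for starting a container on Hyper.sh. The failure state driven by an exit code is also an assumption; the description above only mentions marking a paper as rendered.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class RenderState(Enum):
    UNSTARTED = "unstarted"
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"  # assumption: the source only describes the success path


@dataclass
class Render:
    """One rendering job for a paper (stands in for a database row)."""
    arxiv_id: str
    state: RenderState = RenderState.UNSTARTED
    container_id: Optional[str] = None


class RenderTracker:
    """Tracks render jobs without a worker process or message queue."""

    def __init__(self, start_container: Callable[[str], str]):
        # start_container is a hypothetical hook: it launches the converter
        # for an arXiv ID and returns the ID of the container it started.
        self._start_container = start_container
        self._renders: Dict[str, Render] = {}

    def start(self, arxiv_id: str) -> Render:
        """Start a render and remember its container ID for status queries."""
        render = Render(arxiv_id=arxiv_id)
        render.container_id = self._start_container(arxiv_id)
        render.state = RenderState.RUNNING
        self._renders[arxiv_id] = render
        return render

    def handle_webhook(self, arxiv_id: str, container_id: str, exit_code: int) -> Render:
        """Called when a container reports back that it has finished."""
        render = self._renders.get(arxiv_id)
        if render is None or render.container_id != container_id:
            raise LookupError(f"no matching render for {arxiv_id}")
        render.state = RenderState.SUCCESS if exit_code == 0 else RenderState.FAILURE
        return render


if __name__ == "__main__":
    tracker = RenderTracker(lambda arxiv_id: "container-" + arxiv_id)
    job = tracker.start("1708.03312")
    print(job.state.value, job.container_id)
    finished = tracker.handle_webhook("1708.03312", job.container_id, exit_code=0)
    print(finished.state.value)
```

Because the container calls back when it finishes, the web app only needs to record the container ID up front and update the row when the webhook arrives; nothing has to poll or consume a queue.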
Install Docker for Mac or Windows.
Do the initial database migration and set up a user:
$ script/manage migrate
$ script/manage createsuperuser
Then to run the app:
$ docker-compose up --build
Your app is now available at http://localhost:8000. The admin interface is at http://localhost:8000/admin/.
You can scrape the latest papers from Arxiv by running:
$ script/manage scrape_papers
It'll probably fetch quite a lot, so hit Ctrl-C when you've got enough.
To run the tests:
$ script/test
Thanks to our generous sponsors for supporting the development of Arxiv Vanity! Sponsor us to get your logo here.