Doclang Benchmark

Setup

Prerequisites: npm, pdflatex, and pandoc.

cd scripts
pip3 install -e .
python3 bench.py install

Running the benchmark

python3 bench.py all

Generated documents are written to the output/ directory.

Tasks

  • basic-formatting: universal document primitives: paragraphs, inline styles (including nesting), and sections. Tests baseline conciseness.
  • list-map: mapping over a list, specifically generating bullets from a string array (see the sketch after this list). Tests compile-time computation with data structures.
  • reactive-slider: updating the page when the user moves a slider. Tests model-view separation and reactivity with run-time computation.
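
For concreteness, here is a minimal sketch of the list-map task in plain Python (not one of the benchmarked languages); the input array and the HTML output shape are illustrative assumptions, not a prescribed reference solution.

# Hypothetical reference for the list-map task: turn a string array into
# an HTML bullet list entirely ahead of time (no runtime computation).
ITEMS = ["alpha", "beta", "gamma"]  # assumed example input

def render_bullets(items):
    lines = ["<ul>"]
    for item in items:
        lines.append(f"  <li>{item}</li>")
    lines.append("</ul>")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_bullets(ITEMS))

Each benchmarked language would express the same mapping with its own compile-time facility; the comparison of interest is how much ceremony that takes.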

Brainstorm for other tasks:

  • Nested documents: custom components which contain nontrivial document fragments
  • Complex data structures: pass around a fancy object that can't be stringified as an HTML attribute
  • Macros: custom functions that are sprinkled around the text. Or maybe just definitions/references?
  • Errors: what kinds of undefined behavior are permitted by the complex runtime-enabled frameworks?
  • Abstraction power: adding a new kind of definition to your language, e.g. Shriram's literate Racket example. Or a glossary. Or a concordance?

General thoughts:

  • The goal is to measure the languages themselves more than their component libraries. How can we distinguish between the two?
  • Are character length / token length going to be useful measures (see the sketch after this list)? It might also be interesting to count the number of distinct language mechanisms each solution uses, e.g. for languages with a gajillion special cases.
  • Use the Cognitive Dimensions and Technical Dimensions frameworks
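
If the character/token question above is worth pursuing, a crude metrics pass could live alongside bench.py; the tasks/<task>/<language> layout and the metric names below are assumptions for illustration, not the repository's actual structure.

# Hypothetical conciseness metrics over solution files. The directory
# layout (tasks/<task>/<language>) is assumed for illustration only.
import pathlib
import re

def measure(path):
    text = pathlib.Path(path).read_text()
    return {
        "chars": len(text),
        # Very rough token proxy: runs of word characters or single symbols.
        "tokens": len(re.findall(r"\w+|[^\w\s]", text)),
        "lines": text.count("\n") + 1,
    }

if __name__ == "__main__":
    for path in sorted(pathlib.Path("tasks").glob("*/*")):
        if path.is_file():
            print(path, measure(path))

Counting distinct language mechanisms per solution would need per-language parsing, so that part is left open here.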

Languages

Other candidates for inclusion:

  • Markdoc: seems to only support compile-time computation.
  • Pandoc: used by Living Papers and Quarto, so probably just pick one of those instead.
  • Quarto: very similar to Living Papers; unclear whether we need another comparison.
  • Pollen: relatively niche, mostly a contrast to Scribble.
  • reStructuredText: more customizable than Markdown but seems to be static-only.
  • Typst
  • AsciiDoc