/djot.js

JavaScript implementation of djot

Primary LanguageTypeScriptMIT LicenseMIT

djot.js

A library and command-line tool for parsing and rendering the light markup format djot.

This is a typescript rewrite of djot's original lua implementation. It is currently powering the djot playground.

Script API

These are available after you run npm install, which will install the library's build dependencies:

npm run ... Description
build Compile and bundle sources to lib and dist
test Run tests
bench Run benchmarks

Installing the command line utility

You can install the command-line utility djot via

npm install -g @djot/djot

djot --help will give a summary of options. For more extensive documentation, use man djot or see the man page online.

You can use djot to

  • convert from djot to HTML
  • show the low-level event stream generated by the djot parser
  • show the AST produced by the djot parser, in JSON or compact form
  • convert from a JSON-formatted djot AST to djot or HTML
  • alter the AST using filters
  • convert from djot to pandoc JSON which can be read by pandoc
  • convert pandoc JSON (or djot) to djot

For example, to convert a gfm document mydoc.md to djot,

pandoc mydoc.md -f gfm -t json | djot -f pandoc -t djot > mydoc.dj

And to convert back to gfm,

djot mydoc.dj -t pandoc | pandoc -f json -t gfm

CDN

The library is available via the unpkg CDN:

<script src="https://unpkg.com/@djot/djot@0.2.5/dist/djot.js"></script>

(Replace 0.2.5 with the version you want to use.)

Library API

Parsing djot to an AST

parse(input : string, options : ParseOptions = {})

Example of usage:

djot.parse("hello _there_", {sourcePositions: true});

options can have the following (optional) fields:

  • sourcePositions : boolean: include source positions in the AST
  • warn : (message : Warning) => void: function used to handle warnings from the parser.

A warning can be rendered using the render() method, so to render warnings to the console, for example, you could do:

{ warn: (warning) => console.log(warning.render()) }

Alternatively, you can directly access the fields message (string), offset (number if defined), and pos ({ line: number, col: number, offset: number } if defined).

Parsing djot to a stream of events

parseEvents(input : string, options : Options = {})

Exposes an iterator over events, each with the form {startpos : number, endpos : number, annot : string}. Example of usage:

for (let event of parseEvents("Hi _there_")) {
  console.log(event)
}

The Options object has a warn property as described for parse, above.

Pretty-printing the djot AST

renderAST(doc : Doc) : string

Example of usage:

console.log(djot.renderAST(djot.parse("hi _there_")));

Rendering the djot AST to HTML

renderHTML(ast : Doc, options : HTMLRenderOptions = {})

HTMLRenderOptions extends Options with an overrides field, which maps node tags to overrides for their renderers.

Example of usage:

console.log(djot.renderHTML(djot.parse("- _hi_",{sourcePositions:true})));

Simple example of an override:

djot.renderHTML(
  djot.parse("_hi_", { sourcePositions: true }),
  { 
    overrides: {
      emph: (node, renderer) => {
        return `<span class="emphasized">${renderer.renderChildren(node)}</span>`;
      }
    } 
  });

This yields: <span class="emphasized">hi</span>.

Rendering djot

renderDjot(doc : Doc, options : DjotRenderOptions = {}) : string

DjotRenderOptions extends Options with a wrapWidth : number field. Its effect is as follows:

  • wrapWidth > 0: Output is wrapped to fit in the width, with `soft_break' acting like a space.
  • wrapWidth = 0: Output is not wrapped; soft_break renders as a line break.
  • wrapWidth = -1: Output is not wrapped; soft_break renders as a space.

Example of usage:

console.log(djot.renderDjot(djot.parse("_Hello_ world"), {wrapWidth: 64}));

Pandoc interoperability

toPandoc(doc : Doc, options : PandocRenderOptions) : Pandoc

PandocRenderOptions extends Options with smartPunctuationMap, which has the form and default values:

{ non_breaking_space: " ",
  ellipses: "⋯",
  em_dash: "-",
  en_dash: "-",
  left_single_quote: "‘",
  right_single_quote: "’",
  left_double_quote: "“",
  right_double_quote: "”" };

Example of usage:

console.log(JSON.stringify(djot.toPandoc(djot.parse("- one\n- two\n"))));

fromPandoc(pandoc : Pandoc, options : Options) : Doc

Example of usage:

let ast = djot.fromPandoc(JSON.parse(pandocJSON));

Filters

applyFilter(node : Doc, filter : Filter)

Example of usage:

const capitalizeFilter = () => {
  return {
           str: (e) => {
             e.text = e.text.toUpperCase();
           }
  };
};
let ast = djot.parse("Hi there `verbatim`");
djot.applyFilter(ast, capitalizeFilter);

Filters are JavaScript programs that alter the AST between parsing and rendering. They can be used for customization.

Here is an example of a filter that capitalizes all the content text in a document:

// This filter capitalizes regular text, leaving code and URLs unaffected
return {
  str: (el) => {
    el.text = el.text.toUpperCase();
  }
}

Here's a filter that prints a list of all the URLs you link to in a document. This filter doesn't alter the document at all; it just prints the list to stderr.

return {
  link: (el) => {
    process.stderr:write(el.destination + "\n")
  }
}

By default filters do a bottom-up traversal; that is, the filter for a node is run after its children have been processed. It is possible to do a top-down travel, though, and even to run separate actions on entering a node (before processing the children) and on exiting (after processing the children). To do this, associate the node's tag with a table containing enter and/or exit functions. The enter function is run when we traverse into the node, before we traverse its children, and the exit function is run after we have traversed the node's children. For a top-down traversal, you'd just use the enter functions. If the tag is associated directly with a function, as in the first example above, it is treated as an `exit' function.

The following filter will capitalize text that is nested inside emphasis, but not other text:

// This filter capitalizes the contents of emph
// nodes instead of italicizing them.
let capitalize = 0;
return {
   emph: {
     enter: (e) => {
       capitalize = capitalize + 1;
     },
     exit: (e) => {
       capitalize = capitalize - 1;
       e.tag = "span";
     },
   },
   str: (e) => {
     if (capitalize > 0) {
       e.text = e.text.toUpperCase();
      }
   }
}

A single filter may return a table with multiple tables, which will be applied sequentially:

// This filter includes two sub-filters, run in sequence
return [
  { // first filter changes (TM) to trademark symbol
    str: (e) => {
      e.text = e.text.replace(/\\(TM\\)/, "™");
    }
  },
  { // second filter changes '[]' to '()' in text
    str: (e) => {
      e.text = e.text.replace(/\\(/,"[").replace(/\\)/,"]");
    }
  }
]

The filters we've looked at so far modify nodes in place by changing one of their properties (text). Sometimes we'll want to replace a node with a different kind of node, or with several nodes, or to delete a node. In these cases we can end the filter function with a return. If a single AST node is returned, it will replace the element the filter is processing. If an array of AST nodes is returned, they will be spliced in to replace the element. If an empty array is returned, the element will be deleted.

// This filter replaces certain Symb nodes with
// formatted text.
const substitutions = {
  mycorp: [ { tag: "str", text: "My Corp" },
            { tag: "superscript",
              [ { tag: "str", text: "(TM)" } ] } ],
  myloc: { tag: "str", text: "Coyote, NM" }
  };
return {
  symb: (e) => {
    const found = substitutions[e.alias];
    if (found) {
      return found;
    }
  }
}
// This filter replaces all Image nodes with their descriptions.
return {
  image: (e) => {
    return e.children;
  }
}

It is possible to inhibit traversal into the children of a node, by having the enter function return an object with the property stop. The contents of stop will be used as the regular return value. This can be used, for example, to prevent the contents of a footnote from being processed:

return {
 footnote: {
   enter: (e) => {
     return {stop: [e]};
    }
  }
}

Getting the version

djot.version : string

returns the version number.

The AST

The most human-readable documentation for the djot AST format is the typescript type definitions.

There is also a JSON schema which can be used to verify conformity to the AST programatically.