mdast-util-from-markdown

mdast utility that turns markdown into a syntax tree

What is this?

This package is a utility that takes markdown input and turns it into a markdown abstract syntax tree.

This utility uses micromark, which turns markdown into tokens, and then turns those tokens into nodes.
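To make the token-to-node step more concrete, here is a minimal sketch of how a flat list of enter and exit events could be folded into a tree with a stack. The `SketchToken`, `SketchEvent`, and `SketchNode` types and the `compileSketch` function are hypothetical stand-ins written for this illustration, not this package's actual implementation, and they leave out positions and the tokenize context.

```typescript
// Simplified stand-ins for illustration only; the real Token and Event types
// come from micromark-util-types and also carry positions and a context.
type SketchToken = { type: string; value?: string }
type SketchEvent = ['enter' | 'exit', SketchToken]
type SketchNode = { type: string; value?: string; children?: SketchNode[] }

// Walk the events in order, keeping a stack of the nodes that are still open.
function compileSketch(events: SketchEvent[]): SketchNode {
  const root: SketchNode = { type: 'root', children: [] }
  const stack: SketchNode[] = [root]

  for (const [kind, token] of events) {
    if (kind === 'enter') {
      // Tokens with a value become literal nodes, all others become parents.
      const node: SketchNode =
        token.value === undefined
          ? { type: token.type, children: [] }
          : { type: token.type, value: token.value }
      const parent = stack[stack.length - 1]
      if (!parent || !parent.children) {
        throw new Error('no open parent for ' + token.type)
      }
      // Attach the new node to whichever node is currently open.
      parent.children.push(node)
      stack.push(node)
    } else {
      // An exit must close the node that was opened most recently.
      const open = stack.pop()
      if (!open || open.type !== token.type) {
        throw new Error('unbalanced exit: ' + token.type)
      }
    }
  }

  return root
}

// Hand-written events roughly matching `## Hello, *World*!`.
const sketchEvents: SketchEvent[] = [
  ['enter', { type: 'heading' }],
  ['enter', { type: 'text', value: 'Hello, ' }],
  ['exit', { type: 'text' }],
  ['enter', { type: 'emphasis' }],
  ['enter', { type: 'text', value: 'World' }],
  ['exit', { type: 'text' }],
  ['exit', { type: 'emphasis' }],
  ['enter', { type: 'text', value: '!' }],
  ['exit', { type: 'text' }],
  ['exit', { type: 'heading' }]
]
const sketchTree = compileSketch(sketchEvents)
```

The resulting `sketchTree` has one heading child with three children of its own (text, emphasis, text), which mirrors the nesting of the tree printed in the Use section.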

When should I use this?

If you want to handle syntax trees manually, use this. When you just want to turn markdown into HTML, use micromark instead. For an easier time processing content, use the remark ecosystem instead.

Install

This package is ESM only.

In Node.js (version 18+) with yarn:

yarn add @flex-development/mdast-util-from-markdown
See "Git - Protocols | Yarn" for details on installing from Git.

In Deno with esm.sh:

import { fromMarkdown } from 'https://esm.sh/@flex-development/mdast-util-from-markdown'

In browsers with esm.sh:

<script type="module">
  import { fromMarkdown } from 'https://esm.sh/@flex-development/mdast-util-from-markdown'
</script>

Use

Say we have the following markdown file example.md:

## Hello, *World*!

…and our module example.mjs looks as follows:

import { fromMarkdown } from '@flex-development/mdast-util-from-markdown'
import { inspect } from '@flex-development/unist-util-inspect'
import { read } from 'to-vfile'

const file = await read('example.md')
const tree = fromMarkdown(String(file))

console.log(inspect(tree))

…now running node example.mjs yields:

root[1] (1:1-2:1, 0-19)
└─0 heading[3] (1:1-1:19, 0-18)
    │ depth: 2
    ├─0 text "Hello, " (1:4-1:11, 3-10)
    ├─1 emphasis[1] (1:11-1:18, 10-17)
    │   └─0 text "World" (1:12-1:17, 11-16)
    └─2 text "!" (1:18-1:19, 17-18)

API

fromMarkdown(value[, encoding][, options])

Turn markdown into a syntax tree.

Overloads

  • (value: Value | null | undefined, encoding?: Encoding | null | undefined, options?: Options) => Root
  • (value: Value | null | undefined, options?: Options | null | undefined) => Root

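Parameters

  • value (Value | null | undefined) — markdown to parse
  • encoding (Encoding | null | undefined, optional) — character encoding to use when value is a Uint8Array
  • options (Options | null | undefined, optional) — configuration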

Returns

(Root) mdast.
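The two overloads differ only in whether the second argument is an encoding or an options object. The sketch below shows one way such a call could be normalized. `resolveArgs`, `SketchOptions`, and `SketchArgs` are hypothetical names used for illustration, and the real function may resolve its arguments differently.

```typescript
// Hypothetical, trimmed-down option shape for this sketch.
type SketchOptions = {
  from?: { line: number; column: number; offset?: number } | null
}
type SketchArgs = {
  encoding: string | undefined
  options: SketchOptions | undefined
}

// Decide which overload was used by looking at the type of the second argument.
function resolveArgs(
  encoding?: string | SketchOptions | null,
  options?: SketchOptions | null
): SketchArgs {
  if (typeof encoding === 'string') {
    // (value, encoding, options)
    return { encoding, options: options ?? undefined }
  }
  // (value, options): a non-string second argument is treated as options.
  return { encoding: undefined, options: encoding ?? options ?? undefined }
}

const withEncoding = resolveArgs('utf-16le', { from: { line: 1, column: 1 } })
const withoutEncoding = resolveArgs({ from: { line: 3, column: 1 } })
```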

compiler([options])

Create an mdast compiler.

👉 The compiler only understands complete buffering, not streaming.

Parameters

  • options (Options | null | undefined, optional) — configuration

Returns

(Compiler) mdast compiler.

handles

(Handles) Token types mapped to default token handlers.

👉 Default handlers are also exported by name. See src/handles.ts for more info.

CompileContext

mdast compiler context (TypeScript type).

Properties

  • buffer ((this: CompileContext) => undefined) — capture some of the output data
  • config (Config) — configuration
  • data (CompileData) — info passed around; key/value store
  • enter ((this: CompileContext, node: Nodes, token: Token, onError?: OnEnterError) => undefined) — enter a node
  • exit ((this: CompileContext, token: Token, onError?: OnExitError) => undefined) — exit a node
  • resume ((this: CompileContext) => string) — stop capturing and access the output data
  • sliceSerialize (TokenizeContext['sliceSerialize']) — get the string value of a token
  • stack (StackedNode[]) — stack of nodes
  • tokenStack (TokenTuple[]) — stack of tokens
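The relationship between `buffer` and `resume` can be easier to see in code. The sketch below assumes that capturing works by pushing a temporary fragment onto the stack and that `resume` pops it and returns its text. That mechanism is an assumption made for this example. `SketchContext` and its `write` helper are hypothetical and not part of the documented interface.

```typescript
type SketchText = { type: 'text'; value: string }
type SketchFragment = { type: 'fragment'; children: SketchText[] }

class SketchContext {
  stack: SketchFragment[] = []

  // Start capturing: push a temporary fragment that collects output.
  buffer(): undefined {
    this.stack.push({ type: 'fragment', children: [] })
    return undefined
  }

  // Hypothetical helper for this sketch: add text to the open fragment.
  write(value: string): undefined {
    const open = this.stack[this.stack.length - 1]
    if (!open) throw new Error('nothing is being captured')
    open.children.push({ type: 'text', value })
    return undefined
  }

  // Stop capturing: pop the fragment and return the text it collected.
  resume(): string {
    const fragment = this.stack.pop()
    if (!fragment) throw new Error('resume called without buffer')
    return fragment.children.map(child => child.value).join('')
  }
}

const sketchContext = new SketchContext()
sketchContext.buffer()
sketchContext.write('java')
sketchContext.write('script')
const captured = sketchContext.resume()
```

A handler pair could use this pattern to collect a value such as a code fence's info string: call `buffer` when the token is entered and `resume` when it is exited.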

CompileData

Interface of tracked data (TypeScript interface).

interface CompileData {/* see code */}

When developing extensions that use more data, augment CompileData to register custom fields:

declare module '@flex-development/mdast-util-from-markdown' {
  interface CompileData {
    mathFlowInside?: boolean | undefined
  }
}

Compiler

Turn micromark events into a syntax tree (TypeScript type).

Parameters

  • events (Event[]) — list of events

Returns

(Root) mdast.

Config

Configuration (TypeScript type).

Properties

  • canContainEols (string[]) — token types where line endings are used
  • enter (Handles) — opening handles
  • exit (Handles) — closing handles
  • transforms (Transform[]) — tree transforms

Encoding

Encodings supported by TextDecoder (TypeScript type).

See micromark-util-types for more info.

type Encoding =
  | 'utf-8' // always supported in node
  | 'utf-16le' // always supported in node
  | 'utf-16be' // not supported when ICU is disabled
  | (string & {}) // everything else (depends on browser, or full ICU data)
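An encoding only matters when the input is bytes rather than a string. As a rough sketch, byte input can be decoded with the standard `TextDecoder` before parsing. `decodeSketch` is a hypothetical helper for illustration, and this package may decode its input differently.

```typescript
// Turn a string or byte value into a string, decoding bytes when needed.
function decodeSketch(
  value: Uint8Array | string,
  encoding?: string | null
): string {
  // Strings are already decoded, so the encoding is ignored for them.
  if (typeof value === 'string') return value
  // TextDecoder falls back to UTF-8 when no label is passed.
  return new TextDecoder(encoding ?? undefined).decode(value)
}

// "# hi" encoded as UTF-8 and as UTF-16LE.
const utf8Bytes = new Uint8Array([0x23, 0x20, 0x68, 0x69])
const utf16Bytes = new Uint8Array([0x23, 0x00, 0x20, 0x00, 0x68, 0x00, 0x69, 0x00])

const fromUtf8 = decodeSketch(utf8Bytes)
const fromUtf16 = decodeSketch(utf16Bytes, 'utf-16le')
```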

Event

The start or end of a token amongst other events (TypeScript type).

See micromark-util-types for more info.

type Event = ['enter' | 'exit', Token, TokenizeContext]

Extension

Change how tokens are turned into nodes (TypeScript type).

See Config for more info.

type Extension = Partial<Config>
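Because an extension is a partial configuration, several extensions have to be combined into one complete `Config` at some point. The sketch below assumes list fields are concatenated and handle maps are merged, with later extensions winning for the same token type. That merge strategy is an assumption, and `configureSketch` and the `Sketch*` types are hypothetical stand-ins.

```typescript
type SketchHandle = (token: { type: string }) => undefined
type SketchHandles = Record<string, SketchHandle>
type SketchTransform = (tree: { type: 'root' }) => undefined
type SketchConfig = {
  canContainEols: string[]
  enter: SketchHandles
  exit: SketchHandles
  transforms: SketchTransform[]
}
type SketchExtension = Partial<SketchConfig>

// Combine a base configuration with extensions without mutating the base.
function configureSketch(
  base: SketchConfig,
  extensions: SketchExtension[]
): SketchConfig {
  const config: SketchConfig = {
    canContainEols: [...base.canContainEols],
    enter: { ...base.enter },
    exit: { ...base.exit },
    transforms: [...base.transforms]
  }

  for (const extension of extensions) {
    // List fields accumulate across extensions.
    config.canContainEols.push(...(extension.canContainEols ?? []))
    config.transforms.push(...(extension.transforms ?? []))
    // Handle maps are merged; a later extension wins for the same token type.
    Object.assign(config.enter, extension.enter ?? {})
    Object.assign(config.exit, extension.exit ?? {})
  }

  return config
}

const noopHandle: SketchHandle = () => undefined
const baseConfig: SketchConfig = {
  canContainEols: ['emphasis'],
  enter: { emphasis: noopHandle },
  exit: { emphasis: noopHandle },
  transforms: []
}
// A made-up extension that registers handlers for a `mathFlow` token type.
const mathSketch: SketchExtension = {
  canContainEols: ['mathText'],
  enter: { mathFlow: noopHandle },
  exit: { mathFlow: noopHandle }
}
const mergedConfig = configureSketch(baseConfig, [mathSketch])
```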

Fragment

Temporary node (TypeScript type).

type Fragment = Omit<mdast.Parent, 'children' | 'type'> & {
  children: mdast.PhrasingContent[]
  type: 'fragment'
}

Properties

Handle

Handle a token (TypeScript type).

Parameters

Returns

(undefined | void) Nothing.

Handles

Token types mapped to handles (TypeScript type).

type Handles = Record<string, Handle>

OnEnterError

Handle the case where the right token is still open when the left token closes it, or when the end of the file is reached (TypeScript type).

Parameters

Returns

(undefined) Nothing.

OnExitError

Handle the case where the right token is open, but is closed by exiting the left token (TypeScript type).

Parameters

Returns

(undefined) Nothing.

Options

Configuration options (TypeScript type).

Properties

  • extensions? (micromark.Extension[] | null | undefined) — micromark extensions to change how markdown is parsed
  • from? (StartPoint | null | undefined) — point before the first character in the markdown value; node positions will be relative to this point
  • mdastExtensions? ((Extension | Extension[])[] | null | undefined) — extensions for this utility to change how tokens are turned into nodes
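The `from` option is useful when the markdown being parsed is a snippet embedded in a larger file. As a rough illustration of what "relative to this point" means, the sketch below maps a position computed for a standalone snippet to its position in the host file. It assumes only the first line of the snippet is shifted by the start column and that later lines begin at column 1. `shiftPoint` and the `Sketch*` types are hypothetical and not part of this package.

```typescript
type SketchStartPoint = { line: number; column: number; offset?: number }
type SketchPoint = { line: number; column: number; offset: number }

// Map a point in a standalone snippet (starting at 1:1, offset 0)
// to the matching point in the host file.
function shiftPoint(point: SketchPoint, from: SketchStartPoint): SketchPoint {
  return {
    line: point.line + from.line - 1,
    // Assumption: only the snippet's first line starts mid-line in the host.
    column: point.line === 1 ? point.column + from.column - 1 : point.column,
    offset: point.offset + (from.offset ?? 0)
  }
}

// The heading from the Use section spans 1:1-1:19 (offsets 0-18).
// Suppose the snippet starts at line 10, column 5, offset 120 of a host file.
const snippetStart: SketchStartPoint = { line: 10, column: 5, offset: 120 }
const headingStart = shiftPoint({ line: 1, column: 1, offset: 0 }, snippetStart)
const headingEnd = shiftPoint({ line: 1, column: 19, offset: 18 }, snippetStart)
```

Under these assumptions the heading would be reported at 10:5-10:23 with offsets 120-138.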

Point

A location in the source document and chunk (TypeScript type).

See micromark-util-types for more info.

StackedNode

A node on the compiler context stack (TypeScript type).

type StackedNode = Fragment | mdast.Nodes

StartPoint

Point before first character in a markdown value (TypeScript type).

type StartPoint = Omit<Point, '_bufferIndex' | '_index'>

TokenTuple

List containing an open token on the stack, and an optional error handler to use if the token isn't closed properly (TypeScript type).

type TokenTuple = [token: Token, handler: OnEnterError | undefined]

Token

A span of chunks (TypeScript interface).

See micromark-util-types for more info.

TokenizeContext

A context object that helps with tokenizing markdown constructs (TypeScript interface).

See micromark-util-types for more info.

Transform

Extra transform, to change the AST afterwards (TypeScript type).

Parameters

  • tree (Root) — tree to transform

Returns

(Root | null | undefined | void) New tree or nothing (in which case the current tree is used).
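The return type allows a transform either to change the tree in place and return nothing, or to return a replacement tree. A minimal sketch of how a list of transforms could be applied under that contract follows. `applyTransforms` and the `Tree*` types are hypothetical stand-ins for illustration.

```typescript
type TreeNode = { type: string; value?: string; children?: TreeNode[] }
type TreeRoot = { type: 'root'; children: TreeNode[] }
type TreeTransform = (tree: TreeRoot) => TreeRoot | null | undefined | void

// Run each transform in order, keeping the current tree when nothing is returned.
function applyTransforms(tree: TreeRoot, transforms: TreeTransform[]): TreeRoot {
  let current = tree
  for (const transform of transforms) {
    const result = transform(current)
    // Only a returned tree replaces the current one.
    if (typeof result === 'object' && result !== null) current = result
  }
  return current
}

// Mutates in place and returns nothing.
const shout: TreeTransform = tree => {
  for (const child of tree.children) {
    if (child.value !== undefined) child.value = child.value.toUpperCase()
  }
}

// Returns a new tree without empty literal nodes.
const dropEmpty: TreeTransform = tree => ({
  type: 'root',
  children: tree.children.filter(child => child.value !== '')
})

const inputTree: TreeRoot = {
  type: 'root',
  children: [
    { type: 'text', value: 'hi' },
    { type: 'text', value: '' }
  ]
}
const outputTree = applyTransforms(inputTree, [shout, dropEmpty])
```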

Value

Contents of a file (TypeScript type).

See micromark-util-types for more info.

type Value = Uint8Array | string

List of extensions

Syntax

Markdown is parsed according to CommonMark. Extensions can add support for other syntax. If you’re interested in extending markdown, more information is available in micromark’s readme.

Syntax tree

The syntax tree is mdast.

Types

This package is fully typed with TypeScript.

Security

As markdown is sometimes used for HTML, and improper use of HTML can open you up to a cross-site scripting (XSS) attack, use of mdast-util-from-markdown can also be unsafe.

When going to HTML, use this utility in combination with hast-util-sanitize to make the tree safe.

Related

Contribute

See CONTRIBUTING.md.