
Ruby wrapper for comrak (CommonMark parser)

Primary language: Rust. License: MIT.

Commonmarker

Ruby wrapper for Rust's comrak crate.

It passes the entire CommonMark test suite and is therefore spec-complete. It also includes extensions to the CommonMark spec documented in the GitHub Flavored Markdown spec, such as support for tables, strikethrough, and autolinking.

For more information on available extensions, see the documentation below.

Installation

Add this line to your application's Gemfile:

gem 'commonmarker'

And then execute:

$ bundle

Or install it yourself as:

$ gem install commonmarker

Usage

Converting to HTML

Call Commonmarker.to_html with a string to convert it to HTML:

require 'commonmarker'
Commonmarker.to_html('"Hi *there*"', options: {
    parse: { smart: true }
})
# => <p>“Hi <em>there</em>”</p>\n

(The options: keyword argument is optional; see below for more information.)

Generating a document

You can also parse a string to receive a :document node. You can then render that node to HTML, iterate over its children, and otherwise inspect or modify the tree. For example:

require 'commonmarker'

doc = Commonmarker.parse("*Hello* world", options: {
    parse: { smart: true }
})
puts(doc.to_html) # => <p><em>Hello</em> world</p>\n

doc.walk do |node|
  puts node.type # prints, one per line: document, paragraph, emph, text, text
end

(The options: keyword argument is optional; see below for more information.)

To modify the document, you can use the following node methods:

  • insert_before
  • insert_after
  • prepend_child
  • append_child
  • delete

You can also get the source position of a node by calling source_position:

doc = Commonmarker.parse("*Hello* world")
puts doc.first_child.first_child.source_position
# => {:start_line=>1, :start_column=>1, :end_line=>1, :end_column=>7}

You can also modify the following attributes:

  • url
  • title
  • header_level
  • list_type
  • list_start
  • list_tight
  • fence_info

Example: Walking the AST

You can use walk or each to iterate over nodes:

  • walk yields the node itself and then recursively visits all of its descendants.
  • each yields only the node's direct children, without descending further.

require 'commonmarker'

# parse some string
doc = Commonmarker.parse("# The site\n\n [GitHub](https://www.github.com)")

# Walk tree and print out URLs for links
doc.walk do |node|
  if node.type == :link
    printf("URL = %s\n", node.url)
  end
end
# => URL = https://www.github.com

# Transform links to regular text
doc.walk do |node|
  if node.type == :link
    node.insert_before(node.first_child)
    node.delete
  end
end

doc.to_html
# => <h1><a href=\"#the-site\"></a>The site</h1>\n<p>GitHub</p>\n

Example: Converting a document back into raw CommonMark

You can use to_commonmark on a node to render it as raw text:

require 'commonmarker'

# parse some string
doc = Commonmarker.parse("# The site\n\n [GitHub](https://www.github.com)")

# Transform links to regular text
doc.walk do |node|
  if node.type == :link
    node.insert_before(node.first_child)
    node.delete
  end
end

doc.to_commonmark
# => # The site\n\nGitHub\n

Options and plugins

Options

Commonmarker accepts the same parse, render, and extension options that comrak does, passed as a hash with symbol keys:

Commonmarker.to_html('"Hi *there*"', options: {
  parse: { smart: true },
  render: { hardbreaks: false }
})

comrak distinguishes between "parse" options and "render" options; each group is listed in its own table below.

Parse options

Name Description Default
smart Punctuation (quotes, full-stops and hyphens) are converted into 'smart' punctuation. false
default_info_string The default info string for fenced code blocks. ""
relaxed_tasklist_matching Enables relaxing of the tasklist extension matching, allowing any non-space to be used for the "checked" state instead of only x and X. false
relaxed_autolinks Enables relaxing of the autolink extension parsing, allowing links to be recognized when in brackets, as well as permitting any URL scheme. false

Render options

Name Description Default
hardbreaks Soft line breaks translate into hard line breaks. true
github_pre_lang GitHub-style <pre lang="xyz"> is used for fenced code blocks with info tags. true
full_info_string Gives info string data after a space in a data-meta attribute on code blocks. false
width The wrap column when outputting CommonMark. 80
unsafe Allow rendering of raw HTML and potentially dangerous links. false
escape Escape raw HTML instead of clobbering it. false
sourcepos Include source position attribute in HTML and XML output. false
escaped_char_spans Wrap escaped characters in span tags. true
ignore_setext Ignores setext-style headings. false
ignore_empty_links Ignores empty links, leaving the Markdown text in place. false
gfm_quirks Outputs HTML with GFM-style quirks; namely, not nesting <strong> inlines. false
prefer_fenced Always output fenced code blocks, even where an indented one could be used. false

There are also several extensions, which you can toggle in the same manner under the extension key:

Commonmarker.to_html('"Hi *there*"', options: {
    extension: { footnotes: true, description_lists: true },
    render: { hardbreaks: false }
})

Extension options

Name Description Default
strikethrough Enables the strikethrough extension from the GFM spec. true
tagfilter Enables the tagfilter extension from the GFM spec. true
table Enables the table extension from the GFM spec. true
autolink Enables the autolink extension from the GFM spec. true
tasklist Enables the task list extension from the GFM spec. true
superscript Enables the superscript Comrak extension. false
header_ids Enables the header IDs Comrak extension; the string value is used as a prefix for the generated IDs. ""
footnotes Enables the footnotes extension per cmark-gfm. false
description_lists Enables the description lists extension. false
front_matter_delimiter Enables the front matter extension. ""
multiline_block_quotes Enables the multiline block quotes extension. false
math_dollars, math_code Enables the math extension. false
shortcodes Enables the shortcodes extension. true
wikilinks_title_before_pipe Enables the wikilinks extension, placing the title before the dividing pipe. false
wikilinks_title_after_pipe Enables the wikilinks extension, placing the title after the dividing pipe. false
underline Enables the underline extension. false
spoiler Enables the spoiler extension. false
greentext Enables the greentext extension. false

For more information on these options, see the comrak documentation.

Plugins

Beyond standard CommonMark rendering, Commonmarker supports plugins that provide additional features.

Syntax Highlighter Plugin

The library comes with a set of pre-existing themes for highlighting code:

  • "base16-ocean.dark"
  • "base16-eighties.dark"
  • "base16-mocha.dark"
  • "base16-ocean.light"
  • "InspiredGitHub"
  • "Solarized (dark)"
  • "Solarized (light)"

To use one, pass its name as the theme:

code = <<~CODE
  ```ruby
  def hello
    puts "hello"
  end
  ```
CODE

# pass in a theme name from a pre-existing set
puts Commonmarker.to_html(code, plugins: { syntax_highlighter: { theme: "InspiredGitHub" } })

# <pre style="background-color:#ffffff;" lang="ruby"><code>
# <span style="font-weight:bold;color:#a71d5d;">def </span><span style="font-weight:bold;color:#795da3;">hello
# </span><span style="color:#62a35c;">puts </span><span style="color:#183691;">&quot;hello&quot;
# </span><span style="font-weight:bold;color:#a71d5d;">end
# </span>
# </code></pre>

By default, the plugin uses the "base16-ocean.dark" theme to syntax highlight code.

To disable this plugin, set the value to nil:

code = <<~CODE
  ```ruby
  def hello
    puts "hello"
  end
  ```
CODE

Commonmarker.to_html(code, plugins: { syntax_highlighter: nil })

# <pre lang="ruby"><code>def hello
#   puts &quot;hello&quot;
# end
# </code></pre>

To output CSS classes instead of style attributes, set the theme key to "":

code = <<~CODE
  ```ruby
  def hello
    puts "hello"
  end
  ```
CODE

Commonmarker.to_html(code, plugins: { syntax_highlighter: { theme: "" } })

# <pre class="syntax-highlighting"><code><span class="source ruby"><span class="meta function ruby"><span class="keyword control def ruby">def</span></span><span class="meta function ruby"> <span class="entity name function ruby">hello</span></span>
#   <span class="support function builtin ruby">puts</span> <span class="string quoted double ruby"><span class="punctuation definition string begin ruby">&quot;</span>hello<span class="punctuation definition string end ruby">&quot;</span></span>
# <span class="keyword control ruby">end</span>\n</span></code></pre>

To use a custom theme, provide a path to a directory containing .tmTheme files to load:

Commonmarker.to_html(code, plugins: { syntax_highlighter: { theme: "Monokai", path: "./themes" } })

Output formats

Apart from converting a document back into CommonMark with to_commonmark, Commonmarker can currently generate output in only one format: HTML.

HTML

puts Commonmarker.to_html('*Hello* world!')

# <p><em>Hello</em> world!</p>

Developing locally

After cloning the repo:

script/bootstrap
bundle exec rake compile

If there were no errors, you're done! Otherwise, make sure to follow the comrak dependency instructions.

Benchmarks

❯ bundle exec rake benchmark
input size = 11064832 bytes

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
Warming up --------------------------------------
  Markly.render_html     1.000 i/100ms
Markly::Node#to_html     1.000 i/100ms
Commonmarker.to_html     1.000 i/100ms
Commonmarker::Node.to_html
                         1.000 i/100ms
Kramdown::Document#to_html
                         1.000 i/100ms
Calculating -------------------------------------
  Markly.render_html     15.606 (±25.6%) i/s -     71.000 in   5.047132s
Markly::Node#to_html     15.692 (±25.5%) i/s -     72.000 in   5.095810s
Commonmarker.to_html      4.482 (± 0.0%) i/s -     23.000 in   5.137680s
Commonmarker::Node.to_html
                          5.092 (±19.6%) i/s -     25.000 in   5.072220s
Kramdown::Document#to_html
                          0.379 (± 0.0%) i/s -      2.000 in   5.277770s

Comparison:
Markly::Node#to_html:       15.7 i/s
  Markly.render_html:       15.6 i/s - same-ish: difference falls within error
Commonmarker::Node.to_html:        5.1 i/s - 3.08x  slower
Commonmarker.to_html:        4.5 i/s - 3.50x  slower
Kramdown::Document#to_html:        0.4 i/s - 41.40x  slower
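The "slower" factors in the comparison are the fastest entry's iterations per second divided by each entry's rate. A small plain-Ruby sketch reproducing them from the figures in the Calculating section (the benchmark itself labels Markly.render_html "same-ish" because its 1.01x difference falls within the error margin):

```ruby
# Iterations per second copied from the "Calculating" section above
rates = {
  "Markly::Node#to_html" => 15.692,
  "Markly.render_html" => 15.606,
  "Commonmarker::Node.to_html" => 5.092,
  "Commonmarker.to_html" => 4.482,
  "Kramdown::Document#to_html" => 0.379
}

fastest = rates.values.max

# "N x slower" = fastest rate / this entry's rate
slowdowns = rates.transform_values { |ips| (fastest / ips).round(2) }

slowdowns.each { |name, factor| puts format("%-28s %.2fx", name, factor) }
```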