CondeNast/atjson

Fix splitting delimiter runs


We have a utility in our CommonMark renderer that adjusts the boundaries of certain annotations when they would otherwise produce an invalid delimiter run. This logic assumed that the rules for a valid delimiter run were the same regardless of the delimiter character, but that is not the case.

Here are the rules for delimiters, from least to most restrictive:

If the delimiter is ^ or ~:

  • the inner boundary must not be a whitespace character

If the delimiter is *, **, or ~~:

  • the inner boundary for a delimiter run must not be a whitespace character
  • the outer boundary for a delimiter run must be a whitespace or punctuation character if the inner boundary is a punctuation character

If the delimiter is _ or __:

  • the inner boundary for a delimiter run must not be a whitespace character
  • the outer boundary for a delimiter run must be a whitespace or punctuation character
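The three rule sets above can be expressed as one predicate over the two characters on either side of a boundary. This is a minimal TypeScript sketch, not atjson's actual code: the `Delimiter` type and the `isValidBoundary` name are hypothetical, `\s` stands in for "whitespace", Unicode general category P stands in for "punctuation", and a missing character (start or end of the text) is treated as whitespace, as CommonMark does for line boundaries.

```typescript
// Hypothetical type for the delimiters discussed above.
type Delimiter = "^" | "~" | "*" | "**" | "~~" | "_" | "__";

// Assumption: Unicode general category P approximates "punctuation character".
// Built with the RegExp constructor so older compile targets accept it.
const PUNCTUATION = new RegExp("\\p{P}", "u");

// A missing character (start or end of the text) counts as whitespace.
function isWhitespace(ch: string | undefined): boolean {
  return ch === undefined || /\s/.test(ch);
}

function isPunctuation(ch: string | undefined): boolean {
  return ch !== undefined && PUNCTUATION.test(ch);
}

// `inner` is the character just inside the delimiter, `outer` the character
// just outside it. Returns true if the delimiter may be placed at this boundary.
function isValidBoundary(
  delimiter: Delimiter,
  inner: string | undefined,
  outer: string | undefined,
): boolean {
  // All delimiters: the inner boundary must not be whitespace.
  if (isWhitespace(inner)) return false;

  // ^ and ~: no further restrictions.
  if (delimiter === "^" || delimiter === "~") return true;

  // _ and __: the outer boundary must always be whitespace or punctuation.
  if (delimiter === "_" || delimiter === "__") {
    return isWhitespace(outer) || isPunctuation(outer);
  }

  // *, ** and ~~: punctuation on the inside requires whitespace or
  // punctuation on the outside.
  return !isPunctuation(inner) || isWhitespace(outer) || isPunctuation(outer);
}
```

For an opening delimiter, `inner` is the first character of the annotated text and `outer` the character before it; for a closing delimiter, they are the last character of the annotated text and the character after it.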

Here are some examples of the correct behavior. Square brackets mark the delimiter boundaries, an underscore represents a whitespace character, and a dash represents a punctuation character:

Original | Split for ^, ~ | Split for *, **, ~~ | Split for _, __
[_a_b]   | _[a_b]         | _[a_b]              | _[a_b]
a[-b]    | a[-b]          | a-[b]               | a-[b]
a[b_c]   | a[b_c]         | a[b_c]              | ab_[c]
a[bc]    | a[bc]          | a[bc]               | abc[]
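One way to get the splits in this table is to move the opening boundary right and the closing boundary left until each satisfies the rules for its delimiter, collapsing the range to an empty one if the two meet. The TypeScript sketch below assumes that strategy; it is not the renderer's real implementation, and `Delimiter`, `isValidBoundary`, `adjustRange` and `split` are hypothetical names. Whitespace is detected with `\s`, punctuation with Unicode category P, and the start or end of the text counts as whitespace.

```typescript
// Hypothetical type for the delimiters discussed above.
type Delimiter = "^" | "~" | "*" | "**" | "~~" | "_" | "__";

// Assumptions: `\s` approximates whitespace, Unicode category P approximates
// punctuation, and the start or end of the text counts as whitespace.
const PUNCTUATION = new RegExp("\\p{P}", "u");
const isWhitespace = (ch?: string): boolean => ch === undefined || /\s/.test(ch);
const isPunctuation = (ch?: string): boolean => ch !== undefined && PUNCTUATION.test(ch);

// Inner must never be whitespace; the outer rule depends on the delimiter.
function isValidBoundary(delimiter: Delimiter, inner?: string, outer?: string): boolean {
  if (isWhitespace(inner)) return false;
  if (delimiter === "^" || delimiter === "~") return true;
  if (delimiter === "_" || delimiter === "__") {
    return isWhitespace(outer) || isPunctuation(outer);
  }
  return !isPunctuation(inner) || isWhitespace(outer) || isPunctuation(outer);
}

// Hypothetical helper: shrink the half-open range [start, end) until both
// boundaries are valid. If the boundaries meet, the range collapses to an
// empty range at that position.
function adjustRange(
  text: string,
  start: number,
  end: number,
  delimiter: Delimiter,
): [number, number] {
  // Opening boundary: inner is text[start], outer is text[start - 1].
  while (start < end && !isValidBoundary(delimiter, text[start], text[start - 1])) {
    start++;
  }
  // Closing boundary: inner is text[end - 1], outer is text[end].
  while (end > start && !isValidBoundary(delimiter, text[end - 1], text[end])) {
    end--;
  }
  return [start, end];
}

// Demo helper for the table's notation: "[" and "]" mark the range,
// "_" stands for whitespace and "-" for punctuation.
function split(notation: string, delimiter: Delimiter): string {
  const start = notation.indexOf("[");
  const end = notation.indexOf("]") - 1; // account for the removed "["
  const text = notation.replace("[", "").replace("]", "").replace(/_/g, " ");
  const [s, e] = adjustRange(text, start, end, delimiter);
  const marked = text.slice(0, s) + "[" + text.slice(s, e) + "]" + text.slice(e);
  return marked.replace(/ /g, "_");
}
```

`split` only exists to translate the table's notation; for example, `split("a[b_c]", "_")` returns `"ab_[c]"`. Because both boundaries only ever move inward, the adjusted range always stays within the original one, so an annotation never grows to cover text it did not originally cover.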