tree-sitter-grammars/tree-sitter-markdown

Code block not recognized as one if there is trailing space on the closing fence

kirawi opened this issue · 5 comments

Describe the bug

Ref helix-editor/helix#9678

Code example

'''c
''' 
int i = 0;

Replace the single quotes with backticks (note the trailing space after the closing fence) and int i = 0; will be highlighted as C.

Expected behavior

According to https://github.github.com/gfm/#fenced-code-blocks, trailing whitespace is ignored on the closing code fence.

Actual behavior

Closing fence is not recognized.

For the record, this parser implements the CommonMark spec, with only some GFM extensions (which are optional but enabled by default). I would prefer to stay strict on this (and on similar "softenings" of restrictions that make parsing easier).

But CommonMark also specifies that whitespace after a closing fence is to be ignored, so this should be fixed (here). PR welcome!

I can confirm that this also happens when the closing fence is followed immediately by EOF. That's an easier scenario to catch than trailing whitespace, so I might open a PR for that case specifically.

If that's easy, a fix there would be awesome, because I think that's probably a more common scenario than trailing whitespace, though obviously both would be nice to have.

I would think both should be a fairly easy fix, no? I might be missing something, but it looks like we could just check for end of file or a space in addition to \r and \n in:

(lexer->lookahead == '\n' || lexer->lookahead == '\r')) {
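To make the suggestion concrete, here is a minimal sketch of what the relaxed end-of-line check could look like. It is not the parser's actual scanner code: `MockLexer`, `mock_lookahead`, `mock_eof`, and `closing_fence_line_ends` are hypothetical names invented for illustration, and the mock only imitates the idea of a lookahead character plus an end-of-input test. It also assumes tabs should be skipped the same way as spaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the scanner's lexer; NOT the real tree-sitter type. */
typedef struct {
    const char *input; /* NUL-terminated text that follows the closing fence */
    size_t pos;        /* index of the current lookahead character */
} MockLexer;

/* Current lookahead character ('\0' once the input is exhausted). */
static char mock_lookahead(const MockLexer *lexer) {
    return lexer->input[lexer->pos];
}

/* True when there is no more input, i.e. the fence is followed by EOF. */
static bool mock_eof(const MockLexer *lexer) {
    return lexer->input[lexer->pos] == '\0';
}

/* Intended to run right after the closing fence characters were consumed.
 * Skips trailing spaces/tabs, then accepts a line ending or end of input. */
static bool closing_fence_line_ends(MockLexer *lexer) {
    while (mock_lookahead(lexer) == ' ' || mock_lookahead(lexer) == '\t') {
        lexer->pos++; /* trailing whitespace is ignored */
    }
    return mock_eof(lexer) ||
           mock_lookahead(lexer) == '\n' ||
           mock_lookahead(lexer) == '\r';
}
```

With this sketch, a fence followed by `"  \n"`, by `"\r\n"`, or by nothing at all would be accepted as closed, while a fence followed by `" x\n"` would not, since only whitespace may follow the closing fence.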