Knagis/CommonMark.NET

How to recognize escaped characters in AST?

Closed this issue · 7 comments

fhaag commented

I'm looking for a way to find out whether a given character in an AST String inline node was escaped in the original Markdown.

In particular, I'd like to know this for square brackets. My goal is to enable something like artifact links in SourceForge's Allura Markdown. You can think of it as an inline marker comparable to a link, but without any use for a custom link text.

With the current CommonMark.NET, I can already scan the text returned as literal strings and check whether there are any [...] like substrings that I would like to process.

However, for this to work properly, I would like to respect explicit escapes by the user. Thus, \[...] should not be considered for custom processing, [...\]...] would recognize the first ] as a part of the string within the processing-worthy square brackets, etc. Is there a (clean) way to retrieve this bit of information already?

It would, of course, be extra-convenient to receive [...] groups as a separate inline tag, but as that is not part of the CommonMark standard, I understand it may not be desirable to integrate this feature so deeply in the CommonMark.NET source code.

I think that for this the easiest option is to modify the parser.

First see line https://github.com/Knagis/CommonMark.NET/blob/master/CommonMark/Parser/InlineMethods.cs#L552

Here you can see that if details==null then there are no additional details for the link; it is the simple [foo] format.

Then on line (though see that the caller converts InvalidReference to null as well so you would have to convert that as well: https://github.com/Knagis/CommonMark.NET/blob/master/CommonMark/Parser/InlineMethods.cs#L482

When details == null it basically means that what seemed to be a link (for example [foo]) was not treated as such because it was not followed by a url and it was not also defined as a reference ([foo]:/url/). You would have to check if the details were not InvalidReference (see the previous paragraph) and if so, allow the code to create the Link AST entry, just perhaps use a different InlineTag value.

This way you will get all of the parsing logic for these links as well (such as correct nesting etc.).


And after writing all of the above, I figured a much simpler approach - see line https://github.com/Knagis/CommonMark.NET/blob/master/CommonMark/Parser/InlineMethods.cs#L956 - here instead of return null you could do return new Reference() { Url = "##artifact##" } and then when you write the links, check for this "magic" URL value to print the correct url. This way you only have to modify a single line in the parser.

fhaag commented

Thanks for these explanations. I have successfully integrated the feature into a local copy of CommonMark.NET based upon your last solution. In fact, I have changed a few more lines in order to support a new InlineTag value for that kind of syntax. (I called it Placeholder, because it's essentially a keyword/string that gets replaced by the host application -- if the host application decides it wants to process that particular string rather than insert it as a bracketed literal.)

Looking at this feature again, would it be desirable from your side to add it as an activatable "additional feature", the way it is currently done with the strikethrough feature? Such a "placeholder" feature could help connect Markdown with application-specific content, as it currently happens with SourceForge's aforementioned artifact links, tag reference on the Stack Exchange network, the table of contents in Roadkill Wiki (this one uses braces, but it is conceptually the same thing), etc.

If you might consider this worthwhile, I could create a unit test and put my modifications on Github for you guys to have a look.

fhaag commented

To make this feature a optional, I'd have to evaluate the settings in InlineMethods.ParseLinkDetails, which is invoked by InlineMethods.HandleRightSquareBracket. Otherwise, there doesn't seem to be any way to pass anything into the handle* methods.

Is that intentional, or was it simply not required yet (as the strikethroughTilde option can be evaluated right in InitializeParsers), or is there another intended way of accessing the settings in the handle* methods? (I don't want to mess up a carefully designed architecture.)

I would probably try to modify InitializeParsers to something like

if (allowPlaceholders)
    p[']'] = subject => HandleRightSquareBracket(subject, true);

And see if the compiler is able to properly inline this method (not reducing the performance).

Unfortunately the architecture as it is currently implemented is a result from the library being a port from the reference implementation - and as such it does not really give a good way of passing the settings around.

fhaag commented

I have uploaded my code to a fork. I still need to add some unit tests, though.

For now, I have adapted the formatters to cope with the additional InlineTag.Placeholder tag. However, as placeholders are only meaningful with respect to the context the Markdown parsing/formatting occurs in, this means that the (configuration-less) slim HTML formatter will simply render all placeholders as literals instead of processing them. The non-slim (fat?) HTML formatter, on the other hand, can now invoke an optional processing function that may substitute the placeholder with something else.

One issue in this respect that I do not yet like is that the substitution function gets invoked twice per placeholder (once for the opening node, once for the closing node) - and in both cases, the behaviour will depend on whether the placeholder processing function returns a substitution. I am going to look into the CommonMark.Syntax.Enumerable class to check whether certain nodes can be dynamically skipped (probably not, as the processing result of a placeholder node is not guaranteed to be known before the next node is enumerated), or else maybe store the closing placeholder tag on a stack, similarly to what happens in the slim HTML formatter.

I suppose now that #93 is merged, this can be closed?

fhaag commented

Indeed, all my wishes have been fulfilled :)