Better support for parsing LLM output

Question

Better support for parsing LLM output

verhovsky opened this issue a year ago · 0 comments

cmark-gfm is used by a number of apps that interface with text-generating Large Language Models (LLMs) (this one and this one are the ones I know). These models produce a few characters of Markdown every 200ms (on my machine) and cmark-gfm is used continuously to render the output text so far as Markdown. This is inefficient because (as far as I can tell) the entire generated Markdown has to be re-parsed from the beginning for every generated token, even though it has already been parsed except for the latest token.

cmark-gfm has a streaming interface of cmark_parser_feed and cmark_parser_finish but it seems like I need to call cmark_parser_finish every time I actually want to parse and I need to re-create a parser after that, I can't feed more tokens and re-parse. I would have expected there to be a way to cmark_parser_feed and then cmark_parser_parse and then doing cmark_parser_feed again, or a more complicated interface for editing the parse tree like tree-sitter has.

Also, while we're at it, the other issue is that the syntax isn't stable when it hasn't yet seen the entire input. Namely, a trailing single backtick ` should open a code block until the end of the line/input even if there's no closing backtick. The way it is now leads to jittering in the UI, where the UI first prints a backtick and a few seconds later removes it and re-renders everything after it in monospace when the LLM generates the closing backtick. This is also a problem for horizontal rules and bold/italic but definitely the latter isn't doable because many people use single * characters for multiplication.