alecthomas/chroma

How to lex heredocs (Caddyfile)

francislavoie opened this issue · 4 comments

I'm working on updating the Caddyfile lexer, to fix bugs and support new syntax features we've implemented in the past couple years.

We've added heredoc support (https://caddyserver.com/docs/caddyfile/concepts#heredocs), and I'm trying to figure out how to implement it in the lexer.

How do I store the heredoc marker (the part after <<), push into a "heredoc" state, and then pop the stack once that same string is found again? Is there some kind of storage mechanism in the state?

I see that Raku uses a custom mutator func. Is that my only option? Looks complicated.

I also noticed that since last time I contributed, most of the lexers got translated to XML. Is this something we should aim to do (tbh, ew 😬 looks like much worse UX to develop than the Go DSL) or is it fine for me to keep using Go?

Look at how the bash lexer does it, it's the same concept.

I will not accept Go lexers unless there's a good reason, you should use XML. Chroma is used by a lot of people who aren't programmers, so XML is a better choice to foster contributions than Go. It's also much much more efficient in terms of binary size and startup time.

> I will not accept Go lexers unless there's a good reason, you should use XML.

To be clear, I'm not making a new lexer, only updating an existing one. https://github.com/alecthomas/chroma/blob/master/lexers/caddyfile.go

> It's also much much more efficient in terms of binary size and startup time.

I find that surprising. Why is compiled Go code slower/larger?

> Look at how the bash lexer does it, it's the same concept.

I can't find the heredoc lexing. Can you point me to it? I don't see any << stuff in bash.xml.

Edit: oh, &lt;&lt;&lt; 🤦‍♂️

If it's already in Go then it's definitely fine to update.

> I find that surprising. Why is compiled Go code slower/larger?

This is a bit surprising, but each lexer is a whole bunch of what is effectively dynamic code to the Go compiler, so it a) expands to a very large amount of machine code and b) takes a considerable amount of time to execute at init time. If the Go compiler were a lot smarter this would probably end up being static data sections, but it isn't.

> I can't find the heredoc lexing. Can you point me to it? I don't see any << stuff in bash.xml.

Search for &lt;&lt;. It's a single regex with a backreference.

I see the \2. So it's a single regexp.
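For anyone following along, the backreference idea can be illustrated outside Chroma. This is only a minimal sketch in Python's `re` module, not the actual bash.xml rule: the pattern, the `HEREDOC` name, and the `find_heredoc` helper are all made up for illustration. It has a single capture group, so the marker is referenced as `\1` rather than `\2`.

```python
import re

# Hypothetical pattern (not the bash.xml rule): "<<" plus a marker word,
# a newline, the body, then the same marker at the start of a later line.
# \1 is the backreference to whatever the first group captured.
HEREDOC = re.compile(r"<<(\w+)\n(.*?)\n[ \t]*\1\b", re.DOTALL)


def find_heredoc(text):
    """Return (marker, body) for the first heredoc in text, or None."""
    m = HEREDOC.search(text)
    if m is None:
        return None
    return m.group(1), m.group(2)
```

Because the whole heredoc is consumed by one match, the body comes out as a single span, which is why this approach cannot easily assign different token types to pieces inside it. The sketch also ignores edge cases such as an empty body.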

Is there a way to do this with state though (push + pop)? I have some other token types I want to handle within the heredoc.

For example, if something like {foo} appears within the heredoc, I want that to be a LiteralStringEscape.
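To make the question concrete, here is a rough sketch of the logic a stateful approach would need: remember the marker, stay "inside" the heredoc, tag `{...}` pieces separately, and leave when the marker shows up again. It is plain Python, not Chroma's API. The `lex_heredoc` function, both regexes, and the assumption that the closing marker sits alone on its line are all illustrative; the token names are used as plain strings.

```python
import re

# Assumed opener: "<<" plus a marker word, then end of line.
OPEN = re.compile(r"<<(\w+)[ \t]*\n")
# Assumed placeholder shape: braces around non-space, non-brace characters.
PLACEHOLDER = re.compile(r"\{[^{}\s]+\}")


def lex_heredoc(text):
    """Return a list of (token_type, value) pairs for text starting with a heredoc opener."""
    m = OPEN.match(text)
    if m is None:
        raise ValueError("text does not start with a heredoc opener")
    marker = m.group(1)  # the stored marker: this is the state a lexer must keep
    tokens = [("LiteralStringHeredoc", m.group(0))]  # "push": now inside the heredoc
    for line in text[m.end():].splitlines(keepends=True):
        if line.strip() == marker:
            # "pop": the stored marker was found again, so the heredoc ends here.
            tokens.append(("LiteralStringHeredoc", line))
            break
        pos = 0
        for p in PLACEHOLDER.finditer(line):
            if p.start() > pos:
                tokens.append(("LiteralStringHeredoc", line[pos:p.start()]))
            tokens.append(("LiteralStringEscape", p.group(0)))
            pos = p.end()
        if pos < len(line):
            tokens.append(("LiteralStringHeredoc", line[pos:]))
    return tokens
```

For the input `"<<EOT\nHello {user}!\nEOT\n"` this yields the opener, `Hello `, `{user}` as the escape token, `!\n`, and the closing `EOT\n` line. Text after the closing marker is not handled, and a closing marker followed by further arguments on the same line would not be detected by this simplified check.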