BioJulia/Automa.jl

(Optionally) despecialize generated code on its IO

Closed this issue · 2 comments

I've been thinking about this for some time, as I've run into a few problems that I think are related. Automa's purpose is to create parsers that parse data from IO objects. In practice, this happens through the generate_reader function, which creates large FSMs that take a little while to compile into massive binary chunks. Fair enough: it's usually worth waiting a little while and spending a little RAM to get a fast parser.

The problem is that the generated function is specialized on the underlying IO, and that this IO is a parametric TranscodingStream. This means the function is generated and compiled many times over: once for IOBuffers, once for IOStreams, once for GzipDecompressorStreams wrapping an IOStream, etc. This seems wasteful.

It might be nicer to somehow create one single generic Automa-generated function. The easiest way would be to put @nospecialize on the generated function. However, it would be nicer to generate a basal function that operates on a buffer, then make a wrapper function (marked @nospecialize) that operates on a TranscodingStream and calls the former. It's hard to figure out how to do this, though.
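To make the idea concrete, here is a minimal sketch of the "basal function plus unspecialized wrapper" split. It is not Automa's generated code: `count_newlines_core`, `count_newlines`, and `BUFSIZE` are hypothetical names, the newline counter merely stands in for a generated FSM, and the wrapper copies bytes into its own buffer with `readbytes!` rather than reaching into a TranscodingStream's internal buffer.

```julia
# Hypothetical buffer size for this sketch; not a value taken from Automa.
const BUFSIZE = 16 * 1024

# Stand-in for the expensive generated state machine. It only ever sees a
# Vector{UInt8}, so it is compiled once no matter which IO type fed it.
function count_newlines_core(data::Vector{UInt8}, n::Int)
    count = 0
    @inbounds for i in 1:n
        count += data[i] == UInt8('\n')
    end
    return count
end

# Thin wrapper. @nospecialize asks the compiler not to specialize this method
# on the concrete type of `io`, so calls on `io` (eof, readbytes!) are
# dispatched at run time, once per refill of the buffer.
function count_newlines(@nospecialize(io::IO))
    buffer = Vector{UInt8}(undef, BUFSIZE)
    total = 0
    while !eof(io)
        n = readbytes!(io, buffer, BUFSIZE)
        total += count_newlines_core(buffer, n)
    end
    return total
end

count_newlines(IOBuffer("a\nb\nc\n"))  # returns 3
```

The design trades one dynamic dispatch per buffer refill for compiling the heavy core only once. Whether that trade pays off depends on how costly the dispatch is relative to the work done per buffer.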

Does the stream of bytes for each of these types look the same? That is - does the TranscodingStream look the same no matter which type the parser is parsing?

I think that's what you're saying: have Automa only parse a TranscodingStream, but be able to create non-specialized methods that convert. I just want to double check.

Yeah, it does.
Unfortunately, I tested this, and the overhead from dynamic dispatch is significant, even for 16 KiB buffers. So, closing.