Benchmark parser plus source code info generation, possibly optimize

Question


jhump opened this issue 2 years ago

The AST representation is very useful for some kinds of tools, and it also provides a convenient API for extracting source position information that would otherwise be unavailable in a file descriptor. That is why the parser first creates the AST and only then generates a file descriptor from it.

However, the AST is definitely a source of memory consumption that would be nice to avoid. So it would be useful for the parser to have an alternate mode of execution that generates a file descriptor directly. One difficulty is that we use a generated, bottom-up parser, whereas protoc uses a top-down parser. Computing source code info during parsing is much easier with a top-down approach, since the parser can accumulate source info paths as it descends. So the trick to skipping the AST step is working out how to compute the data structures needed to create the source code info for a file descriptor.

It is possible that it's simply not worth having two separate parser paths. We first need to benchmark the parser, measuring the time to produce a descriptor proto plus source code info, so that we even know what portion of compilation is spent in just those phases. (The rest is spent in linking, almost certainly the great majority, and in interpreting options.) If these parse phases make up only a small fraction of the total compile time, then it is probably not worth maintaining a separate parse path. (Though the memory/GC savings could still be large enough to make it worthwhile.)

So the first step is to create a benchmark that measures the time spent in parsing and source code info generation alone. That way we can assess whether omitting the AST generation phase leaves enough room for performance improvement to justify the effort.