teo-tsirpanis/Farkle

Build designtime Farkles at compile time.

teo-tsirpanis opened this issue · 4 comments

In contrast with FsLexYacc and other traditional parser generators, Farkle creates the parsing tables of a grammar at runtime, instead of at compile-time. This poses two problems:

  • Compared to parsing, building a designtime Farkle is a slower and more computationally intensive procedure. This inherent limitation is mostly ignored because building has a cost that is supposed to be paid only once in the execution of the average app that uses a static grammar.

The likes of FParsec largely evade this problem because they do not use precomputed parsing tables. Creating an FParsec Parser is merely creating some objects; no special algorithm is involved.

When we say "slow", we mean that building a designtime Farkle for the GOLD Meta-Language takes about ten milliseconds in the benchmarks. Compare this with reading an EGT file, which takes a few hundred microseconds. Both times might seem very fast, but Farkle is a library that is meant to be fast.

  • Errors with the built grammar (such as LALR conflicts) are not reported until it is too late: when the user tries to parse text with a defective grammar. FsLexYacc brings these errors to the user's attention earlier, when the .fsl/y file is being compiled (it actually tries to resolve them itself instead of failing, but that's a practice Farkle should not follow). FParsec does not bother with such kinds of errors; it will (try to) parse whatever the user gives it.

A solution for these two problems is precompiled grammars.

How it works

We will create a new function with the following signature:

module RuntimeFarkle =
    val markForPrecompile: DesigntimeFarkle<'T> -> DesigntimeFarkle<'T>

User code is supposed to call it like this:

let designtime =
  Terminals.int "My Number"
  |> RuntimeFarkle.markForPrecompile

let runtime = RuntimeFarkle.build designtime

After compilation, Farkle.Tools.MSBuild (or the CLI tools) will probe the compiled assembly for all properties that hold a designtime Farkle and are marked for precompilation, build the grammar, and embed the compiled tables in the assembly.

When the designtime Farkle is being built, Farkle is going to check if the assembly has a precompiled grammar in it, and use it, instead of building it again. If no such grammar is found, nothing changes. The whole process must be totally transparent from the user's perspective.
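
The lookup described above can be sketched as follows. This is only an illustration, not Farkle's implementation: the function loadOrBuild, its parameters, and the resource name are all hypothetical. The only real API used is Assembly.GetManifestResourceStream, which returns null when the assembly has no resource with the given name.

```fsharp
open System.IO
open System.Reflection

// Hypothetical helper: use the precompiled grammar embedded in `asm` if there
// is one, otherwise fall back to building the grammar at runtime.
// `read` and `build` are placeholders for the real reader and builder.
let loadOrBuild (asm: Assembly) (resourceName: string)
                (read: Stream -> 'G) (build: unit -> 'G) : 'G =
    match asm.GetManifestResourceStream(resourceName) with
    | null ->
        // No embedded grammar was found: nothing changes for the user.
        build ()
    | stream ->
        // An embedded grammar was found: read it instead of building again.
        use s = stream
        read s

// Usage: this assembly has no resource with that (made-up) name,
// so the fallback builder is the one that runs.
let coreLib = typeof<obj>.Assembly
let result =
    loadOrBuild coreLib "NoSuchGrammar.precompiled"
        (fun _ -> "loaded from resource")
        (fun () -> "built at runtime")
```

Because both paths return the same type, the caller cannot tell which one was taken, which is what keeps the process transparent.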

If building a grammar fails, an appropriate error message must be displayed, and in MSBuild's case, building must fail (maybe add a flag to ignore the errors; they will already be caught by the time the runtime Farkle gets used).

Like other metadata setters, markForPrecompile is supposed to be called only on the topmost designtime Farkle and is ignored in other cases.

What needs to be done

  • We have to create a custom binary format for grammars that more closely matches Farkle's domain model. And we have to write both a reader and a writer for this format.

    • It has already been created. It is called EGTneo (new encoding option) and is based on the EGT file's semantics. Compared to EGT, it is more compact, faster to read (as evidenced by benchmarks), and easier to read from (the EGT reader contains some spaghetti code). It is not compatible with GOLD Parser, though.
    • The Farkle.Grammar.EGT module is updated to transparently read from either EGT or EGTneo files from the same functions.
    • The same module also got three functions for writing grammars in the EGTneo format (to streams, files, and Base64 strings).
      • However, EGTneo is not a "file" in the sense EGT is. EGTneo "files" are not supposed to be stored as actual, standalone files on disk. So, we might need to expose just one function for writing to streams.
  • We have to dynamically load the assemblies to process and get their grammars.

    • It's going to need two implementations: AppDomains on .NET Framework, and AssemblyLoadContexts on .NET Core.
      • Farkle.Tools.MSBuild will not be able to target .NET Standard anymore. That's fine, we were already going to make a major release.
    • That assembly also has dependencies itself. Farkle is one of them. Which Farkle library will do the building? The one referenced by the assembly? Or the one referenced by the tools? Will we allow the versions of these two libraries to be different?
  • We then need a way to write the precompiled grammar back to the assembly. The easiest solution is through embedded resources.

    • Mono.Cecil will help us here.
    • We need to take care of strong-naming though.
      • Fody will not be used due to its shady licensing model and its lack of intuitiveness.
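
To illustrate the first task in the list above (a binary reader and writer pair, with the stream version as the primitive that the Base64 version wraps), here is a minimal round-trip sketch. It does not reproduce EGTneo's actual layout: the ToyGrammar type, the header string, and all function names are made up for the example.

```fsharp
open System
open System.IO
open System.Text

// Toy stand-in for a grammar; Farkle's real domain model is much richer.
type ToyGrammar = { Name: string; Terminals: string list }

// Hypothetical header used to reject streams in another format.
let magic = "ToyGrammar/1"

// Writes the grammar to a stream and leaves the stream open for the caller.
let writeToStream (stream: Stream) (g: ToyGrammar) =
    use w = new BinaryWriter(stream, Encoding.UTF8, true)
    w.Write(magic)
    w.Write(g.Name)
    w.Write(List.length g.Terminals)
    for t in g.Terminals do
        w.Write(t)

// Reads back what writeToStream wrote, in the same order.
let readFromStream (stream: Stream) : ToyGrammar =
    use r = new BinaryReader(stream, Encoding.UTF8, true)
    if r.ReadString() <> magic then
        invalidOp "Not a toy grammar stream."
    let name = r.ReadString()
    let count = r.ReadInt32()
    let terminals = List.init count (fun _ -> r.ReadString())
    { Name = name; Terminals = terminals }

// The Base64 variants are thin wrappers over the stream functions.
let toBase64 (g: ToyGrammar) =
    use ms = new MemoryStream()
    writeToStream ms g
    Convert.ToBase64String(ms.ToArray())

let ofBase64 (s: string) =
    use ms = new MemoryStream(Convert.FromBase64String(s))
    readFromStream ms

// Usage: a grammar survives a round trip through its Base64 form.
let original = { Name = "My Number"; Terminals = [ "Digit"; "Sign" ] }
let roundTripped = original |> toBase64 |> ofBase64
```

Exposing only the stream function, as suggested above, would lose nothing: the other variants can be built on top of it in a few lines.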

Progress report

  • The AssemblyLoadContext-based .NET Core dynamic assembly loader was completed.
    • Much to my surprise, it can process .NET Framework assemblies!
    • I am still not 100% sure about its robustness; there might be cases where it wouldn't work, but how can I be sure?
  • The AppDomain-based .NET Framework implementation, on the other hand, turned out to be more complicated than it looked, and is still not completed.
    • The first alpha release of Farkle 6 will not support precompiling from the .NET Framework MSBuild. A warning will be raised.
      • The recommended way is to build by running dotnet build.
      • Therefore building from Visual Studio will not precompile any grammar.
      • Rider on the other hand can switch to a .NET Core-based MSBuild with a workaround and use the precompiler.
  • Manual precompiling will not be supported on the CLI tool. The precompiler needs to know the assembly's references during compilation. It's surprisingly simple to get them in MSBuild, but the CLI tool user should not bother with such details.
  • I have to write documentation about using the precompiler.

This was completed quite some time ago. There were some deviations from the original plan:

  • The .NET Framework precompiler was scrapped; using AppDomains is very frustrating in comparison with AssemblyLoadContexts.
    • I might try it again; Visual Studio support for the precompiler is quite desirable.
  • The type PrecompilableDesigntimeFarkle<'T> was introduced (with its untyped counterpart); the precompiler will try to find static readonly fields of that type only.
  • Building the grammar will be performed by Farkle.Tools.MSBuild's Farkle assembly. To avoid compatibility problems, Farkle's version must match Farkle.Tools.MSBuild's.

At the time of Farkle's 6.0.0 stable release, there was one more important deviation: precompilable designtime Farkles (PCDFs) do not implement the DesigntimeFarkle interface and are built with special functions that fail if the grammar was not precompiled. Despite forcing the precompiler to be always used, this approach has several advantages:

  • Because no further customization function can be applied to a PCDF, the correct way to use the markForPrecompile function (once, at the end) is enforced by the type system.
  • Using the precompiler means that certain performance characteristics of the application's startup are expected. Farkle will prefer to fail rather than not meet them.
  • Because PCDFs are not built at runtime, the IL linker can trim parts of the builder away, such as the state table creator or the string regex language definition.
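
The first two points can be illustrated with a toy model. The types and functions below are hypothetical stand-ins, not Farkle's API: the marked type is distinct from the designtime type, so customization functions no longer accept it, and its build function fails instead of silently building at runtime.

```fsharp
// Toy stand-ins for a designtime Farkle and its precompilable counterpart.
type ToyDesigntime = { Name: string; CaseSensitive: bool }
type ToyPrecompilable = ToyPrecompilable of ToyDesigntime

module Toy =
    let terminal name = { Name = name; CaseSensitive = false }

    // A customization function: it only accepts the designtime type.
    let caseSensitive flag (d: ToyDesigntime) = { d with CaseSensitive = flag }

    // Marking changes the type, so it can only be the last step.
    let markForPrecompile (d: ToyDesigntime) = ToyPrecompilable d

    // Hypothetical build function: it looks up precompiled tables by name
    // and fails if there are none, instead of building them at runtime.
    let build (precompiled: Map<string, string>) (ToyPrecompilable d) =
        match Map.tryFind d.Name precompiled with
        | Some tables -> tables
        | None -> failwithf "Grammar '%s' was not precompiled." d.Name

// Compiles: customizations first, marking last.
let marked =
    Toy.terminal "My Number"
    |> Toy.caseSensitive true
    |> Toy.markForPrecompile

// Does not compile, because caseSensitive expects a ToyDesigntime:
// let bad = Toy.terminal "My Number" |> Toy.markForPrecompile |> Toy.caseSensitive true

let tables = Toy.build (Map.ofList [ "My Number", "tables" ]) marked
```

Calling Toy.build with an empty map raises an exception, which mirrors the preference for failing over missing the expected startup performance.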