BioJulia/Automa.jl

Generated code is highly redundant

Opened this issue ยท 5 comments

Currently, in generate_exec_code, with :goto-generators, the expression bound to each action is NOT spliced into the generated code once. Instead, it is spliced into the generated code once for every edge in the FSM it appears on.

This is due to simplicity. Now, every edge generates its own little code snippet. For example, if the N'th edge leading to state X contains the action :foo with the Julia code do_foo(), the generated code will be

@label state_X_action_N
begin
    do_foo()
end
@goto state_X

The problem, of course, is that the action foo can be present on many edges. So instead of using the gotos to go to the SAME code section with do_foo(), do_foo() is spliced in multiple times leading to huge codegen.
Worse, the codegen potentially scales with the number of states squared.

Here is a MWE that makes the codegen blow up

import Automa
import Automa.RegExp: @re_str
const re = Automa.RegExp
const LETTERS = 'a':'m'

machine = let
    letters = []
    for i in eachindex(LETTERS)
        pat = re.parse("[$(join(deleteat!(collect(LETTERS), i)))]")
        pat.actions[:enter] = [Symbol(LETTERS[i])]
        push!(letters, pat)
    end

    states = []
    for i in eachindex(letters)
        push!(states, re.alt(deleteat!(copy(letters), i)...))
    end

    Automa.compile(re.cat(states...))
end

context = Automa.CodeGenContext(generator=:goto)
actions = Dict(Symbol(L) => :(foo[$i] += 0x01) for (i,L) in enumerate(LETTERS))

# Have a look at code now:
code = string(Automa.generate_exec_code(context, machine, actions));

It can get much worse. Setting LETTERS to e.g. 'a':'z' will cause each of the 26 actions to be duplicated 26^2 times, leading to 5.4 MiB of native code.

And here is a counter of how many times each action is multiplied in the code in FASTX.jl's FASTA machine:

  • :countline = 17
  • :mark = 8
  • :letters = 8
  • :record = 7
  • :identifier = 6
  • :header = 6
  • :pos = 5
  • :description = 2

We really, really should fix this.

After literally months, I finally thought of an "obvious" solution to this problem.

This problem can be solved by managing a "stack" of actions and nodes that you need to go to. For example, imagine an edge with actions [:a, :b, :c] followed by @goto node 1. This could be solved by something like

push "@goto node 1" onto the stack
push "@goto action_c" onto the stack
push "@goto action_b" onto the stack
@goto action_a

, where all action blocks :a, :b and :c ends with code that does

jmp (pop stack)

But... that's literally how function calls work! ๐Ÿ˜… Well, almost. My proposed solution is to wrap all actions in an anonymous function. The edge's code is then

action_a()
action_b()
action_c()
@goto node 1

This is slightly more inefficient because after action_a, it jumps back to the edge before jumping to action_b, which is obviously pointless. Although practically speaking, I suspect modern CPUs will make these jumps incredibly fast due to speculative execution. On the other hand, this solution exploits Julia's inlining heuristics, so small code snippets can just be inlined. Cool!

Needs to be benchmarked, though. All these extra branches could really mess up performance.

I doubt I can actually implement this. Instead, I'll just improve the docs to explain that I recommend the actions are function calls.

I got a new idea on how to implement this. The idea is to outline code by wrapping it in function calls. Here is an example:

julia> function foo()
           function bar()
               y += x
           end
           x = 1
           y = 1
           bar()
           y
       end;

julia> foo()
2

This works by default because inner functions are closures in Julia.
The problem is that Julia currently boxes x and y in the example above. I don't know how much of this is the "slow closure bug" from Julia's compiler - it might be worth looking into this once that bug in fixed.

I'm glad you're still thinking about this, because I have zero concept of the implications ๐Ÿ˜…

Does it help at all to use other kinds of closures - anonymous functions, begin blocks, etc?

I don't think that makes a difference. I'm going to try implementing this at some point and see the performance impact.
It's possible this is just a limitation of the Julia compiler which is unlikely to be lifted. In that case, too bad, but let's keep this issue open nonetheless.