twineworks/tweakflow

Feature request: Thread-Safe Compiled Expressions

andreisobchuck opened this issue · 16 comments

Compiling expressions is relatively expensive and slow. Unfortunately, the result of compilation - a Runtime - is not thread-safe. Therefore it cannot be cached globally and must be (re-)compiled for every HTTP request.

The number of expressions in active use at any given time is finite, so a global cache of precompiled expressions would be a real benefit.

Hi @andreisobchuck,

thanks for getting in touch. You are right, you can't evaluate expressions in the same runtime concurrently. Can you give some detail about how you would like to use a compiled runtime in your concurrent scenario?

There may be a way of cheaply cloning or resetting a compiled runtime. Another option is to generate a fixed number of structurally identical runtimes, one per thread, and treat them like a thread pool.

Note:
The way we solve this in tweakstreet is to have a hierarchy of modules and libs. When multiple threads evaluate variables, the way the variables are organized in modules and libraries ensures that every thread only updates a subtree of variables it owns at the time, never changing a variable that could affect another thread. Concurrent evaluation of the same values is supported by the runtime, but concurrent evaluation while threads mutate each other's visible values is not.

Best Wishes
Slawo

My first attempt was simply to cache the result of VarTable.compile in a static, size-limited Guava cache. Our expressions are written by clients as a field in a table and evaluated per row. A solution where every thread only updates a subtree of variables it owns looks too complex, and a pool of compiled runtimes also sounds too complex.

Cheap cloning sounds good, IMO.
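For reference, a size-limited, LRU-evicting cache like the one described can be sketched with just the JDK (no Guava). `ExpressionCache` and the compiler function are hypothetical wrappers for illustration, not tweakflow API:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// A minimal size-limited, LRU-evicting cache standing in for the Guava cache
// mentioned above. The compiler function is a hypothetical wrapper around
// whatever produces your compiled artifact; nothing here is tweakflow API.
public class ExpressionCache<V> {

  private final Map<String, V> cache;

  public ExpressionCache(final int maxSize) {
    // access-ordered LinkedHashMap evicts the least recently used entry
    this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        return size() > maxSize;
      }
    };
  }

  public synchronized V get(String expression, Function<String, V> compiler) {
    V compiled = cache.get(expression);
    if (compiled == null) {
      compiled = compiler.apply(expression); // compile on a miss only
      cache.put(expression, compiled);
    }
    return compiled;
  }

  public synchronized int size() {
    return cache.size();
  }
}
```

If Guava is already on the classpath, `CacheBuilder.newBuilder().maximumSize(n).build()` gives equivalent eviction behavior without the hand-rolled map.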

Alright, thanks. Let me look into that.

I've created a feature branch features/dup-runtime and added a Runtime.copy() method that gives you a deep copy of the compiled Runtime. The method should be called before the Runtime evaluates anything, so the copy is made from a pristine state.

You can use that to get a relatively cheap copy.
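The intended pattern - compile once, then hand each thread its own pristine deep copy - could be sketched like this. `MockRuntime` is a stand-in for tweakflow's Runtime; only the copy-before-first-evaluation idea is taken from the branch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the intended usage pattern: compile once, then give each thread
// its own deep copy before it evaluates anything. MockRuntime stands in for
// tweakflow's Runtime; nothing here is the real API except the copy idea.
public class CopyPerThread {

  static class MockRuntime {
    final String program;  // immutable compiled form
    int evaluations = 0;   // mutable evaluation state that makes sharing unsafe

    MockRuntime(String program) { this.program = program; }

    // deep copy taken from the pristine state, analogous to Runtime.copy()
    MockRuntime copy() { return new MockRuntime(program); }

    int evaluate() { return ++evaluations; }
  }

  public static List<Integer> run(int threads) throws Exception {
    MockRuntime pristine = new MockRuntime("1+2"); // compiled once, never evaluated
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<Integer>> futures = new ArrayList<>();
    for (int i = 0; i < threads; i++) {
      futures.add(pool.submit(() -> {
        MockRuntime mine = pristine.copy(); // copy before any evaluation
        return mine.evaluate();
      }));
    }
    List<Integer> results = new ArrayList<>();
    for (Future<Integer> f : futures) results.add(f.get());
    pool.shutdown();
    return results;
  }
}
```

Each thread sees evaluation count 1 because it works on its own copy; the pristine runtime is never touched after compilation.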

I've included tests illustrating usage here:

https://github.com/twineworks/tweakflow/blob/features/dup-runtime/src/test/java/com/twineworks/tweakflow/lang/runtime/RuntimeTest.java

Would you be able to give that a try and verify if it sufficiently addresses your use case?

To build the new version, you'd have to check out the features/dup-runtime branch and run

mvn clean package

It will produce a new jar in the ./target folder.

Thanks
Slawo

Great! I will try it today.

API is ok.
But our current version in prod is 0.7.0. I tried to upgrade to 0.19.0 before testing the new API, and it looks like the parallel loader made the situation worse for us. (Switching to the new API will probably fix it, but anyway.) Our services are deployed to Kubernetes with a small processor limit: 2 per pod. I think the old plain single-threaded Loader is more efficient for us.
Is it possible to configure which Loader to use?

Also, the constructor of ParallelLoader contains unnecessary (and harmful) synchronization on the ParallelLoader class. It would be better to remove it entirely.

> API is ok.
> But our current version in prod is 0.7.0. I tried to upgrade to 0.19.0 before testing the new API, and it looks like the parallel loader made the situation worse for us. (Switching to the new API will probably fix it, but anyway.) Our services are deployed to Kubernetes with a small processor limit: 2 per pod. I think the old plain single-threaded Loader is more efficient for us.
> Is it possible to configure which Loader to use?

I understand. I'll look into making it configurable. Maybe the single threaded one is a better default, too.

> Also, the constructor of ParallelLoader contains unnecessary (and harmful) synchronization on the ParallelLoader class. It would be better to remove it entirely.

You are right, I think that's a leftover piece of code from a first prototype when the service executor was static on the class. Good catch!

Hi @andreisobchuck,

I've committed the changes. The multi-threaded loader is opt-in now; by default, the standard single-threaded loader is used.

Let me know if that works for you. Once everything is OK, I'll push a proper release.

Best
Slawo

PS: I'll add you to our gitter chatroom. If there are any questions, I'll gladly answer them there.

Deployed that build to prod. Looks good, but I definitely can't guarantee that it's correct for all clients :)
After warm-up, CPU load was reduced compared with the previous version. But time to compile an expression became about a third worse.
With v0.7.0, 100 compilations took ~7700ms.
With 0.20.0, 100 compilations took ~10400ms.
100ms to compile one expression is ridiculous. But the gain from cacheable expressions is obvious. Thanks.

Hi @andreisobchuck,

Thanks for your comments. I'll try to help with some of the issues.

Performance of compilation is an issue I'm looking into. Tweakflow is designed to compile once, evaluate often. Each compilation sets up a Runtime that contains everything you put into it. For example, if you put the standard library in your prologue, the entire standard library file is parsed, and all declared functions are found, initialized, and bound. You can improve performance quite a bit by providing your own limited-size 'std' tailored to your needs.

Another thing you can do is cache the parse tree of modules. That's most useful for std and any other constant modules you might have. To enable it, just pass a dedicated caching map into your load path:

https://github.com/twineworks/tweakflow/blob/features/dup-runtime/src/main/java/com/twineworks/tweakflow/lang/load/loadpath/LoadPath.java#L64

I am looking into allowing Runtimes to share memory for certain modules - like std - so they don't have to be re-initialized every time.

Another thing that hits performance is that helper classes like the var table class parse individual expressions first, and only construct the module once all given expressions parse successfully as valid expressions. This enables better error handling and prevents code injection: people could otherwise inject arbitrary text (like additional libraries) into the generated module if there were only one parsing pass.

Creating a helper class that fine-tunes this behavior would help performance too. There is no inherent harm in letting people code-inject, since they could only create a new lib, so you could opt to assemble the module for the whole table and parse it once. Only when that fails would you fall back to parsing individual expressions one by one, so you can tell which variable definition is the problem.
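The fallback strategy described above could be sketched like this; the `parses` predicate is a placeholder for a real parser, and the module assembly is purely illustrative:

```java
import java.util.Map;
import java.util.function.Predicate;

// Sketch of the suggested two-phase strategy: try to parse the assembled
// module in one pass (the cheap, common case), and only fall back to parsing
// expressions one by one when that fails, to pinpoint the offending variable.
// The parses predicate is a placeholder for a real expression/module parser.
public class TwoPhaseParse {

  // returns null on success, or the name of the first failing variable
  public static String findBadVar(Map<String, String> vars, Predicate<String> parses) {
    // phase 1: assemble the module for the whole table and parse once
    StringBuilder module = new StringBuilder();
    for (Map.Entry<String, String> e : vars.entrySet()) {
      module.append(e.getKey()).append(": ").append(e.getValue()).append(";\n");
    }
    if (parses.test(module.toString())) return null;

    // phase 2: only on failure, parse each expression individually
    for (Map.Entry<String, String> e : vars.entrySet()) {
      if (!parses.test(e.getValue())) return e.getKey();
    }
    return null; // module-level failure not attributable to a single expression
  }
}
```

This keeps the common case at one parse per table instead of one per expression, while still producing a precise error on the rare bad input.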

Some advanced syntax may have been added or changed since 0.7.0. At some point I introduced mandatory commas and semicolons in literals and definitions. They are not strictly necessary to parse valid expressions, but without such stoppers to help it, the parser cannot produce meaningful error messages. Hence the change.

Compilation has become more expensive since I added additional paths into the grammar that match common mistakes, like forgotten commas, and produce more helpful error messages.

Catch me on gitter if you have questions.

A proper release including the changes is coming soon.

Best,
Slawo

> After warm-up, CPU load was reduced compared with the previous version. But time to compile an expression became about a third worse.
> With v0.7.0, 100 compilations took ~7700ms.
> With 0.20.0, 100 compilations took ~10400ms.
> 100ms to compile one expression is ridiculous. But the gain from cacheable expressions is obvious.

I take it you created some sort of test file to measure this. If you could share a runnable example as a gist, I could look into it and see if we can speed things up.

That was not a proper test, but a quick ad-hoc test to verify an assumption:
https://gist.github.com/andreisobchuck/ccb85184f4149892c407dfecf73993fa

Slawo, thank you for the great project. Your responses are very very fast and very useful. I appreciate it a lot.

Thanks @andreisobchuck

I've looked at the gist and ran it on my machine.

As is: 100 compilations after warmup take ~8150ms. That's similar to your findings.
With the parse cache enabled for std: 100 compilations after warmup take ~540ms.

I will expose the ability to use the parse cache through the VarTable class, so you can use it directly. It will be in the next release.

Best
Slawo

Thank you Slawo. I got more than I expected.
Will the parse cache be shared between expressions to reduce memory usage?

Yes, the parse cache is shared.

The loader uses the cache to store parse trees, and it can skip parsing a module if a parse tree already exists in the cache. It hits the cache for 'std' and any other static modules you might include in your prologue.

Whether or not parse trees are cached is determined by the load location. The resource location where 'std' lives is cacheable by default. The in-memory location that the VarTable class creates is not. You could allow caching there as well, but then you'd have to make sure that distinct expressions have distinct module names, because the normalized module name is the key of the caching table.
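In plain JDK terms, the caching behavior described above amounts to this sketch; `ParseTree` and the `load` method are placeholders for illustration, not tweakflow's actual classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Parse trees are stored under the normalized module name, so each module is
// parsed at most once. ParseTree and the parser function are placeholders,
// not the tweakflow API.
public class ParseCacheSketch {

  static class ParseTree {
    final String module;
    ParseTree(String module) { this.module = module; }
  }

  private final Map<String, ParseTree> cache = new ConcurrentHashMap<>();
  final AtomicInteger parseCount = new AtomicInteger(); // counts actual parses

  public ParseTree load(String normalizedModuleName, Function<String, ParseTree> parser) {
    // computeIfAbsent skips the parser entirely when a tree is already cached
    return cache.computeIfAbsent(normalizedModuleName, name -> {
      parseCount.incrementAndGet();
      return parser.apply(name);
    });
  }
}
```

Because the normalized module name is the key, enabling caching for in-memory locations only works if distinct expressions are given distinct module names, as noted above.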

I'll consolidate the changes later today and provide a sample.

Changes are released as part of v0.20.0.

API usage is demonstrated in a test.

Maven central usually picks up the newly released jar within 24 hours. Until then you can use the jar attached to the release.

Thanks for opening the issue and helping make Tweakflow better!

Best,
Slawo