denoland/rusty_v8

API to create and re-use StartupData

zcourts opened this issue · 4 comments

I'm struggling to find the right API to do snapshots.
We support multiple "origins" in a similar way to how V8 does but because we want to track memory per tenant, we decided that we'll have one isolate and one context per tenant.

I want the same tenant's scripts to be compiled once and then reused, so I thought I'd compile everything, take a snapshot, and then each time the tenant runs their scripts, create a new isolate from that snapshot. I've gotten as far as creating the StartupData, but the current API seems to make it impossible to create multiple isolates from the same startup data: Allocated::of requires 'static, so any use of snapshot_blob must either consume the startup data, or the as_ref must have a 'static lifetime...which isn't possible in our case.

Is this a rusty_v8 restriction or a V8 thing?
Roughly, my current process was:

    let handle_scope = &mut v8::HandleScope::new(&mut isolate);
    let context = v8::Context::new(handle_scope);
    let scope = &mut v8::ContextScope::new(handle_scope, context);
    let mut scope = v8::TryCatch::new(scope);
    // ...
    v8::Script::compile(&mut scope, script, Some(&origin));
    // ... report_exceptions etc.
    let mut snapshot_creator = v8::SnapshotCreator::new(None);
    snapshot_creator.set_default_context(context);
    let snapshot = match snapshot_creator.create_blob(v8::FunctionCodeHandling::Keep) { /* ... */ };
    guard.insert(service_id, snapshot);
    // guard is a Mutex over a HashMap<i64, StartupData>

Later, service_id wants to execute one of the scripts compiled into the snapshot N times, but

v8::Isolate::create_params().snapshot_blob(startup_data.as_ref())

requires 'static and

v8::Isolate::create_params().snapshot_blob(startup_data)

consumes startup_data.

Questions:

  1. Am I just using the wrong APIs?
  2. Can you create and re-use a snapshot?
  3. Maybe a snapshot isn't the right thing to be using. The goal is to compile the scripts once per tenant/service_id and later execute the compiled scripts, with each execution getting its own context...how do I achieve that with the rusty_v8 API?
  1. Am I just using the wrong APIs?

It's the correct API.

  2. Can you create and re-use a snapshot?

Yes, in Deno itself we use the same snapshot for the main worker and multiple workers - you can see it defined here:
https://github.com/denoland/deno/blob/76df7d7c9bb7b6b552fd33efbedb28e21969d46c/cli/worker.rs#L609

I suggest looking at https://github.com/denoland/deno_core for more details on how we use snapshots.
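
As for the 'static issue: since the bound comes from Allocated<[u8]>, any owned, 'static container of the bytes should satisfy it. Something like this should work (untested sketch; `isolate_for_tenant` is just an illustrative name, and I'm assuming the blob was copied out of the StartupData into a Vec<u8> once and cached per service_id):

    use std::collections::HashMap;

    // Illustrative sketch: `snapshots` maps service_id -> raw snapshot bytes,
    // e.g. the result of `create_blob(..).to_vec()` stored once when the
    // snapshot was taken. Assumes the V8 platform is already initialized.
    fn isolate_for_tenant(
        snapshots: &HashMap<i64, Vec<u8>>,
        service_id: i64,
    ) -> v8::OwnedIsolate {
        // Cloning hands each isolate its own owned Vec<u8>; an owned, 'static
        // container should satisfy the Allocated<[u8]> bound without consuming
        // the cached blob.
        let blob = snapshots
            .get(&service_id)
            .expect("snapshot for tenant")
            .clone();
        v8::Isolate::new(v8::Isolate::create_params().snapshot_blob(blob))
    }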

  3. Maybe a snapshot isn't the right thing to be using. The goal is to compile the scripts once per tenant/service_id and later execute the compiled scripts, with each execution getting its own context...how do I achieve that with the rusty_v8 API?

Most likely it's not the right approach. If the scripts are small you are gonna be paying a huge overhead for snapshots - snapshots effectively include the whole JS environment with all of the JS builtins, so each snapshot would store that duplicated data. Additionally, snapshots are only valid for a particular version of V8 and a particular architecture. Once you bump the rusty_v8 dependency you will have to regenerate all the snapshots. You might want to use v8::CachedData instead.
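
For the produce side, the rough shape would be something like this (untested sketch; `build_code_cache` is an illustrative name, it should be called with a scope that has a context entered, and I'm not 100% sure the Option/UniqueRef shapes match the exact rusty_v8 version you're on). Later you'd rebuild the Source with Source::new_with_cached_data and compile with CompileOptions::ConsumeCodeCache, which is roughly what deno_core does:

    // Sketch: compile a tenant script once and serialize its code cache so a
    // later compile of the same source can skip most of the work.
    fn build_code_cache(scope: &mut v8::HandleScope, code: &str) -> Option<Vec<u8>> {
        let code = v8::String::new(scope, code)?;
        let source = v8::script_compiler::Source::new(code, None);
        let script = v8::script_compiler::compile(
            scope,
            source,
            // Compile eagerly so the cache contains actual compiled code.
            v8::script_compiler::CompileOptions::EagerCompile,
            v8::script_compiler::NoCacheReason::NoReason,
        )?;
        let unbound = script.get_unbound_script(scope);
        // CachedData derefs to [u8], so copy the bytes out for storage.
        unbound.create_code_cache().map(|cache| cache.to_vec())
    }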

Ahhh...hmm. I may have gone off course after reading the V8 docs.
In https://v8.dev/docs/embed#contexts it says

In terms of CPU time and memory, it might seem an expensive operation to create a new execution context given the number of built-in objects that must be built. However, V8’s extensive caching ensures that, while the first context you create is somewhat expensive, subsequent contexts are much cheaper. This is because the first context needs to create the built-in objects and parse the built-in JavaScript code while subsequent contexts only have to create the built-in objects for their context. With the V8 snapshot feature (activated with build option snapshot=yes, which is the default) the time spent creating the first context will be highly optimized as a snapshot includes a serialized heap which contains already compiled code for the built-in JavaScript code. Along with garbage collection, V8’s extensive caching is also key to V8’s performance.

So I assumed the snapshot that makes creating the first context highly optimized was the same snapshot I saw in the rusty_v8 API.
My thinking was that taking a snapshot and then repeatedly running the same script with it as a base would achieve the optimisation the V8 docs mention.

Another issue/point is that I'm not storing the snapshot long term. The plan was to keep it in memory; we'd have at most a few hundred snapshots on one server at a time, and they would be evicted every few hours, maybe half a day. It's okay for an occasional request to be slower whilst all the scripts are recompiled, but subsequent invocations until the next cache eviction would be faster/optimised.

Maybe you could offer a bit of guidance, if possible, considering we're in a scenario where we have a set of scripts, A and B.

  • A is from one tenant and B is from another.
  • We can't have them changing JS globals and affecting each other
  • We want to track resource usage for A and B
  • Once A and B are deployed, they won't change for some time so we can cache them
  • Once deployed, A and B will be executed thousands of times per second
  • A and B can vary quite wildly in size; as we're embedding V8 for our customers to use, it's anyone's guess what the min, avg, and max will be

From what you've said, I'm thinking of a flow that follows these steps:

    let source = Source::new(script, Some(&origin));
    // let source = Source::new_with_cached_data(); // later, when executing the script again, we'd use this with the CachedData
    let compiled = v8::script_compiler::compile(
        &mut scope,
        source,
        v8::script_compiler::CompileOptions::ConsumeCodeCache,
        v8::script_compiler::NoCacheReason::BecauseV8Extension,
    )
    .unwrap();
    let unbound_script = compiled.get_unbound_script(&mut scope);
    let code_cache = unbound_script.create_code_cache();

ConsumeCodeCache was the only relevant option I could find, so I'm presuming that's right. It seems to conflict with NoCacheReason though...
I'm seeing that deno_core does something similar.

I appreciate I've dropped a lot of info here; just trying to provide everything you may need.

So I assumed the snapshot that makes creating the first context highly optimized was the same snapshot I saw in the rusty_v8 API.
My thinking was that taking a snapshot and then repeatedly running the same script with it as a base would achieve the optimisation the V8 docs mention.

Not really. From what I can tell, you want to create a snapshot of the "base environment" - essentially the built-in JS APIs plus any APIs you might want to provide to all tenants. Then you create isolates from that snapshot and execute user code.
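
Roughly, the base snapshot would be built once like this (untested sketch; `build_base_snapshot` and `shared_setup_js` are illustrative names, and the SnapshotCreator API has shifted a bit between rusty_v8 versions, so check the version you're pinned to):

    // Build one "base environment" snapshot containing only the APIs shared by
    // all tenants - not the per-tenant scripts.
    fn build_base_snapshot(shared_setup_js: &str) -> v8::StartupData {
        let mut snapshot_creator = v8::SnapshotCreator::new(None);
        // In some rusty_v8 versions this accessor is unsafe and the isolate must
        // not be dropped before create_blob; adjust for the version you're on.
        let mut isolate = unsafe { snapshot_creator.get_owned_isolate() };
        {
            let scope = &mut v8::HandleScope::new(&mut isolate);
            let context = v8::Context::new(scope);
            let scope = &mut v8::ContextScope::new(scope, context);
            // Run the shared setup code so its results are baked into the snapshot.
            let code = v8::String::new(scope, shared_setup_js).unwrap();
            let script = v8::Script::compile(scope, code, None).unwrap();
            script.run(scope).unwrap();
            snapshot_creator.set_default_context(context);
        }
        std::mem::forget(isolate); // create_blob tears the isolate down itself
        snapshot_creator
            .create_blob(v8::FunctionCodeHandling::Keep)
            .unwrap()
    }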

Another issue/point is that I'm not storing the snapshot long term. The plan was to keep it in memory; we'd have at most a few hundred snapshots on one server at a time, and they would be evicted every few hours, maybe half a day. It's okay for an occasional request to be slower whilst all the scripts are recompiled, but subsequent invocations until the next cache eviction would be faster/optimised.

Yeah, it feels to me that snapshots would be overkill for that, and they're not very user-friendly to work with. I would suggest the base-snapshot approach.

Maybe you could offer a bit of guidance, if possible, considering we're in a scenario where we have a set of scripts, A and B.
A is from one tenant and B is from another.
We can't have them changing JS globals and affecting each other
We want to track resource usage for A and B
Once A and B are deployed, they won't change for some time so we can cache them
Once deployed, A and B will be executed thousands of times per second
A and B can vary quite wildly in size; as we're embedding V8 for our customers to use, it's anyone's guess what the min, avg, and max will be

If you are running untrusted code from multiple tenants, the very bare minimum you should do is create a separate Isolate instance for each tenant. Better yet, run them in separate processes. While V8 is a good sandbox, it's not bulletproof and multiple levels of security should be applied. You will also find that it's way easier to think in terms of isolates than trying to juggle multiple snapshots and contexts.

If you are fine with the first request being a bit slower, then creating a new isolate and executing the tenant's code will be enough for you - the baseline overhead we measured in deno_core for an empty isolate is about 4ms (i.e. creating a process, creating an isolate from a snapshot, running an empty file, doing cleanup and exiting the process). In Deno itself it's about 15ms. If you are fine with such latency then I strongly recommend this approach.

You can then add the code cache into the mix and store its output to make it cheaper to create a new isolate with the same tenant code.
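
Concretely, per request that could look something like this (untested sketch; `run_tenant_script` is an illustrative name, `base_snapshot` is the blob from the base-environment snapshot, and the code-cache variant would swap the plain compile for the cached-data path):

    // Sketch: each tenant request gets a fresh isolate built from the shared base
    // snapshot, plus its own context, so tenants can't touch each other's globals.
    fn run_tenant_script(base_snapshot: &[u8], tenant_js: &str) -> Option<String> {
        let params = v8::Isolate::create_params().snapshot_blob(base_snapshot.to_vec());
        let mut isolate = v8::Isolate::new(params);
        let scope = &mut v8::HandleScope::new(&mut isolate);
        let context = v8::Context::new(scope);
        let scope = &mut v8::ContextScope::new(scope, context);
        let code = v8::String::new(scope, tenant_js)?;
        // This is where Source::new_with_cached_data + ConsumeCodeCache would be
        // used instead, if you cache compiled code per tenant.
        let script = v8::Script::compile(scope, code, None)?;
        let result = script.run(scope)?;
        Some(result.to_rust_string_lossy(scope))
    }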

ConsumeCodeCache was the only relevant option I could find, so I'm presuming that's right. It seems to conflict with NoCacheReason though...
I'm seeing that deno_core does something similar.

Yes, you can copy the deno_core approach for easier integration.

I appreciate you taking the time to give feedback. 4ms should be fine, and yes, it is untrusted code. I'll go the isolate-per-tenant route. It was a bit of a blunder jumping the gun and going for what I thought was the more optimised way of doing it before benchmarking - rookie mistake!

Appreciate your time and guidance on the matter!