trussed-dev/trussed

Performance of `ServiceResources` in request handling

sosthene-nitrokey opened this issue · 7 comments

When an app has backends configured that deal with core requests, they will "see" each core API request.

Most backends get all their "resources" from ServiceResources, but loading the keystore requires forking the RNG (because the keystore needs an RNG for KeyId generation). The same is true for the certstore and the counterstore.

However, many requests will never touch any of these stores, and those that do will probably only ever touch one.

Since generating a new RNG is expensive (on embedded hardware), if each intermediary backend that sees the request forks the RNG that way, it might add up to a significant overhead.
For example in opcard, which has the RSA backend enabled, each request to the core API ends up forking the RNG at least 5 times.

And then some requests fork it once more through self.rng.

This seems unnecessary and could be one of the reasons for the slowness of opcard.

I see two ways to reduce the number of RNG forks:

  • Avoid using self.rng() (I'm guilty of this in backends) and instead use the already-forked RNG from the keystore like the crypto mechanisms do
  • Lazily initialize the keystore, certstore and counterstore, only for requests that need them.

Ideally we could also totally avoid forking the RNG for requests that only need to read data from the keystore and don't need to generate KeyIds.
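A minimal sketch of the lazy approach, under stated assumptions: `ServiceRng`, `ForkedRng` and `LazyKeystore` are hypothetical stand-ins and not trussed's actual types, and the toy RNG is not cryptographically meaningful. It only shows the fork being deferred until a KeyId is actually generated, so read-only requests never pay for it.

```rust
/// Hypothetical stand-in for the service-level RNG. It counts forks so the
/// effect of lazy initialization is observable.
struct ServiceRng {
    state: u64,
    forks: u32,
}

/// Hypothetical stand-in for a forked, per-store RNG.
struct ForkedRng {
    state: u64,
}

impl ServiceRng {
    /// Derives a child RNG. On real hardware this is the expensive step.
    fn fork(&mut self) -> ForkedRng {
        self.forks += 1;
        // Toy state update, for illustration only.
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ForkedRng { state: self.state }
    }
}

impl ForkedRng {
    /// Toy xorshift step, for illustration only.
    fn next_u64(&mut self) -> u64 {
        self.state ^= self.state << 13;
        self.state ^= self.state >> 7;
        self.state ^= self.state << 17;
        self.state
    }
}

/// Hypothetical keystore that forks the RNG on first use instead of at
/// construction time.
struct LazyKeystore<'a> {
    parent: &'a mut ServiceRng,
    rng: Option<ForkedRng>,
}

impl<'a> LazyKeystore<'a> {
    /// Cheap: no fork happens here.
    fn new(parent: &'a mut ServiceRng) -> Self {
        Self { parent, rng: None }
    }

    /// Forks at most once per keystore instance, reusing the fork afterwards.
    fn rng(&mut self) -> &mut ForkedRng {
        let Self { parent, rng } = self;
        rng.get_or_insert_with(|| parent.fork())
    }

    /// Needs randomness, so it triggers the fork on first call.
    fn generate_key_id(&mut self) -> u64 {
        self.rng().next_u64()
    }

    /// Read-only operation (always `false` in this toy model): no RNG needed.
    fn exists_key(&self, _key_id: u64) -> bool {
        false
    }
}

/// Handles one request per slice entry (`true` = the request generates
/// KeyIds) and returns how many forks the whole batch caused.
fn forks_for(requests: &[bool]) -> u32 {
    let mut service_rng = ServiceRng { state: 1, forks: 0 };
    for &needs_key_id in requests {
        let mut keystore = LazyKeystore::new(&mut service_rng);
        if needs_key_id {
            let _ = keystore.generate_key_id();
            let _ = keystore.generate_key_id();
        } else {
            let _ = keystore.exists_key(0);
        }
    }
    service_rng.forks
}
```

In this model, four requests of which only one generates KeyIds cause a single fork in total, where eager initialization would fork once per request and per store.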

We could also store the initialized stores in the ClientContext, including the forked RNG. This would also make them accessible to other backends without having to copy the path again.

> Lazily initialize the keystore, certstore and counterstore, only for requests that need them.

Generally, I’m not a big fan of the huge reply_to function. It also introduces a significant stack burden. I would split it into one function per (non-trivial) API call. Then we could also move the store instantiation to those functions that really need it.
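The split suggested here could look roughly like the following sketch. All types are hypothetical and heavily reduced, not trussed's real `Request`/`Reply` or service types. `#[inline(never)]` is used on the assumption that keeping handlers out of line keeps their locals out of the dispatcher's stack frame; whether that actually helps on a given target would need measuring.

```rust
/// Hypothetical, heavily reduced request type.
enum Request {
    Exists { key_id: u64 },
    GenerateKey,
}

/// Hypothetical, heavily reduced reply type.
#[derive(Debug, PartialEq)]
enum Reply {
    Exists(bool),
    GenerateKey { key_id: u64 },
}

/// Hypothetical service. `stores_created` counts how often a store was set up.
struct Service {
    next_key_id: u64,
    stores_created: u32,
}

/// Stand-in for a store that is costly to instantiate.
struct Keystore<'a> {
    next_key_id: &'a mut u64,
}

impl Keystore<'_> {
    /// Hands out sequential ids in this toy model.
    fn generate_key_id(&mut self) -> u64 {
        let id = *self.next_key_id;
        *self.next_key_id += 1;
        id
    }
}

impl Service {
    /// Instantiates the store. Only handlers that need it call this.
    fn keystore(&mut self) -> Keystore<'_> {
        self.stores_created += 1;
        Keystore { next_key_id: &mut self.next_key_id }
    }

    /// The dispatcher only routes. It creates no stores and holds no buffers.
    fn reply_to(&mut self, request: Request) -> Reply {
        match request {
            Request::Exists { key_id } => self.exists(key_id),
            Request::GenerateKey => self.generate_key(),
        }
    }

    /// Read-only handler: no store instantiation in this toy model.
    #[inline(never)]
    fn exists(&mut self, key_id: u64) -> Reply {
        Reply::Exists(key_id < self.next_key_id)
    }

    /// Handler that really needs the keystore, so it creates it locally.
    #[inline(never)]
    fn generate_key(&mut self) -> Reply {
        let mut keystore = self.keystore();
        let key_id = keystore.generate_key_id();
        Reply::GenerateKey { key_id }
    }
}
```

With this shape, an `Exists` request never instantiates the keystore, and each handler's locals live only in its own function.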

> This would also make them accessible to other backends without having to copy the path again.

Do we even need owned stores? Couldn’t we just use references?

In the backends I always take a mutable reference to the stores.

Implementing lazy initialization in the se050 backend and trussed saved 30 ms in opgpcard status, which is only a 2% improvement. I expected more.

Not sure whether it's worth it.

> It also introduces a significant stack burden

I don't understand why that is the case. Shouldn't the compiler be able to reuse the same stack space across incompatible branches?

> This would also make them accessible to other backends without having to copy the path again.

> Do we even need owned stores? Couldn’t we just use references?

I am not sure what you mean by that. The stores are already &mut everywhere in trussed.

Example implementation for lazily creating the keystore: trussed-dev/trussed-rsa-backend@3fbd2be and 99c2700

> I don't understand why that is the case. Shouldn't the compiler be able to reuse the same stack space across incompatible branches?

You would think so, but apparently it is not the case. On the lpc55, the stack size is ca. 20 kB.

> I am not sure what you mean by that. The stores are already &mut everywhere in trussed.

I mean to replace ClientKeystore { path: PathBuf } with ClientKeystore<'a> { path: &'a Path }, in case you worry about the path copies.

I believe that would work, but I'm not sure it's the most costly part.

It would also be incompatible with the idea of storing them in the ClientContext.
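For reference, the owned and borrowed variants discussed above can be sketched side by side. This is an illustration under assumptions: `std::path` types stand in for the path types trussed actually uses, and `OwnedKeystore`, `BorrowedKeystore`, `ClientContext` and the `keys` subdirectory are hypothetical names chosen for the example.

```rust
use std::path::{Path, PathBuf};

/// Shape described in the thread as the current one: the store owns a copy
/// of the client path.
struct OwnedKeystore {
    path: PathBuf,
}

/// Proposed shape: the store only borrows the client path.
struct BorrowedKeystore<'a> {
    path: &'a Path,
}

impl OwnedKeystore {
    /// Copies the path on every construction.
    fn new(client_path: &Path) -> Self {
        Self { path: client_path.to_path_buf() }
    }

    fn key_path(&self, key_id: &str) -> PathBuf {
        self.path.join("keys").join(key_id)
    }
}

impl<'a> BorrowedKeystore<'a> {
    /// No copy: the store is only valid while `client_path` is borrowed.
    fn new(client_path: &'a Path) -> Self {
        Self { path: client_path }
    }

    fn key_path(&self, key_id: &str) -> PathBuf {
        self.path.join("keys").join(key_id)
    }
}

/// Hypothetical context that owns the client path.
struct ClientContext {
    path: PathBuf,
}

/// Builds a short-lived borrowed store from the context for one request.
fn key_path_for(ctx: &ClientContext, key_id: &str) -> PathBuf {
    // The store borrows from `ctx`, so it cannot be kept as a field of the
    // same `ClientContext`: that would make the struct self-referential.
    let keystore = BorrowedKeystore::new(&ctx.path);
    keystore.key_path(key_id)
}
```

Both variants resolve the same key path. The borrowed one avoids the copy but has to be rebuilt per request, which is the tension with keeping initialized stores in the context.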