tcti-spi-helper is slow
Akuli opened this issue · 2 comments
I am using a simple benchmark where I get a lot of random data by calling Esys_GetRandom()
in a loop, measure how long it took, and then calculate how many random bytes per second I received on average.
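For reference, the measurement described above could be sketched roughly like this. It is not the actual benchmark code: `get_random_fn` and `fake_get_random` are hypothetical stand-ins for a wrapper around Esys_GetRandom(), the 32-byte request size is an arbitrary assumption, and the timing relies on C11 `timespec_get`, which may not be available on every platform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for one Esys_GetRandom() call: fills buf with up to
 * len bytes and returns how many bytes were written (0 on failure). */
typedef size_t (*get_random_fn)(unsigned char *buf, size_t len);

/* Placeholder source used only to exercise the loop without a TPM.
 * A real benchmark would wrap Esys_GetRandom() here instead. */
size_t fake_get_random(unsigned char *buf, size_t len)
{
    memset(buf, 0, len);
    return len;
}

/* Average throughput; returns 0 if no measurable time has elapsed. */
double bytes_per_second(size_t total_bytes, double elapsed_sec)
{
    return elapsed_sec > 0.0 ? (double)total_bytes / elapsed_sec : 0.0;
}

/* Calls get_random `iterations` times, stores the wall-clock duration in
 * *elapsed_sec, and returns the total number of bytes received. */
size_t run_benchmark(get_random_fn get_random, int iterations,
                     double *elapsed_sec)
{
    unsigned char buf[32]; /* assumed request size per call */
    size_t total = 0;
    struct timespec start, end;

    assert(get_random != NULL && elapsed_sec != NULL);

    timespec_get(&start, TIME_UTC);
    for (int i = 0; i < iterations; i++)
        total += get_random(buf, sizeof buf);
    timespec_get(&end, TIME_UTC);

    *elapsed_sec = (double)(end.tv_sec - start.tv_sec)
                 + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return total;
}
```

Passing the returned byte count and the elapsed time to `bytes_per_second()` gives the kind of bytes/sec figure reported in the table below.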
This sleep is the bottleneck:
tpm2-tss/src/tss2-tcti/tcti-spi-helper.c
Lines 387 to 388 in 86949f7
I tried changing 8ms to other values:
sleep time | Esys_GetRandom() speed |
---|---|
8ms | 1972 bytes/sec |
5ms | 3129 bytes/sec |
1ms | 9858 bytes/sec |
0.5ms(*) | 13332 bytes/sec |
0ms | 17109 bytes/sec |
So deleting this sleep makes things about 8.7x faster, at least on the system I am working on.
I don't know what the downsides of "spamming the TPM" are. Even if there aren't any, I think having a sleep or similar is a good idea, so that you can tell the OS to switch tasks. This is not necessary on e.g. Linux, of course, but it is necessary in the project I'm working on now.
(*) This would be difficult to support because the sleep time is currently stored as milliseconds in an `int`.
In principle, the reason for not spamming the TPM is that a TPM is usually a single-threaded processor. Depending on the implementation (no, I am not speaking about any specific vendor), spamming the TPM with polls can delay the execution of the main task, especially if it is long-running. With a 0ms poll delay, a TPM2_Sign operation, for example, could take a penalty, increasing from 100ms to 120ms or worse. This is the theory behind the introduction of poll delays in the first place. Of course, if an implementation performs the SPI communication in a HW module or on a second core, then there is no penalty for spin-polling. Unfortunately, the TSS does not know this.
Regarding the actual value, 8ms is intended as a balance between short enough intervals on quick commands to get a result vs long enough intervals to not have a bad effect on long commands.
A possible alternative would be to use e.g. 1ms for the first 5 rounds, then 5ms for another 5 rounds, then 10ms for the rest. This IMHO could be a sound scheme, but we'd have to look into the details.
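As a rough illustration of that stepped schedule (a sketch only, not code from tpm2-tss; the names `poll_delay_ms` and `total_delay_ms` are hypothetical), the delay could be derived from the poll round like this:

```c
#include <assert.h>

/* Hypothetical helper: delay in milliseconds to sleep before poll attempt
 * number `round` (0-based). Uses the 1ms / 5ms / 10ms steps proposed above. */
int poll_delay_ms(int round)
{
    assert(round >= 0);
    if (round < 5)
        return 1;   /* first 5 rounds: quick commands finish early */
    if (round < 10)
        return 5;   /* next 5 rounds: back off a little */
    return 10;      /* afterwards: long-running command, poll rarely */
}

/* Total time spent sleeping after `rounds` poll attempts. */
int total_delay_ms(int rounds)
{
    int total = 0;
    for (int i = 0; i < rounds; i++)
        total += poll_delay_ms(i);
    return total;
}
```

With these values, a command that needs 10 polls sleeps for 5 × 1ms + 5 × 5ms = 30ms in total, while a fast command that completes within the first rounds only pays 1ms per poll.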
> 1ms for the first 5 rounds, then 5ms for another 5 rounds, then 10ms
I ended up doing something similar. Turns out that I only needed two rounds of 1ms sleeping to make my benchmark as fast as it can be with 1ms.