Remaining Curve25519 issues
Closed this issue · 4 comments
Curve25519 is now at 4.1.2, with hardware acceleration, but there are some outstanding issues with regard to hardware locking.
At the moment, each process that uses the accelerator has to manually release its resources. See betrusted-io/curve25519-dalek#2 for a longer discussion of the history.
The action items are:
- Implement release of hardware resources on Drop
- Benchmark hardware operations with manual dropping to baseline current code base
- Implement release of hardware lock after every operation, and run benchmarks again
- Implement a fallback path so that if hardware is not available, it seamlessly falls back to a software implementation
- Implement error detection of suspend during operation, so that operations are automatically retried if the system was put into suspend during a hardware-accelerated operation.
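To make the last two action items concrete, here is a minimal sketch of how a retry-then-fallback wrapper could look. Everything in it is an assumption for illustration: `HwError`, `with_fallback`, and the closure-based interface are hypothetical names, not anything from the curve25519-dalek fork or xous-core.

```rust
/// Hypothetical reasons a hardware-accelerated attempt produced no result.
#[derive(Debug, PartialEq)]
pub enum HwError {
    /// The engine is held elsewhere or is not present at all.
    Unavailable,
    /// The system suspended mid-operation, so the result cannot be trusted.
    SuspendedDuringOp,
}

/// Try `hw` up to `max_retries + 1` times, then fall back to `sw`.
/// A suspend triggers another hardware attempt; an unavailable engine
/// skips the remaining attempts and goes straight to software.
pub fn with_fallback<T>(
    max_retries: usize,
    mut hw: impl FnMut() -> Result<T, HwError>,
    sw: impl FnOnce() -> T,
) -> T {
    for _ in 0..=max_retries {
        match hw() {
            Ok(value) => return value,
            Err(HwError::SuspendedDuringOp) => continue,
            Err(HwError::Unavailable) => break,
        }
    }
    // Hardware never delivered a result: compute in software instead.
    sw()
}
```

A caller would pass the hardware path and the software path as two closures, so the choice between them stays invisible to the code that needs the result.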
I think actually a lot of this needs to be done inside our fork of the Curve25519 repo, so perhaps the issue rightfully belongs there, but I never look there to track issues so I'm opening it in the xous-core repo to reduce the chances of me forgetting to do this.
Ah. OK, the first problem has raised its head.
You can't implement Drop on types that implement Copy, but Copy is necessary for some of the constant-time cryptographic traits, it seems. So, the idea of auto-dropping and releasing hardware might not work.
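For reference, the conflict can be reproduced in a few lines. The sketch below uses hypothetical names (`Element`, `EngineGuard`, `ENGINE_BUSY`) and an atomic flag as a stand-in for the real hardware lock. It shows that the compiler rejects Drop on a Copy type (error E0184), while a separate non-Copy guard type can implement Drop. This is only an illustration of the language rule, not the approach taken in the fork.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Stand-in for the hardware lock (hypothetical, not the real engine interface).
static ENGINE_BUSY: AtomicBool = AtomicBool::new(false);

/// A value type that has to stay `Copy`, like a field element.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Element(pub [u32; 8]);

// Uncommenting this impl fails to compile with error E0184, because a
// type with a destructor cannot also be `Copy`:
//
// impl Drop for Element {
//     fn drop(&mut self) {}
// }

/// A separate, non-`Copy` guard is allowed to implement `Drop`.
pub struct EngineGuard;

impl EngineGuard {
    /// Claim the engine if it is free; returns `None` if it is already held.
    pub fn acquire() -> Option<EngineGuard> {
        ENGINE_BUSY
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| EngineGuard)
    }
}

impl Drop for EngineGuard {
    /// Release the engine when the guard goes out of scope.
    fn drop(&mut self) {
        ENGINE_BUSY.store(false, Ordering::Release);
    }
}

/// Report whether the stand-in engine is currently held.
pub fn engine_busy() -> bool {
    ENGINE_BUSY.load(Ordering::Acquire)
}
```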
I think for the Drop issue, what I may do is just make the routine release the engine hardware after each call, and re-allocate it. I'll run benchmarks to see how much of a penalty this is -- I suspect it may be surprisingly small given that the memory mappings are all static and use the "happy path" inside the kernel when requested.
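As a rough sketch of the release-after-each-call idea: each operation claims the engine, does its work, and gives the engine back before returning, so no caller ever holds it between operations. `MockEngine` and `mul_mod` are hypothetical stand-ins, and the arithmetic is done in plain software here purely so the example runs.

```rust
/// Hypothetical stand-in for the accelerator. It tracks whether the engine
/// is held and how many times it has been claimed.
#[derive(Default)]
pub struct MockEngine {
    held: bool,
    acquisitions: u32,
}

impl MockEngine {
    /// Claim the engine; panics if it is already held.
    fn acquire(&mut self) {
        assert!(!self.held, "engine already held");
        self.held = true;
        self.acquisitions += 1;
    }

    /// Give the engine back.
    fn release(&mut self) {
        self.held = false;
    }

    pub fn is_held(&self) -> bool {
        self.held
    }

    pub fn acquisitions(&self) -> u32 {
        self.acquisitions
    }

    /// One "bignum op": claim the engine, compute, release before returning.
    /// `m` must be nonzero.
    pub fn mul_mod(&mut self, a: u64, b: u64, m: u64) -> u64 {
        self.acquire();
        // Widen to u128 so the product cannot overflow.
        let result = ((a as u128 * b as u128) % m as u128) as u64;
        self.release();
        result
    }
}
```

The cost of this scheme is one acquire/release pair per operation, which is exactly the overhead the benchmarks need to quantify.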
Going with the default free of the hardware and re-grab on every bignum op. Benchmarks are as follows:
Diffie-Hellman:
- 8.37ms/2xop (200 iters - hw) - with new curve25519 lib and engine retained after every loop
- 33.04ms/2xop (200 iters - hw) - with new curve25519 lib and auto-free engine after every loop
Low level checks:
- 53.6ms/check vector iteration (10 iters total, 1450 vectors total) with engine retained
- 56.5ms/check with auto-free
There is almost no impact on the low-level checks. There is a fairly substantial impact on the Diffie-Hellman exchange, but the actual wall-clock time is still acceptable (about 33 ms per pair of DH operations). If it turns out that we need to hyper-optimize Diffie-Hellman to run faster, we can revisit the hardware locking, but I think the "dumb but simple" method of just releasing the hardware after every curve operation and re-acquiring it gets us going and is good enough.
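To put numbers on that comparison, the figures above work out to roughly a 3.9x slowdown for the Diffie-Hellman pair (about 12.3 ms extra per operation) and about a 5% slowdown for the low-level checks. The helper names below are hypothetical; the inputs are the benchmark values quoted above.

```rust
/// Ratio of the auto-free time to the retained-engine time.
pub fn slowdown(retained_ms: f64, auto_free_ms: f64) -> f64 {
    auto_free_ms / retained_ms
}

/// Extra time per operation when `ops` operations share the measured interval.
pub fn overhead_per_op_ms(retained_ms: f64, auto_free_ms: f64, ops: f64) -> f64 {
    (auto_free_ms - retained_ms) / ops
}
```

Calling `slowdown(8.37, 33.04)` and `overhead_per_op_ms(8.37, 33.04, 2.0)` gives the Diffie-Hellman figures, and `slowdown(53.6, 56.5)` gives the low-level check ratio.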