Remaining Curve25519 issues

Question

Closed this issue 6 months ago · 4 comments

Curve25519 is now at 4.1.2, with hardware acceleration, but there are some outstanding issues with regard to hardware locking.

At the moment, each process that uses the accelerator has to manually release its resources. See betrusted-io/curve25519-dalek#2 for a longer discussion of the history.

The action items are:

~~Implement release of hardware resources on Drop~~
Benchmark hardware operations with manual dropping to baseline the current code base
Implement release of the hardware lock after every operation, and run the benchmarks again
Implement a fallback path so that if hardware is not available, it seamlessly falls back to a software implementation
Implement error detection of suspend during operation, so that operations are automatically retried if the system was put into suspend during a hardware-accelerated operation
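To make the last two items concrete, here is a rough sketch of how fallback and retry could fit together. Everything in it is hypothetical: `HwError`, `mul_with_fallback`, `software_mul`, and the retry limit are illustration names, not anything in curve25519-dalek or Xous, and a wrapping `u64` multiply stands in for the real bignum operation.

```rust
/// Hypothetical failure modes of the accelerator (not a real Xous type).
#[derive(Debug, PartialEq)]
pub enum HwError {
    /// The engine could not be acquired at all.
    Unavailable,
    /// The system suspended mid-operation, so the result is invalid.
    Suspended,
}

/// Stand-in for the software implementation of the operation.
pub fn software_mul(a: u64, b: u64) -> u64 {
    a.wrapping_mul(b)
}

/// Try the hardware path; retry if a suspend interrupted it, and fall
/// back to software if the hardware is unavailable or retries run out.
pub fn mul_with_fallback(
    a: u64,
    b: u64,
    mut hw: impl FnMut(u64, u64) -> Result<u64, HwError>,
    max_retries: u32,
) -> u64 {
    // One initial attempt plus up to `max_retries` retries.
    for _ in 0..=max_retries {
        match hw(a, b) {
            Ok(value) => return value,
            // Suspend invalidated the result: run the operation again.
            Err(HwError::Suspended) => continue,
            // No hardware: stop trying and use software.
            Err(HwError::Unavailable) => break,
        }
    }
    software_mul(a, b)
}
```

Passing the hardware path in as a closure keeps the sketch testable without an accelerator; in the real crate the choice would more likely be made inside the backend itself.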

I think a lot of this actually needs to be done inside our fork of the Curve25519 repo, so perhaps the issue rightfully belongs there. However, I never look there to track issues, so I'm opening it in the xous-core repo to reduce the chances of me forgetting to do this.

Answer 1 · 2024-03-25T04:53:17.000Z

Ah. OK, the first problem has raised its head.

You can't implement Drop on types that implement Copy, but it seems Copy is necessary for some of the constant-time cryptographic traits. So the idea of auto-dropping and releasing the hardware might not work.
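The conflict can be reproduced in isolation: the compiler rejects a `Drop` impl on any `Copy` type with error E0184. The sketch below shows that restriction, plus one possible way around it, which is to keep the value type `Copy` and move ownership of the hardware into a separate non-`Copy` guard. `FieldValue`, `EngineGuard`, and `ENGINE_BUSY` are made-up names for illustration, and an atomic flag stands in for the real hardware lock.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Stand-in for the hardware lock (hypothetical, not the Xous API).
static ENGINE_BUSY: AtomicBool = AtomicBool::new(false);

/// A plain value type. It can be `Copy` because it owns no resource.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FieldValue(pub [u8; 32]);

// Uncommenting the next line fails to compile with error E0184,
// because `FieldValue` is `Copy`:
// impl Drop for FieldValue { fn drop(&mut self) {} }

/// Non-`Copy` guard that holds the simulated engine while it is alive.
pub struct EngineGuard;

impl EngineGuard {
    /// Returns `None` if the engine is already held.
    pub fn acquire() -> Option<EngineGuard> {
        // The exchange succeeds only if the flag was `false` (engine free).
        ENGINE_BUSY
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .ok()
            .map(|_| EngineGuard)
    }
}

impl Drop for EngineGuard {
    /// Releases the simulated engine when the guard goes out of scope.
    fn drop(&mut self) {
        ENGINE_BUSY.store(false, Ordering::Release);
    }
}

/// Reports whether the simulated engine is currently held.
pub fn engine_is_busy() -> bool {
    ENGINE_BUSY.load(Ordering::Acquire)
}
```

Whether a guard like this fits the dalek backend traits is a separate question; the sketch only shows that `Drop` has to live on a type that is not `Copy`.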

Answer 2 · 2024-03-25T05:49:14.000Z

I think for the Drop issue, what I may do is just make the routine release the engine hardware after each call, and re-allocate it. I'll run benchmarks to see how much of a penalty this is -- I suspect it may be surprisingly small given that the memory mappings are all static and use the "happy path" inside the kernel when requested.
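A minimal sketch of the release-after-each-call idea, using a `Mutex` in place of the real engine allocation. `Engine`, `with_engine`, `field_mul`, and `acquisition_count` are hypothetical names, and the multiply is only a placeholder for a bignum operation on the accelerator.

```rust
use std::sync::Mutex;

/// Simulated accelerator state (hypothetical).
pub struct Engine {
    /// How many times the engine has been acquired so far.
    pub acquisitions: u64,
}

/// Stand-in for the shared hardware; the mutex plays the hardware lock.
static ENGINE: Mutex<Engine> = Mutex::new(Engine { acquisitions: 0 });

/// Acquire the engine, run exactly one operation, then release it.
pub fn with_engine<R>(op: impl FnOnce(&mut Engine) -> R) -> R {
    let mut engine = ENGINE.lock().unwrap();
    engine.acquisitions += 1;
    let result = op(&mut *engine);
    // Explicit release: the lock is never held across calls.
    drop(engine);
    result
}

/// Placeholder operation that goes through the acquire/release cycle.
pub fn field_mul(a: u64, b: u64) -> u64 {
    with_engine(|_engine| a.wrapping_mul(b))
}

/// Number of acquire/release cycles performed so far.
pub fn acquisition_count() -> u64 {
    ENGINE.lock().unwrap().acquisitions
}
```

With this shape every call pays one acquire and one release, which is the cost the benchmarks need to measure.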

Answer 3 · 2024-03-28T08:28:24.000Z

I'm going with freeing the hardware by default and re-acquiring it on every bignum op. Benchmarks are as follows:

Diffie-Hellman:

8.37ms/2xop (200 iters - hw) - with new curve25519 lib and engine retained after every loop
33.04ms/2xop (200 iters - hw) - with new curve25519 lib and auto-free engine after every loop

Low level checks:

53.6ms/check vector iteration (10 iters total, 1450 vectors total) with engine retained
56.5ms/check with auto-free

There is almost no impact on the low level checks. There is a fairly substantial impact on the Diffie-Hellman exchange, but the actual wall-clock time is still acceptable (about 33ms per pair of DH operations). If it turns out that we need to hyper-optimize Diffie-Hellman to run faster, we can revisit the hardware locking, but I think the "dumb but simple" method of releasing the engine after every curve operation and re-acquiring it gets us going and is good enough.
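For reference, the relative cost works out to roughly 3.9x for Diffie-Hellman (about 24.7ms added per pair of operations) and roughly 5% for the low level checks (about 2.9ms added per check). A small sketch of that arithmetic, using only the figures quoted above; the helper names are made up:

```rust
/// Ratio of auto-free time to engine-retained time.
pub fn slowdown(retained_ms: f64, auto_free_ms: f64) -> f64 {
    auto_free_ms / retained_ms
}

/// Absolute time added by auto-free, in milliseconds.
pub fn added_ms(retained_ms: f64, auto_free_ms: f64) -> f64 {
    auto_free_ms - retained_ms
}

/// Diffie-Hellman figures from the benchmark: (retained, auto-free) in ms per 2 ops.
pub const DH_MS: (f64, f64) = (8.37, 33.04);

/// Low level check figures: (retained, auto-free) in ms per check.
pub const CHECK_MS: (f64, f64) = (53.6, 56.5);
```

For example, `slowdown(DH_MS.0, DH_MS.1)` gives the Diffie-Hellman ratio and `added_ms(CHECK_MS.0, CHECK_MS.1)` gives the extra time per check.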

Answer 4 · 2024-03-28T08:29:09.000Z

@kotval I think #518 should clean up all the remaining issues, let me know if you run into any problems!