Evaluate sharing of C code between mlkem-c-{generic, aarch64, embedded}

Question

mkannwischer opened this issue 4 months ago · 0 comments

Right now we have three copies of mostly the same C code, one in each of mlkem-c-generic, mlkem-c-aarch64, and mlkem-c-embedded.

In the long run, we want to share as much as possible between these C projects (and possibly additional ones). Maybe we even want to combine them into a single project at some point.

The reason for starting off with separate code bases was the different goals of the projects. In particular, the generic project wants to stay in sync with the Kyber team's reference implementation and make as few changes as possible. That's why we started off with our own copy and postponed the discussion of code sharing to a later point. (Since nothing has been merged into the generic project so far, I believe this was a good approach.)

Code sharing between embedded and aarch64 should be easier, and we should discuss the best way to implement it.
Should we pull out the common code into a separate repository (e.g., mlkem-c-common) and include it in both embedded and aarch64 as a submodule? At the same time, we could put in place the checks for undefined behaviour (using, e.g., CBMC).

I have two thoughts about what we definitely need to consider:

For AArch64, we likely want to support multiple levels of batching Keccak permutations to make the best use of hybrid scalar-vector implementations (see #33, https://kannwischer.eu/papers/2022_armv8keccak.pdf). This is something we don't want or need for the embedded implementation. For now, we decided to start off with a 4x Keccak API for AArch64, but you definitely don't want to allocate 4 Keccak states for the embedded implementation.
For the embedded implementation, we implement some stack optimizations that change some of the C code significantly. For example, to avoid having to store another ciphertext in re-encryption, we have a variant of the encryption functions that compares its output to a reference ciphertext in-line. While this is neat when you care about stack usage, it unnecessarily complicates the code if you don't.
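
To illustrate the Keccak batching point: one option for shared code might be to make the batch width a compile-time parameter, so an AArch64 build gets a 4x state while an embedded build allocates only a single one. This is only a rough sketch. The `MLKEM_KECCAK_WAY` macro, the `keccak_xN_state` type, and both helper functions are hypothetical names that do not exist in any of the projects, and the permutation itself is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical build-time knob: number of Keccak states processed together.
 * An AArch64 build might pass -DMLKEM_KECCAK_WAY=4; embedded would keep 1. */
#ifndef MLKEM_KECCAK_WAY
#define MLKEM_KECCAK_WAY 1
#endif

#define KECCAK_LANES 25 /* Keccak-f[1600] state: 25 lanes of 64 bits */

/* Hypothetical batched state type: MLKEM_KECCAK_WAY independent states. */
typedef struct {
    uint64_t s[MLKEM_KECCAK_WAY][KECCAK_LANES];
} keccak_xN_state;

/* Zero all states in the batch. */
static void keccak_xN_init(keccak_xN_state *st) {
    memset(st, 0, sizeof *st);
}

/* Number of bytes one batched state occupies (e.g., on the stack). */
static size_t keccak_xN_state_bytes(void) {
    return sizeof(keccak_xN_state);
}
```

With the default width of 1, the state takes 200 bytes. With a width of 4 it takes 800 bytes, which is the kind of allocation the embedded implementation wants to avoid.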
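
The in-line comparison idea could look roughly like the following. This is a minimal sketch and not the embedded project's actual code: `toy_encrypt_chunk`, `reencrypt_and_compare`, and the chunk size are hypothetical, and the "encryption" is a dummy byte transformation standing in for the real one. The point is only that the re-encrypted output is compared against the received ciphertext chunk by chunk, so no second full-size ciphertext buffer is needed.

```c
#include <stddef.h>
#include <stdint.h>

#define CT_BYTES 768   /* ML-KEM-512 ciphertext size in bytes */
#define CHUNK_BYTES 32 /* hypothetical chunk size for this sketch */

/* Dummy stand-in for producing one chunk of the re-encrypted ciphertext.
 * A real implementation would compute the actual ciphertext bytes here. */
static void toy_encrypt_chunk(uint8_t out[CHUNK_BYTES], const uint8_t msg[32],
                              size_t chunk_idx) {
    for (size_t i = 0; i < CHUNK_BYTES; i++) {
        out[i] = (uint8_t)(msg[i % 32] ^ (uint8_t)(chunk_idx * CHUNK_BYTES + i));
    }
}

/* Re-encrypt chunk by chunk and compare each chunk to the reference
 * ciphertext. Only CHUNK_BYTES of scratch space are used instead of a
 * second CT_BYTES buffer. Returns 0 if all bytes match, 1 otherwise.
 * Differences are accumulated without data-dependent branches. */
static int reencrypt_and_compare(const uint8_t ref_ct[CT_BYTES],
                                 const uint8_t msg[32]) {
    uint8_t chunk[CHUNK_BYTES];
    uint8_t diff = 0;

    for (size_t c = 0; c < CT_BYTES / CHUNK_BYTES; c++) {
        toy_encrypt_chunk(chunk, msg, c);
        for (size_t i = 0; i < CHUNK_BYTES; i++) {
            diff |= (uint8_t)(chunk[i] ^ ref_ct[c * CHUNK_BYTES + i]);
        }
    }
    /* Map any nonzero diff to 1 and zero to 0 without branching. */
    return (int)(((uint32_t)diff + 0xFFu) >> 8);
}
```

The saving here is `CT_BYTES - CHUNK_BYTES` bytes of stack, at the cost of interleaving comparison logic with encryption. That interleaving is the complication a non-embedded target would rather not carry.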