0xADE1A1DE/CryptOpt

Feature request: output C with intrinsics

andres-erbsen opened this issue · 4 comments

It'd be neat if we could have CryptOpt do optimization on assembly as it does now, but then print the scheduled and register-allocated code as C to see how much worse C compilers do when given the easiest possible task. The code generated in this manner might also be preferable to raw fiat-crypto output in cases where a C implementation needs to be deployed. I don't think we should make any effort to use standard intrinsics; rather, let's do something like fiat-crypto, where we just define C functions that compute the same values as the supported assembly instructions.

I'm not sure what register-allocated code as C would look like. Would I want to create C symbols like rax_1? But then the compiler can still allocate any register to them, right?
Another tricky point would be how to deal with flags, as I'm not aware of any C semantics for overflow/carry flags.
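For illustration, one way the flag question could be approached is to treat each flag as an ordinary C variable that emulation helpers read and write. The sketch below assumes GCC or Clang, since it relies on the `__builtin_add_overflow` builtin; `flag_t`, `emu_add`, `emu_adc` and `demo_add128` are hypothetical names made up for this example and are not part of CryptOpt or fiat-crypto.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a flag is just a C variable holding 0 or 1. */
typedef bool flag_t;

/* Emulates `add`: returns a + b (mod 2^64), writes the carry-out to *cf. */
static inline uint64_t emu_add(uint64_t a, uint64_t b, flag_t *cf) {
    uint64_t sum;
    *cf = __builtin_add_overflow(a, b, &sum); /* GCC/Clang builtin */
    return sum;
}

/* Emulates `adc`: returns a + b + *cf, then overwrites *cf with the new carry. */
static inline uint64_t emu_adc(uint64_t a, uint64_t b, flag_t *cf) {
    uint64_t t, sum;
    bool c1 = __builtin_add_overflow(a, b, &t);
    bool c2 = __builtin_add_overflow(t, (uint64_t)*cf, &sum);
    *cf = c1 | c2; /* at most one of the two additions can carry */
    return sum;
}

/* Two-limb addition written the way an add/adc pair would appear in assembly. */
static inline void demo_add128(void) {
    flag_t cf;
    uint64_t rax_1 = emu_add(UINT64_MAX, 1, &cf); /* low limb wraps, sets carry */
    assert(rax_1 == 0 && cf);
    uint64_t rdx_1 = emu_adc(5, 7, &cf);          /* high limb absorbs the carry */
    assert(rdx_1 == 13 && !cf);
}
```

Nothing here pins `rax_1` or `cf` to a particular register or to the hardware carry flag; the compiler remains free to place and recompute them however it likes.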

The way it currently works is that I preprocess any input (in this case from the fiat-bridge). This then becomes my internal representation, which, in combination with a RegisterAllocator, can be written to an assembly string.

In other words, there is currently no interface from my IR to 'write to x86-64'.
This ties in with #143 and possibly other targets: define some interface from which, given some capabilities, a string representation (Go, C intrinsics, x86-64, arm, ...) can be emitted.

Yes, I was imagining C variables for both registers and flags, and custom C prototypes I would implement to emulate various flag-manipulating instructions. I don't think we need any capability-selection logic for this, just a different stringification procedure. I might be missing something about the post-regalloc passes in CryptOpt though, I have only barely looked at the code.

Instruction selection and register allocation go hand in hand. That is, the selected instruction(s) depend on the current register allocation. C types would not have any way to specify movs or spills (neither for flags nor for 64-bit values), right?
Then there would be some prototypes for addcarryx, subborrowx, mulx and cmovznz, just as in the Fiat-C, and variables are C symbols.
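As a rough sketch of what such prototypes might look like, the helpers below are modelled loosely on the style of the fiat-crypto C output, but the exact names, the `uint1`/`uint128` typedefs and the argument order are assumptions made for this example. The code relies on `unsigned __int128`, which is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned char uint1;       /* assumed: holds only 0 or 1 */
typedef unsigned __int128 uint128; /* GCC/Clang extension */

/* out = (a + b + carry_in) mod 2^64, carry_out = bit 64 of the full sum. */
static inline void addcarryx_u64(uint64_t *out, uint1 *carry_out,
                                 uint1 carry_in, uint64_t a, uint64_t b) {
    uint128 x = (uint128)a + b + carry_in;
    *out = (uint64_t)x;
    *carry_out = (uint1)(x >> 64);
}

/* out = (a - b - borrow_in) mod 2^64, borrow_out = 1 if the result wrapped. */
static inline void subborrowx_u64(uint64_t *out, uint1 *borrow_out,
                                  uint1 borrow_in, uint64_t a, uint64_t b) {
    uint64_t d = a - b;
    *borrow_out = (uint1)((a < b) | (d < borrow_in));
    *out = d - borrow_in;
}

/* Full 64x64 -> 128-bit product, split into low and high halves. */
static inline void mulx_u64(uint64_t *lo, uint64_t *hi, uint64_t a, uint64_t b) {
    uint128 p = (uint128)a * b;
    *lo = (uint64_t)p;
    *hi = (uint64_t)(p >> 64);
}

/* out = (cond == 0) ? z : nz, computed with a mask instead of a branch. */
static inline void cmovznz_u64(uint64_t *out, uint1 cond, uint64_t z, uint64_t nz) {
    uint64_t mask = (uint64_t)0 - (uint64_t)(cond != 0); /* all ones iff cond != 0 */
    *out = (z & ~mask) | (nz & mask);
}
```

With helpers like these, the emitted C would be a flat sequence of calls on C variables, so the call order is the main thing that could still carry over from the optimized assembly.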

Is then the only thing left the order of function calls that would differentiate this output from the Fiat-C?
That would then be an only moderately complex addition.
Then CryptOpt would have two ways of measuring the code: either use CC (with various flags) on the C output, or keep using its own register allocation and AssemblyLine. The latter would not necessarily represent the code that a CC would then produce, so (order) mutations would probably not be guided correctly (decision mutations would not make sense anymore either). As for the former: (1) invoking the compiler and running the result may take so long that timely evaluation of thousands of mutations becomes infeasible, though that would need to be tested; (2) one might as well do it in a different project. By that I mean a script that changes the order of the intrinsic calls (obeying data flow) and repeatedly compiles and measures. It would be interesting to see how much more performance one could get from that.
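To make "changing the order while obeying data flow" concrete, here is a minimal sketch of the legality check such a script would need. It assumes the call sequence is in SSA form (every variable, including each carry, is written exactly once), so two adjacent calls may be swapped exactly when the second does not read the result of the first. `op_t`, `reads` and `try_swap_adjacent` are hypothetical names, and a real tool would need more than one output and two inputs per call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical SSA operation: one output variable id, two input variable ids. */
typedef struct {
    int dst;
    int src[2];
} op_t;

/* True if `op` reads variable `var`. */
static inline bool reads(const op_t *op, int var) {
    return op->src[0] == var || op->src[1] == var;
}

/* Swaps ops[i] and ops[i+1] if that preserves data flow; reports whether it did. */
static inline bool try_swap_adjacent(op_t *ops, size_t n, size_t i) {
    if (i + 1 >= n) return false;                       /* nothing to swap with */
    if (reads(&ops[i + 1], ops[i].dst)) return false;   /* true dependency */
    op_t tmp = ops[i];
    ops[i] = ops[i + 1];
    ops[i + 1] = tmp;
    return true;
}
```

A search loop would then pick a random index, attempt the swap, recompile, measure, and keep or revert the change.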

After thinking about this now, it feels like either the idea is orthogonal to the capabilities of CryptOpt, or I am missing the power of C to specify exactly which instructions to use. Even if we opted to use Intel intrinsics: looking at e.g. the adx intrinsic, it's just one for both adcx and adox, and the compiler seems to be free to use either.

Yeah, I think I am coming around to thinking that maybe having this functionality be a part of cryptopt is not that good of an idea after all, for the reasons you discuss. I even looked up the register annotation in C and turns out it is incompatible with intrinsics whose outputs are passed by pointer.

Perhaps I'll instead try a sed script that translates every opcode in an assembly file to a similarly-named macro in C... it's not going to be pretty with sub-register access though, but maybe I'll get somewhere without supporting that.
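A minimal sketch of what the macro side of that approach could look like, assuming full 64-bit registers only (no sub-register access) and a carry flag modelled as a plain variable named `CF` that must be in scope wherever the macros are used. The macro names, `CF`, and the `add128_hi` wrapper are all hypothetical; the body of the wrapper is the kind of text a sed translation of `mov`/`add`/`adc` lines might produce.

```c
#include <assert.h>
#include <stdint.h>

/* Each macro mirrors one opcode; ADD and ADC expect a variable `CF` in scope. */
#define MOV(dst, src) ((dst) = (src))
#define ADD(dst, src)                                   \
    do {                                                \
        uint64_t s_ = (src);                            \
        (dst) += s_;                                    \
        CF = (dst) < s_;          /* carry-out */       \
    } while (0)
#define ADC(dst, src)                                   \
    do {                                                \
        uint64_t s_ = (src), c_ = CF;                   \
        uint64_t t_ = (dst) + s_;                       \
        CF = t_ < s_;             /* carry from a+b */  \
        (dst) = t_ + c_;                                \
        CF |= (dst) < c_;         /* carry from +CF */  \
    } while (0)

/* Returns the high limb of a 128-bit sum, written as translated assembly. */
static inline uint64_t add128_hi(uint64_t a_lo, uint64_t a_hi,
                                 uint64_t b_lo, uint64_t b_hi) {
    uint64_t rax, rdx;
    unsigned char CF = 0;
    MOV(rax, a_lo);
    MOV(rdx, a_hi);
    ADD(rax, b_lo);
    ADC(rdx, b_hi);
    return rdx;
}
```

Instructions that touch `eax`, `ax` or `al` would need extra masking logic that this sketch leaves out.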