erthink/t1ha

ARM64 AES acceleration?

snej opened this issue · 4 comments

snej commented

Do you plan to add support for ARM64 AES instructions, as you did for x86?

Ideally, this would be nice, but I'm not going to do it yet:

  1. The AES-NI-accelerated variants of t1ha0() ("Just Only Faster", but not portable/stable) are designed specifically for x86, and even there they consist of two significantly different implementations for different CPU families. Therefore, to get appropriate performance on ARM64, I would need to create a new implementation rather than try to port one of the x86 ones.
  2. It is quite difficult to design an implementation of a pretty fast hardware-accelerated hash function for ARM64, since:
    • ARMv8.x has a lot of optional features that are useful for a t1ha0() implementation (crc, crypto, simd, sve, sve2, aes, sha2, sha3, sm3/sm4, sve2+sm4/aes/sha3), but their performance may differ depending on the CPU model.
      Moreover, a particular implementation using AES or SHA2 acceleration may be significantly faster than the portable t1ha2() on one CPU model, but significantly slower on another;
    • ARMv8.x has no common/generalized method for determining the availability of optional features at compile time and/or at runtime. On the contrary, these features have to be enabled with compiler-dependent command-line options, detected through compiler-defined macros and (in some cases) re-checked for availability at runtime with a probe and a SIGILL handler;
    • Therefore, for a good result, I would have to develop a set of functions for different (most popular and/or promising) ARM64 families, taking into account their capabilities, while having access to the corresponding set of hardware (i.e. a reasonable subset of cortex-a34, cortex-a35, cortex-a53, cortex-a55, cortex-a57, cortex-a65, cortex-a65ae, cortex-a72, cortex-a73, cortex-a75, cortex-a76, cortex-a76ae, cortex-a77, ares, exynos-m1, falkor, neoverse-n1, neoverse-v1, neoverse-e1, qdf24xx, saphira, thunderx, vulcan, xgene1, xgene2, M1, etc.).
  3. This is a big and interesting job. But I don't have any projects (or any other work) related to ARM64 right now. So I don't have the time or other resources to do this.
    Moreover, for now I don't have a "vision" of the ARM64 market that would let me make an optimal/reasonable choice of ARM64 families/vendors/features as baseline targets.
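
To illustrate the compile-time and runtime checks mentioned in point 2, here is a minimal sketch in C. It is not part of t1ha; the `demo_*` function names are hypothetical. It assumes the ACLE macros `__ARM_FEATURE_AES` / `__ARM_FEATURE_CRYPTO` for the compile-time side, `getauxval(AT_HWCAP)` with `HWCAP_AES` on Linux/Android, and the `hw.optional.arm.FEAT_AES` sysctl on Apple platforms (which I believe is only present on recent OS versions, so a failed lookup is treated as "not available"). On any other target both checks simply report `false`.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h> /* getauxval(), AT_HWCAP */
#ifndef HWCAP_AES
#include <asm/hwcap.h> /* HWCAP_AES, if <sys/auxv.h> did not provide it */
#endif
#elif defined(__aarch64__) && defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h> /* sysctlbyname() */
#endif

/* Compile-time side: was the compiler told (e.g. via -march=armv8-a+aes
 * or +crypto) that it may emit AES instructions? */
bool demo_aes_compiled_in(void) {
#if defined(__ARM_FEATURE_AES) || defined(__ARM_FEATURE_CRYPTO)
  return true;
#else
  return false;
#endif
}

/* Runtime side: does the CPU we are actually running on report AES? */
bool demo_aes_at_runtime(void) {
#if defined(__aarch64__) && defined(__linux__)
  /* Linux and Android expose CPU features through the ELF auxiliary vector. */
  return (getauxval(AT_HWCAP) & HWCAP_AES) != 0;
#elif defined(__aarch64__) && defined(__APPLE__)
  /* Assumption: this sysctl name exists only on recent macOS/iOS releases. */
  int flag = 0;
  size_t size = sizeof(flag);
  if (sysctlbyname("hw.optional.arm.FEAT_AES", &flag, &size, NULL, 0) != 0)
    return false; /* unknown name: treat the feature as unavailable */
  return flag != 0;
#else
  /* Not ARM64, or an OS this sketch does not know how to query. */
  return false;
#endif
}

/* An accelerated code path is only safe when both checks agree. */
bool demo_aes_usable(void) {
  return demo_aes_compiled_in() && demo_aes_at_runtime();
}
```

A caller would evaluate `demo_aes_usable()` once at startup and pick either an accelerated or a portable hash function through a function pointer. Note that this only answers "is AES present?"; it says nothing about whether the AES path is actually faster on a given CPU model, which is the harder problem described above.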

Related to #42

@snej, please do not close this issue; it will be more useful to leave it open as a FAQ.

snej commented

Thanks for the explanation!

I'm looking at this primarily for iOS and Android, and those are probably the biggest use cases overall. (In the embedded world, ARM CPUs are almost always 32-bit Cortex.)

I am guessing that all Apple ARM CPUs have AES instructions since iOS relies heavily on file encryption, so iOS (and ARM Mac) support wouldn't require too many #ifdefs. Android of course is another story.