NEON ARMv8 Implementation
Hi,
I have a NEON implementation of NTRU; most of the work went into polynomial multiplication. Here are the benchmark results:
NTRU M1 REF | ref HPS509 | ref HPS677 | ref HRSS701 | ref HPS821 |
---|---|---|---|---|
crypto_kem_keypair | 3,501,828 | 6,219,493 | 6,578,665 | 9,056,209 |
crypto_kem_enc | 103,059 | 182,986 | 152,390 | 244,308 |
crypto_kem_dec | 231,254 | 429,798 | 439,859 | 583,865 |
poly_Rq_mul | 70,576 | 134,764 | 133,792 | 185,126 |
poly_S3_mul | 72,711 | 137,327 | 136,482 | 188,243 |
sample_fixed_type | 27,808 | 42,299 | 3,422 | 53,976 |
poly_lift | 115 | 176 | 12,193 | 165 |
poly_Rq_to_S3 | 2,209 | 2,836 | 2,936 | 3,504 |
poly_Rq_sum_zero_tobytes | 449 | 600 | 620 | 159 |
poly_Rq_sum_zero_frombytes | 1,189 | 1,567 | 1,797 | 1,119 |
poly_S3_tobytes | 328 | 433 | 447 | 523 |
poly_S3_frombytes | 2,299 | 2,978 | 3,086 | 3,653 |
NTRU M1 NEON | neon HPS509 | neon HPS677 | neon HRSS701 | neon HPS821 |
---|---|---|---|---|
crypto_kem_keypair | 2,684,977 | 4,715,457 | 5,032,358 | 6,993,689 |
crypto_kem_enc | 39,158 | 60,198 | 23,145 | 75,630 |
crypto_kem_dec | 33,024 | 53,708 | 60,644 | 69,213 |
poly_Rq_mul | 7,347 | 11,595 | 15,689 | 17,241 |
poly_S3_mul | 7,509 | 11,876 | 15,660 | 17,432 |
sample_fixed_type | 27,853 | 42,254 | 3,425 | 54,036 |
poly_lift | 65 | 89 | 1,074 | 105 |
poly_Rq_to_S3 | 279 | 383 | 383 | 452 |
poly_Rq_sum_zero_tobytes | 449 | 600 | 620 | 159 |
poly_Rq_sum_zero_frombytes | 1,189 | 1,566 | 1,797 | 1,119 |
poly_S3_tobytes | 329 | 435 | 447 | 523 |
poly_S3_frombytes | 360 | 501 | 509 | 597 |
Edit: The unit is clock cycles.
I think the numbers speak for themselves.
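For a quick sense of the ratios, here is a minimal sketch that divides the reference cycle counts by the NEON ones. The numbers are copied from the two tables above for a handful of rows; the names `REF`, `NEON`, `PARAMS`, and `speedups` are made up for this illustration and are not part of the NTRU code base.

```python
# Cycle counts copied from the benchmark tables above (Apple M1).
# Column order: HPS509, HPS677, HRSS701, HPS821.
PARAMS = ["HPS509", "HPS677", "HRSS701", "HPS821"]

REF = {
    "crypto_kem_keypair": [3_501_828, 6_219_493, 6_578_665, 9_056_209],
    "crypto_kem_enc": [103_059, 182_986, 152_390, 244_308],
    "crypto_kem_dec": [231_254, 429_798, 439_859, 583_865],
    "poly_Rq_mul": [70_576, 134_764, 133_792, 185_126],
}

NEON = {
    "crypto_kem_keypair": [2_684_977, 4_715_457, 5_032_358, 6_993_689],
    "crypto_kem_enc": [39_158, 60_198, 23_145, 75_630],
    "crypto_kem_dec": [33_024, 53_708, 60_644, 69_213],
    "poly_Rq_mul": [7_347, 11_595, 15_689, 17_241],
}


def speedups(ref, neon):
    """Return ref/neon cycle ratios per operation, one ratio per parameter set."""
    return {op: [r / n for r, n in zip(ref[op], neon[op])] for op in ref}


if __name__ == "__main__":
    for op, ratios in speedups(REF, NEON).items():
        cells = ", ".join(f"{p}: {x:.2f}x" for p, x in zip(PARAMS, ratios))
        print(f"{op:20s} {cells}")
```

Rows that are identical in both tables (for example `sample_fixed_type`) are left out, since their ratio is roughly 1.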
Would you like me to submit this implementation to the NTRU code base as a pull request?
If so, please let me know how to proceed.
Hi @cothan, looks like you've done some really nice work here!
I think that PQClean and liboqs might start taking ARM code sometime in the near future. The best way for you to get your code into wider use is to contribute directly to those projects if/when they do that. I have a script for packaging the code in this repo for PQClean (https://github.com/jschanck/package-pqclean/tree/main/ntru), which should be pretty easy to modify for your code.
Thank you. I can close this issue now.