jschanck/ntru

NEON ARMv8 Implementation


Hi,

I have a NEON implementation of NTRU for ARMv8; most of the work went into polynomial multiplication. Here are the benchmark results:

| NTRU M1 REF | ref HPS509 | ref HPS677 | ref HRSS701 | ref HPS821 |
|---|---:|---:|---:|---:|
| crypto_kem_keypair | 3,501,828 | 6,219,493 | 6,578,665 | 9,056,209 |
| crypto_kem_enc | 103,059 | 182,986 | 152,390 | 244,308 |
| crypto_kem_dec | 231,254 | 429,798 | 439,859 | 583,865 |
| poly_Rq_mul | 70,576 | 134,764 | 133,792 | 185,126 |
| poly_S3_mul | 72,711 | 137,327 | 136,482 | 188,243 |
| sample_fixed_type | 27,808 | 42,299 | 3,422 | 53,976 |
| poly_lift | 115 | 176 | 12,193 | 165 |
| poly_Rq_to_S3 | 2,209 | 2,836 | 2,936 | 3,504 |
| poly_Rq_sum_zero_tobytes | 449 | 600 | 620 | 159 |
| poly_Rq_sum_zero_frombytes | 1,189 | 1,567 | 1,797 | 1,119 |
| poly_S3_tobytes | 328 | 433 | 447 | 523 |
| poly_S3_frombytes | 2,299 | 2,978 | 3,086 | 3,653 |

| NTRU M1 NEON | neon HPS509 | neon HPS677 | neon HRSS701 | neon HPS821 |
|---|---:|---:|---:|---:|
| crypto_kem_keypair | 2,684,977 | 4,715,457 | 5,032,358 | 6,993,689 |
| crypto_kem_enc | 39,158 | 60,198 | 23,145 | 75,630 |
| crypto_kem_dec | 33,024 | 53,708 | 60,644 | 69,213 |
| poly_Rq_mul | 7,347 | 11,595 | 15,689 | 17,241 |
| poly_S3_mul | 7,509 | 11,876 | 15,660 | 17,432 |
| sample_fixed_type | 27,853 | 42,254 | 3,425 | 54,036 |
| poly_lift | 65 | 89 | 1,074 | 105 |
| poly_Rq_to_S3 | 279 | 383 | 383 | 452 |
| poly_Rq_sum_zero_tobytes | 449 | 600 | 620 | 159 |
| poly_Rq_sum_zero_frombytes | 1,189 | 1,566 | 1,797 | 1,119 |
| poly_S3_tobytes | 329 | 435 | 447 | 523 |
| poly_S3_frombytes | 360 | 501 | 509 | 597 |

Edit: The unit is clock cycles.
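
For anyone who wants the ratios rather than the raw counts, here is a small sketch that divides the reference cycle counts by the NEON ones. The numbers are copied from the benchmark results above for four of the rows; the dictionary layout and the `speedups` helper are only illustrative and are not part of the NTRU code base.

```python
# Cycle counts copied from the benchmark results above.
# Column order: HPS509, HPS677, HRSS701, HPS821.
PARAM_SETS = ["HPS509", "HPS677", "HRSS701", "HPS821"]

ref = {
    "crypto_kem_keypair": [3501828, 6219493, 6578665, 9056209],
    "crypto_kem_enc": [103059, 182986, 152390, 244308],
    "crypto_kem_dec": [231254, 429798, 439859, 583865],
    "poly_Rq_mul": [70576, 134764, 133792, 185126],
}

neon = {
    "crypto_kem_keypair": [2684977, 4715457, 5032358, 6993689],
    "crypto_kem_enc": [39158, 60198, 23145, 75630],
    "crypto_kem_dec": [33024, 53708, 60644, 69213],
    "poly_Rq_mul": [7347, 11595, 15689, 17241],
}


def speedups(ref_cycles, neon_cycles):
    """Return ref/neon cycle ratios per function (hypothetical helper)."""
    return {
        name: [r / n for r, n in zip(ref_cycles[name], neon_cycles[name])]
        for name in ref_cycles
    }


# Print one line per function with the ratio for each parameter set.
for name, ratios in speedups(ref, neon).items():
    cells = ", ".join(f"{p}: {x:.1f}x" for p, x in zip(PARAM_SETS, ratios))
    print(f"{name}: {cells}")
```

The same helper works for the remaining rows if their counts are added to both dictionaries.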

I think the numbers speak for themselves.
Would you like me to open a pull request adding this implementation to the NTRU code base?
If so, please let me know what to do.

Hi @cothan, looks like you've done some really nice work here!

I think that PQClean and liboqs might start taking ARM code sometime in the near future. The best way for you to get your code into wider use is to contribute directly to those projects if/when they do that. I have a script for packaging the code in this repo for PQClean (https://github.com/jschanck/package-pqclean/tree/main/ntru), which should be pretty easy to modify for your code.

Thank you. I can close this issue now.