str4d/fpe

Hm, Looks like the encryption process might be overtly slow.

isaacdre opened this issue · 3 comments

FF1 Should be lightening fast,
However when benchmarking, I was seeing this when speed testing (note timing was restricted to the most intensive step: let cb = ff.encrypt(&[], &binarified_input); )

40 Characters: 7.4697ms

200 Characters: 59.0451ms

1000 Characters: 623.9559ms

5000 Characters: 7.009278s

As the size of input becomes larger, the amount of time it takes to encrypt starts to rapidly increase. It seems inordinately slow for the relatively small sizes in play. I would expect some > 1 second encryption times in the megabyte size of strings, but not with just a few thousand characters.

Am I missing something on the nature of the FF1 algorithm? Is there a way I can turbo charge this?

I tried to use instead but that failed with an assert_eq! error, so I'm not sure if that has been fully implemented.

Thanks!

I've upgraded deps of binary-ff1 and it's much faster now.

However, I still wouldn't expect FPE to be competitive with regular block-ciphers. Its length-preserving property is meant for super short fixed-width inputs, like single integers.

str4d commented

Yeah, my impl here had a focus on correctness rather than speed, so it uses the simple approach of a word per symbol for all implementations, including binary. This results in inefficient packing (arrays are 8 times larger because they use a byte to store a bit).

To write a fast binary implementation you need scratch space, because the way FF1 works you might need to split inside a byte boundary. This is hard to fit into a generic trait-based API like this crate uses, without const generics (or going down the generic-array rabbit hole, which I'd rather not do).

That being said, I have occasionally spent time working towards enabling faster binary implementations, and would like to improve speed where possible. It's just not a priority for my use case (handling short 88-bit values).

str4d commented

I finally found time to sit down and finish my performance refactor of BinaryNumeralString, and it is now significantly faster (benchmarked vs b9bb4a8 on Ryzen 9 5950X):

image

I'll clean up the changes and make a PR soon.

There is probably still more that could be done to avoid allocations, but it would likely require significantly altering the internals, and I prefer keeping the implementation close to the specification if possible.