ChainSafe/as-sha256

Possible micro-optimization

Closed this issue · 8 comments

Lodestar uses sha256 mostly in a single specific way, to merkleize data.
Because use is constrained to mainly this case, we may be able to provide a sha256 routine that is optimized for this merkle node input, ie: 64 byte input.

Investigate creating a separate exported wasm function, eg: digest64 which is optimized and only works for 64 byte input. This routine may precompute the second message block because the input is the exact length of the first message block and the length is known ahead of time.

Precomputing this second message block may save time at the cost of requiring the precomputed block to be included in the wasm blob. Care will be required to ensure the wasm blob stays under the 5kb limit required for synchronous loading.

See https://web.archive.org/web/20130526224224/http://csrc.nist.gov/groups/STM/cavp/documents/shs/sha256-384-512.pdf

@wemeetagain @dapplion could you assign this to me for investigation/analysis

@wemeetagain @dapplion
I analyzed the algorithm, there is no need to precompute message block (for length hashing), instead keep a precomputed W vector for the second message block (which get then mixed with previous block's hash output) and use it in blockhash fn, skipping the following computation for M(64) - the second message block encoding the length of 64 bytes .
image

I reckon 10-20% speedup gain overall on saving the W computations. if you feel this is desirable efficient gain. The code can also be kept short and sweet with a single if check while computing the W, or skipping if its a precomputed W, plus the digest64 signature plus a presaved W64 vector of 32bit words of length 64.

if above efficiency is desirable, i can code it up.

I recon 10-20% speedup gain overall on saving the W computations

That sounds good enough efficiency gain, around what the expected. I think it's worth to code it up. Please, make sure to add tests to guarantee correctness

if digest64 is called with >64 bytes, it will digest left 64 bytes, if is called with <64 bytes it will pad on right with 0s. this is good enough behavior?

if digest64 is called with >64 bytes, it will digest left 64 bytes, if is called with <64 bytes it will pad on right with 0s. this is good enough behavior?

There must be a dedicated digest64 function which accepts only inputs strictly of 64 bytes, otherwise throw an error

here are the benchmarking results -third row for digest64:

image

@g11in Looks great! Can you open a draft PR to take a look?

@dapplion opened for review.