Fast subgroup checks

Question

joebebel opened this issue 4 years ago · 8 comments

The protocol extensively sends and receives points on the BLS12-381 curve to and from third parties, so many subgroup checks may be needed to ensure that all such points lie in the prime-order subgroup.

This task involves two parts:

1. Identify every place in the protocol where such subgroup checks are necessary.
2. Implement a fast subgroup check algorithm, e.g. https://eprint.iacr.org/2019/814.pdf
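
For orientation, the baseline that a fast algorithm would replace is the naive check: multiply the point by the subgroup order and compare the result with the identity. The sketch below is only an illustration under loud assumptions. It uses a toy additive group (integers modulo `N`) in place of BLS12-381, and the names `Q`, `COFACTOR`, `mul`, and `is_in_subgroup` are hypothetical, not taken from any library mentioned in this thread.

```rust
// Toy stand-in for a curve group (assumption for illustration only):
// integers modulo N under addition, with N = COFACTOR * Q.
// The "prime-order subgroup" is the set of multiples of COFACTOR.
const Q: u64 = 101; // toy prime subgroup order
const COFACTOR: u64 = 12; // toy cofactor
const N: u64 = Q * COFACTOR;

/// Group operation of the toy group.
fn add(a: u64, b: u64) -> u64 {
    (a + b) % N
}

/// Double-and-add scalar multiplication using only the group operation.
fn mul(point: u64, mut k: u64) -> u64 {
    let mut acc = 0; // identity element
    let mut base = point;
    while k > 0 {
        if k & 1 == 1 {
            acc = add(acc, base);
        }
        base = add(base, base);
        k >>= 1;
    }
    acc
}

/// Naive subgroup check: P lies in the order-Q subgroup iff [Q]P is the identity.
fn is_in_subgroup(point: u64) -> bool {
    mul(point, Q) == 0
}

fn main() {
    // 24 is a multiple of the cofactor, so it lies in the subgroup.
    assert!(is_in_subgroup(24));
    // 7 is not, so the check rejects it.
    assert!(!is_in_subgroup(7));
    // Clearing the cofactor maps any element into the subgroup.
    assert!(is_in_subgroup(mul(7, COFACTOR)));
    println!("naive subgroup checks behaved as expected");
}
```

On a real curve the cost of this check is a full scalar multiplication by the subgroup order, which is what the endomorphism-based method in the linked paper aims to reduce.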

Answer 1 · 2021-03-15T18:25:59.000Z

Additional resources:

zcash/zcash#3425 (comment)
zcash/zcash#3425 (comment)
zcash/zcash#3470

pairingwg/bls_standard#21

mratsim/constantine#47
mratsim/constantine#46
https://github.com/ethereum/EIPs/blob/master/EIPS/eip-2539.md
status-im/nimbus-eth2#1715

Answer 2 · 2021-03-16T17:48:45.000Z

Apparently Celo (https://github.com/celo-org/celo-blockchain/tree/master/crypto/bls12381) gets a big speed improvement from batching subgroup checks. It's worth considering combining batching with the fast algorithm described in Sean Bowe's paper.

Answer 3 · 2021-03-17T22:05:47.000Z

The fast subgroup check from Bowe's eprint/2019/814 is more efficient than multiplying by the cofactor only in the G2 case (the G1 cofactor is small).

The G2 subgroup check is (partially) done in zkcrypto/bls12_381: the clear_cofactor function uses Bowe's trick, but the is_torsion_free function does not.

The arkworks-rs/curves implementation does not provide an is_torsion_free function.

I forked zkcrypto/bls12_381 into heliaxdev/bls12_381 and implemented (as an exercise) the G1 subgroup check as in Bowe's paper.

Answer 4 · 2021-03-18T22:27:16.000Z

I think what's happening in the G1 case is that the implementation of multiply in bls12_381 is constant-time, so $[(z^2-1)/3] P$ costs exactly the same as $[q] P$, which would make the fast subgroup check actually slower. To take advantage of the fast test, the final multiply needs to work on [u8; 16] instead of [u8; 32]; otherwise it will continue to double the base point.
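
The point about scalar width can be illustrated with a small sketch. Since the BLS12-381 parameter $z$ is a 64-bit value, $(z^2-1)/3$ fits in 16 bytes, while $q$ needs 32. The code below is not the bls12_381 crate's implementation. It is a hypothetical fixed-length double-and-add over a toy additive group (integers modulo `N`), with an invented helper `mul_fixed_len`, meant only to show that such a loop performs one doubling per scalar bit regardless of the scalar's value.

```rust
// Toy additive group Z_N standing in for curve points (assumption for illustration).
const N: u64 = 1_000_003;

/// Fixed-length double-and-add over a big-endian byte scalar.
/// Every bit of `scalar_be` is scanned, so the amount of work depends only on
/// the slice length, not on the scalar's value.
/// Returns (result, number_of_doublings).
fn mul_fixed_len(point: u64, scalar_be: &[u8]) -> (u64, u32) {
    let mut acc = 0u64; // identity element
    let mut doublings = 0u32;
    for &byte in scalar_be {
        for i in (0..8u32).rev() {
            acc = (acc + acc) % N;
            doublings += 1;
            let bit = ((byte >> i) & 1) as u64;
            // Always compute the sum, then select it with a mask (branch-free style).
            let sum = (acc + point) % N;
            let mask = bit.wrapping_neg(); // 0 when bit == 0, all ones when bit == 1
            acc = (sum & mask) | (acc & !mask);
        }
    }
    (acc, doublings)
}

fn main() {
    let p = 12_345u64;

    // The same small scalar (5), stored in 32 bytes and in 16 bytes.
    let mut wide = [0u8; 32];
    wide[31] = 5;
    let mut narrow = [0u8; 16];
    narrow[15] = 5;

    let (r32, d32) = mul_fixed_len(p, &wide);
    let (r16, d16) = mul_fixed_len(p, &narrow);

    // Same result, but the wide encoding pays for twice as many doublings.
    assert_eq!(r32, r16);
    assert_eq!(d32, 256);
    assert_eq!(d16, 128);
    println!("32-byte scalar: {} doublings, 16-byte scalar: {} doublings", d32, d16);
}
```

Under this model, padding a 128-bit scalar to 32 bytes doubles the number of loop iterations, which is consistent with the fast check showing no gain until the multiply accepts a shorter scalar.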

Answer 5 · 2021-03-19T09:21:38.000Z

Actually, the Celo implementation I linked to is written in Go, and while it uses Bowe's method, it doesn't do the batching.

The batching is instead implemented in zexe (celo-org/zexe#4), and the performance speedup seems substantial. So now we have yet another dependency issue to deal with, as zexe seems to be an arkworks fork?

Answer 6 · 2021-03-23T14:30:20.000Z

As an exercise, I have implemented the is_torsion_free_optimized functions for G1 and G2 using Bowe's trick, and the gain is significant, as expected. See heliaxdev/bls12_381 commits de80c8ab4cd2ceb2b7b9026f2571546695eaeb26 and 6a8eb9f9c534bf035a407d17081fa2e313bb0e1d for details.

Answer 7 · 2021-04-13T23:42:26.000Z

This is probably going to be integrated into arkworks anyway, so nothing is likely required from our end right now.

Answer 8 · 2021-08-13T09:13:50.000Z

#58 (comment) provides benchmarks of the fast subgroup check for G1 and G2.