Shares of even gcm powers can be computed locally.
Opened this issue · 11 comments
Let H be ghash's encrypted zero.
I noticed that if two parties have XOR shares of H^2 = H_a xor H_b, then
H_a*H_a xor H_b*H_b = H^4
H_a*H_a*H_a*H_a xor H_b*H_b*H_b*H_b = H^8 ... etc for powers 2*2, 2*4, 2*8...
The same can be done if parties start out with shares of H^3, H^5 etc
At the end both parties will have shares of locally computed even powers.
Having to compute only odd powers in MPC, gives a nice 50% reduction.
Let me think about it and double check.
What I am not sure is why XOR
can work together with multiplying here, as the GHASH is computed on GF(2^{128}) with a polynomial x^{128}+x^{7}+x^{2}+x+1}x^{128}+x^{7}+x^{2}+x+1. Multiplying is complicated.
Basically, if H^2 = H_a XOR H_b, we can say indeed H^2 = H_a + H_b in GF(2^{128})
So, H^4 = H_a^2 + H_b^2 + 2H_aH_b, which has a cross term.
So, my first impression is that H^4 cannot be simply computed from the "squared" XOR shares of H^2. Let me see if we can find some examples.
Ah you are right, 2H_aH_b = 0 as always.
So this clearly is good enough for the semi-honest case. (Are you using it in the semi-honest case?)
I will see whether it has an influence on the malicious case.
My use case is semi-honest.
I just stumbled upon this trick, so I'm interested to know why it works. Thanks for your explanation. I tried to run H_a^2 + H_b^2 + 2H_aH_b
, with real numbers, however
2H_aH_b
does not produce 0 for me.
I'm doing block multiplication of 2 by H_a and then block-multiply the result by H_b.
Does the result check out for you with real numbers?
Let me play with the Sage scripts a little bit to confirm it. But often in GF(2^{128}), "XOR" is the same as "+", so
2H_aH_b = H_aH_b + H_aH_b = H_aH_b XOR H_aH_b = 0
Note that GF(2^{128}) is not a prime field with modulus 2^{128}. Rather, it is an extension field of prime 2.
Ah, that makes perfect sense when written out like that.
So, all even powers can be computed for free then. I'll fix my initial post.
There is an optimization with which we need only 17 block multiplications in MPC in order to compute the whole ghash sequence for the longest possible TLS record size of 16KB.
After the parties started out with shares of H^1 and computed other shares locally using the above mentioned free squaring: H2, H4, H8, H16 etc, they proceed to compute H3X3 + H5X5, where H3/5 is H^3/5 and X3/5 are corresponding ciphertext blocks.
Expanding:
(H2_a+H2_b)(H_a+H_b)X3 = H2_aH_aX3 + H2_aH_bX3 + H2_bH_aX3 + H2_bH_bX3
(H4_a+H4_b)(H_a+H_b)X5 = H4_aH_aX5 + H4_aH_bX5 + H4_bH_aX5 + H4_bH_bX5
Let K*_a be all values which Alice can compute locally
Let K*_b be all values which Bob can compute locally
We'll rewrite the above:
K1_a + K2_aH_b + K2_bH_a + K1_b
K3_a + K4_aH_b + K4_bH_a + K3_b
The parties jointly need to compute only the middle cross-terms:
K2_aH_b + K2_bH_a + K4_aH_b + K4_bH_a = (K2_a+K4_a)H_b + (K2_b+K4_b)H_a
They will compute it in MPC in this way
Alice inputs her (K2_a+K4_a) and her H_a
Bob inputs his (K2_b+K4_b) and his H_b
The circuit computes C = (K2_a+K4_a)H_b + (K2_b+K4_b)H_a
The circuit sends random xor shares of C back to Alice and Bob
The parties could have computed H3X3 + H5X5 + H9X9 + H17X17 in the same MPC circuit with only 2 block multiplication.
Having run the numbers, I see that the best strategy is for the parties to first compute (in MPC) shares of powers 3,5,7,9,11 ..., then apply "free squaring" to those shares and then run the method described above in this post.
6 block multiplications in MPC will give us 239 powers of H.
17 block multiplications in MPC will give us 1025 powers of H.
This is a very novel observation. Let me calculate how many {H^?} need to be precomputed, and then how many {H^? X^?} are needed...
Just wanted to make a note here that the technique we discussed here was implemented in the latest version of TLSNotary and described here https://tlsnotary.org/how_it_works#section4
If this technique is combined with Oblivious Transfer then GCM can be computed without garbled circuits. And it can also be extended to the MPC setting.
Also, to correct my previous statement - the amount of block multiplications needed in 2PC setting to compute 1026 powers of H with the technique and optimizations described in the link above is ~100. (I haven't run the numbers for the MPC setting yet). But then again, those block multiplication can be done with OT which is fairly cheap.
Sorry for the late reply. I guess when I come back to this project one day, I may need to use a lot of your code.
I want to credit you. Let me know what would be the best way: themighty1? Dan from TlsNotary? My email address is w.k@berkeley.edu