Implement Jarvis and Friday
HarryR opened this issue · 15 comments
From:
- https://www.esat.kuleuven.be/cosic/jarvis-and-friday-stark-friendly-cryptographic-primitives/
- https://eprint.iacr.org/2018/1098.pdf
This ticket includes the following sub-tasks:
- Analyse MiMC, Jarvis & Friday papers
- Verify if the Python pseudocode below implements it faithfully
- Wait for further analysis from somebody qualified to analyse security, or further papers
- Implement Python version of Jarvis and Friday, add unit tests
- Implement libsnark circuit which matches Python implementation
- Implement Solidity functions which match Python and libsnark implementations
Jarvis is an improvement upon MiMC, it hopefully attains the same level of security in a way which costs much less to verify in a zkSNARK circuit.
MiMC-p/p (sometimes called LongsightL) is has been implemented and included in Ethsnarks in combination with the Preneel one-way compression function:
- https://github.com/HarryR/ethsnarks/blob/master/ethsnarks/longsight.py
- https://github.com/HarryR/ethsnarks/blob/master/src/gadgets/onewayfunction.hpp
- https://github.com/HarryR/ethsnarks/blob/master/src/gadgets/longsightl.cpp
- https://github.com/HarryR/ethsnarks/blob/master/contracts/LongsightL.sol
This allows for very cheap merkle-trees and hashing, in a way which is 'field-native' - that is, it uses field elements rather than bits to achieve hashing and encryption of many bits per operation rather than many operations per bit.
However, it would be good to implement Jarvis as an alternative to LongsightL and MiMC-p/p, as this could reduce the number of constraints required for large Merkle trees, and significantly proving time.
The difference between MiMC and Jarvis is that the modulo inverse is used to increase the degree of the polynomial to excessive numbers, rather than relying on a specified number of rounds to increase the degree to an acceptable number.
The Preneel one-way compression function is already implemented, and used in conjunction with MiMC in a very similar manner to Jarvis and Friday, however Jarvis is more complex to compute on EVM (requiring modulo inverse at every round via the modexp
instruction), but relatively simple to verify (in the same way that constraints are verified).
The following are my notes on a possible implementation:
Making all these changes, the Jarvis round function becomes:
Where the key expansion function is:
The key derivation function being:
def jarvis_keys(k, c):
# key is master key
# c is list of round constants
for c_i in c:
k = (1/k) + c_i
yield k # emit sequence of derived keys
The Jarvis function would be:
def jarvis(k, m, c):
# m is a message
# k is master key
# c is list of round constants, length is the number of rounds
s_i = m + k
for k_i in jarvis_keys(k, c):
s_i = (1/s_i) + k_i
return s_i
The Friday function is
def friday(k, messages, c):
for m in messages:
k = k + m + jarvis(k, m, c)
return k
Without accounting for x = 0
, this can be enforced with one constraint per round:
a = s_i
b = 1/a + k_i
assert (a * (b - k_i)) == 1
While accounting for x=0
and x!=0
, this can be enforced with 3 constraints per round:
# r = f(x); uses linear combination to add round key?
assert (1 - b) * b == 0
assert (1 - b) * x == 0
assert r * (x + b - 1) == b
What are your thoughts on https://guidebook.com/guide/117233/event/21956184/
I can't download the slides for it.
The video should be out soon.
https://eprint.iacr.org/2018/1098.pdf
It seems the A
(and subsequently B
and C
functions) are used to prevent an invariant subfield attack.
§3 pg 5: Concerning the security of the block cipher, the important features for this affine layer are that its algebraic complexity should be high enough, i.e., the affine polynomial and its inverse need to be of high degree, dense and such that not all the
c_i
are elements of a subfield ofF_{2^n}
to avoid invariant subfield attacks.
And from https://eprint.iacr.org/2016/492.pdf
§4.2 pg 13 The algebraic structure of MiMC allows to mount a invariant subfield attack on the block cipher under a poor choice of round constants. That is, if all the round constants
c_i
and the keyk
are in subfieldF_{2^m}
ofF_{2^n}
then by choosing a plaintextx ∈ F_{2^m}
an adversary can ensure thatE_k(x) ∈ F_{2^m}
. This attack is thwarted by picking n to be prime. The only subfield is then F_2 such that picking constants!= 1
will be enough to avoid the attack.
My question is: when operating over F_p
, where p
is an odd prime, what is the purpose of A
?
The nonlinear function is not just inversion; it is 1/x if x is nonzero, otherwise 0. I don't know how to do this in fewer than 4 R1CS constraints, and I conjecture that it must require at least 3. You can use a single constraint if the case x = 0 is prohibited, but then the hash (or cipher) isn't defined for all inputs.
Hi all, I'm not exactly sure how I landed in this page, but I think I can help with some of the questions (I am one of the designers for Jarvis and Friday).
@HarryR: My question is: when operating over F_p, where p is an odd prime, what is the purpose of A?
You are not supposed to operate Jarvis over F_p (or any other prime field). Jarvis is designed, optimized, and most importantly - argued(*) to be secure only on binary fields (e.g., F_{2^{160}}). The last point is extremely important because I have no reason to believe that Jarvis remain secure when operating on F_p. It surely isn't optimized to work in on prime field exactly due to the reasons above (A is indeed meaningless and the behavior of 1/x may not have the required algebraic properties).
(*) I wrote about the difference between arguing and proving security in the blog post linked at the top of this page.
As for @daira's comment - Jarvis and Friday were designed to be STARK-efficient (i.e., as explained to me (this is somewhat outside my field of expertise), to work in the Turing machine model, rather than the circuit model - I hope that this sentence in not utter rubbish) and it is indeed not as efficient as it could be when encoded as R1CS.
We are working on a new design (actually, several) called Pepper that operates on F_p and that would require 3 R1CS per step (a step in this design is similar to a round in Jarvis), 2 steps in each round and 5-10 rounds in total. We estimated the overall cost of Pepper to be ~70 constraints but the research is ongoing so this may go up or down still. Our analysis (so far) agrees that it is between hard and impossible to go below 3 R1CS in F_p and retain security. But we're still working on that. I expect that we'll release this design(s) within 1.5-2 months.
One more thing (sorry for flooding) - since all of this is still work in progress (including Jarvis, in a way) we are eager to receive feedback. I've seen in the text above that Jarvis' inverse is more complicated to implement because it uses an inverse rather than cubing. We're very much interested in such feedback and should we receive it, there's a good chance that we can solve/improve certain problems (we already have an idea how to replace inversion with something more friendly).
Another very important point to stress is that (B,C) != (SubBytes,AddRoundKey). Both B and C are low degree polynomials composed in a special way to generate a high degree polynomial. They are specific to the binary field you are working with and we listed a few examples for the most likely field sizes in the appendix of the ePrint report linked above (https://eprint.iacr.org/2018/1098.pdf). I need to ask Siemen if he uploaded the Sage code to generate these polynomials (and their composition). If he hasn't I'll make sure this is made available somewhere.
@TomerAshur's points are well taken. However, if for the sake of argument we were going to use f such that f(0) = 0 and f(x) = 1/x when x ≠ 0 as a nonlinear function over Fp, this would be how to implement r = f(x) in 3 R1CS constraints:
1⃣ (1 - b) × (b) = 0
2⃣ (1 - b) × (x) = 0
3⃣ (r) × (x + b - 1) = (b)Proof of correctness:
Constraint 1⃣ enforces that b (a fresh private witness variable) is boolean.
Case b = 0: substitute b in 2⃣ giving 1 × (x) = 0 therefore x = 0, and also substitute b and x in 3⃣ giving (r) × (-1) = 0, so we have r = 0 and x = 0.
Case b = 1: substitute b in 2⃣ giving 0 × (x) = 0 which has no effect, and substitute b in 3⃣ giving (r) × (x) = 1 which constrains r = 1/x and x ≠ 0.
The correct case for b is forced by the x = 0 or x ≠ 0. ◾
I conjecture that this is not possible in fewer than 3 constraints, if x = 0 needs to be handled correctly. Note that if you can guarantee that the case x ≠ 0 never occurs, then 1 constraint suffices.
I cannot emphasize enough: do not just stick this f into either Jarvis or MiMC unless and until some cryptographer with experience in this area analyses it. I am not making any claim about its security.
The name of the intermediate language that zk-STARKs are expressed in is AIR (Algebraic Intermediate Representation).
That was unexpectedly productive, I really appreciate having some insight into this even if it means it'll have to be put on hold for a while waiting further review.
I've been testing the Jarvis implementation (without A
) over small prime fields, I can confirm that it is resistant to Lagrange interpolation after a single inversion and the Python code in the ticket description faithfully implements the algorithm (and has many of the properties you'd expect from a cipher). However, I haven't looked into meet in the middle attacks or anything more advanced, but I will give it a go and see what happens.
Well, over a prime field it is not Jarvis. It's something else that has not been analysed.
Not related to Jarvis, but I see that you're referring to MiMC-R/p/5 with Miyaguchi–Preneel as LongsightL. As described in zcash/zcash#2233 (comment) , LongsightL-R/p/e is actually Rogaway and Steinberger's LPA231 construction (with typo corrected!), using MiMC-R/p/e to instantiate π1..3, with independent round constants for the three instantiations.
(Indeed, that was the point of calling it something else than MiMC.)
Edit: see zcash/zcash#2233 (comment) for Miyaguchi–Preneel.
I wrote:
However, if for the sake of argument we were going to use f such that f(0) = 0 and f(x) = 1/x when x ≠ 0 as a nonlinear function over Fp, this would be how to implement r = f(x) in 3 R1CS constraints:
[...]
I conjecture that this is not possible in fewer than 3 constraints, if x = 0 needs to be handled correctly. Note that if you can guarantee that the case x ≠ 0 never occurs, then 1 constraint suffices.
In practical applications it might be permissible for the hash function or cipher to fail deterministically on a small proportion of possible inputs — provided that its R1CS implementation results in an unsatisfiable constraint system for those inputs. Call such a hash or cipher, if it fails for a proportion of inputs that is negligible in the security parameter, "not quite total".
For instance, the Merkle tree over commitments in Zcash is completely public. Whether a given commitment causes a failure in the computation of the new root will depend on the position at which it is added to the tree. We can tolerate a small probability of hash failure, by making it a consensus rule that no commitment is added at a position where it causes such a failure. This does not significantly increase the feasibility of finding non-failing collisions.
Suppose that the probability that a given inversion fails is at most ε, and the number of inversions in the hash is k. By the Fréchet inequality for disjunctions, the probability that any inversion fails for random input is at most k·ε. If there are d layers in the Merkle tree, then the probability of failure for adding a random commitment at a random position in the tree is at most d·k·ε. If this is small then heuristically, even if adding a commitment fails, it will be possible to add it in another position soon afterward.
(The probabilities of failure at different positions in the tree cannot in general be assumed to be independent. In principle, it could be the case that a tree gets "stuck", unable to add new commitments, or unable to add a given commitment at any subsequent position. However, this will not occur in practice for a Miyaguchi–Preneel hash constructed from a random not-quite-total cipher used in a binary Merkle tree, where the left input to the Merkle hash at each layer occupies the first block, and the right input occupies the second block. Informally, that is because we alternate whether the commitment to be added is in the left or right input; and in Miyaguchi–Preneel the key for the second block gets randomized by the chaining value output from the first block. Other combinations of hash and tree constructions are likely to also avoid this potential problem, although each needs a specific analysis.)
The analysis for not-quite-total ciphers or keyed hashes is different. We need to assume that any failure of the encryption or decryption functions could act as an oracle for the adversary. In general this needs a protocol-specific analysis. On the other hand, if keys are independent and uniform-random, then the probability of a given encryption or decryption failing should be very small; so it seems plausible that using a not-quite-total cipher could be acceptable for protocols where the performance gain justifies the additional complexity of analysis.
@TomerAshur the difference in cost on Ethereum EVM for inversion vs modulo multiplication is quite large.
For example, with this code: https://gist.github.com/HarryR/d69ad7cab008ed53a5eca4fdc08ffcc4
Performing 1 inversion costs 15000 gas. And around 145 modulo multiplications can be performed at around the same cost (without any optimisations, 250+ with optimisations). MiMC requires around 480 mulmod
operations for 160 rounds, and the cost is comparable to about 3 inversions.
This makes using inversions more expensive on EVM compared to MiMC - at least to make it infeasible to use Lagrange interpolation (e.g. degree of polynomial is equal to the field modulus).
I have been doing statistical analysis on Jarvis/Friday over prime fields versus MiMC, inversion and a simulated random-oracle model.
See: https://github.com/HarryR/doublevision/blob/master/test_bijective.py
This program runs across the first ~50 prime numbers and creates a grid of every key and message within the range of the prime field, then analyses the frequency of every integer within the field versus its expected frequency (which is 1/p
). Then we look at the standard deviation of the ratio of the actual versus expected frequency. This provides a lot of insight into bijectivity (and suitability as a cipher), uniformly random distribution - but also split by the properties of the primes (e.g. gcd(n, p-1)==1
).
There are some things which I am concerned about, the output of the program is included below.
Columns:
- Median
- Minimum
- Maximum
Dimensions:
RD
- Per rowCD
- Per column
Tests:
gcd(n,p-1)==1
(then the number of primes which matched this criteria), wheren
is assumed to be an exponent.- bijective_k_each_msg - Row-wise, every
k
is bijective for eachm
- bijective_m_each_key - Column-wise, every
m
is bijective for eachk
- latin_square - Bijective for both
k
andm
, e.g. a latin square
Where the median RD
is 0.0
, this probably acts like a cipher, where RD
is 0.0
for both median, minimum and maximum it is a cipher - that is, for every k
, m
is bijective. When the resulting grid is a latin square both CD
and RD
are 0.0
(for all median, minimum and maximum).
Interestingly, you can see the difference between mimc_e3
where gcd(3,p-1) == 1
and other types of primes - this shows that MiMC with an exponent of 3 is only usable when the prime has that property.
But... what concerns me is the results of friday
compared to other MP one-way-function constructs, it has a significantly lower median row-wise compared to the random oracle, but is almost perfect column-wise when compared to the random oracle.
This suggests that there is a deficiency with friday
when compared to MiMC+MP....
However, Jarvis
is less fragile than MiMC with respect to operating across a range of prime numbers, MiMC is horribly insecure with the wrong exponent, whereas Jarvis (and inversion generally) works equally the same across all primes.
friday
gcd(3,p-1)==1 31
RD: 0.39 0.28 0.75
CD: 0.80 0.58 0.82
gcd(5,p-1)==1 44
RD: 0.41 0.28 0.76
CD: 0.80 0.58 1.00
gcd(7,p-1)==1 50
RD: 0.39 0.28 0.76
CD: 0.80 0.58 1.00
inversion
bijective_k_each_msg 125
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
bijective_m_each_key 125
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
gcd(3,p-1)==1 31
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
gcd(5,p-1)==1 44
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
gcd(7,p-1)==1 50
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
latin_square 125
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
jarvis
bijective_m_each_key 125
RD: 0.00 0.00 0.00
CD: 0.80 0.53 0.82
gcd(3,p-1)==1 31
RD: 0.00 0.00 0.00
CD: 0.80 0.53 0.81
gcd(5,p-1)==1 44
RD: 0.00 0.00 0.00
CD: 0.80 0.58 0.81
gcd(7,p-1)==1 50
RD: 0.00 0.00 0.00
CD: 0.80 0.53 0.82
mimc_e3
bijective_m_each_key 79
RD: 0.00 0.00 0.00
CD: 0.80 0.53 1.00
gcd(3,p-1)==1 31
RD: 0.00 0.00 0.00
CD: 0.80 0.53 1.00
gcd(5,p-1)==1 44
RD: 0.00 0.00 1.86
CD: 0.80 0.45 1.00
gcd(7,p-1)==1 50
RD: 0.71 0.00 1.98
CD: 0.80 0.45 1.00
mimc_e5
bijective_k_each_msg 3
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
bijective_m_each_key 105
RD: 0.00 0.00 0.00
CD: 0.79 0.00 0.90
gcd(3,p-1)==1 31
RD: 0.00 0.00 3.86
CD: 0.79 0.00 0.90
gcd(5,p-1)==1 44
RD: 0.00 0.00 0.00
CD: 0.79 0.00 0.90
gcd(7,p-1)==1 50
RD: 0.00 0.00 3.86
CD: 0.78 0.00 0.90
latin_square 3
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
mimc_e7
bijective_k_each_msg 2
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
bijective_m_each_key 113
RD: 0.00 0.00 0.00
CD: 0.81 0.00 1.03
gcd(3,p-1)==1 31
RD: 0.00 0.00 5.24
CD: 0.80 0.53 1.00
gcd(5,p-1)==1 44
RD: 0.00 0.00 5.23
CD: 0.81 0.00 1.03
gcd(7,p-1)==1 50
RD: 0.00 0.00 0.00
CD: 0.81 0.00 1.03
latin_square 2
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
mimc_mp_e3
gcd(3,p-1)==1 31
RD: 0.80 0.29 0.90
CD: 0.80 0.29 0.82
gcd(5,p-1)==1 44
RD: 0.79 0.29 0.83
CD: 0.80 0.29 0.82
gcd(7,p-1)==1 50
RD: 0.79 0.29 0.90
CD: 0.80 0.29 0.82
mimc_mp_e5
bijective_k_each_msg 3
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
bijective_m_each_key 3
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
gcd(3,p-1)==1 31
RD: 0.80 0.00 1.02
CD: 0.79 0.00 1.00
gcd(5,p-1)==1 44
RD: 0.80 0.00 1.31
CD: 0.80 0.00 1.00
gcd(7,p-1)==1 50
RD: 0.80 0.00 1.31
CD: 0.79 0.00 1.00
latin_square 3
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
mimc_mp_e7
bijective_k_each_msg 2
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
bijective_m_each_key 2
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
gcd(3,p-1)==1 31
RD: 0.79 0.29 1.26
CD: 0.79 0.29 0.90
gcd(5,p-1)==1 44
RD: 0.79 0.00 1.26
CD: 0.79 0.00 0.90
gcd(7,p-1)==1 50
RD: 0.79 0.00 1.26
CD: 0.80 0.00 0.90
latin_square 2
RD: 0.00 0.00 0.00
CD: 0.00 0.00 0.00
random_oracle
gcd(3,p-1)==1 31
RD: 0.80 0.62 1.00
CD: 0.80 0.58 0.82
gcd(5,p-1)==1 44
RD: 0.80 0.45 1.00
CD: 0.80 0.51 0.84
gcd(7,p-1)==1 50
RD: 0.80 0.45 1.00
CD: 0.80 0.51 0.84
Analysis has confirmed anomalies with Jarvis & Friday over prime fields.
This should not be implemented for use with EthSnarks.
Closing.