ulif/diceware

Show entropy of generated passphrases

ulif opened this issue · 2 comments

ulif commented

The (Shannon) entropy of a generated passphrase, given in bits, could help to compare the complexity and "secureness" of passphrases. It is at least more reliable than the red-yellow-green indicators shown by some password managers.

This might be easy to do by summing up the per-word entropy mentioned on the Wikipedia page on diceware:

The level of unpredictability of a Diceware passphrase can be easily calculated: each word adds 12.9 bits of entropy to the passphrase (that is, $\log_2(6^5)$ bits).
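As a rough illustration of the quoted formula, here is a minimal Python sketch. The function name `uniform_passphrase_entropy` is made up for this example and is not part of diceware; it assumes every word is drawn uniformly and independently from the list.

```python
import math


def uniform_passphrase_entropy(num_words, wordlist_len=6**5):
    """Entropy in bits of a passphrase of `num_words` words.

    Assumption: each word is picked uniformly and independently
    from a list of `wordlist_len` distinct words.
    """
    return num_words * math.log2(wordlist_len)


print(round(uniform_passphrase_entropy(1), 1))  # 12.9
print(round(uniform_passphrase_entropy(6), 1))  # 77.5
```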

ulif commented

Hi @sluedecke, I am afraid this is not the whole truth. The above formula is valid only

  1. for word lists of a certain length (here: 6**5 elements)
  2. and under the condition that each word will be picked with the same probability.

Neither condition is necessarily met in diceware. We include longer lists, which are not a real problem, as their entropy is just as easy to calculate.

But the situation is more difficult when lists are used whose length is not a power of the number of dice sides. In this case it can happen that some elements are more likely to be picked than others.

Let me give you an example: Imagine a word list of three words ("one", "two", "three") and a 2-sided die (a coin, for example, or a PRNG delivering bits). How do you pick one of the three elements with the coin, giving each of the three the same chance? There are at least three ways you can deal with that:

a) You can say: forget about one of the elements. I throw one away and do exactly one throw to decide between the remaining two elements. This means you reduce the entropy to 1 bit, and this is also what we currently do in diceware in that situation (we reduce the wordlist length).
Entropy: 1 bit, Coin throws: 1, entropy calculation: trivial log2(shortened_list_len)

b) You can say: I throw the coin two times (getting a value between 1 and 4) and if I get 1, 2, or 3, I pick that, but if I get 4, I repeat the procedure and throw the coin another two times. In this case you preserve the possible entropy of log2(3) = 1.5849... but this procedure might never end. You cannot tell in advance how many throws you will need. Good for entropy (of the result) but bad for fixed loops and people who have deadlines to meet. Also, I do not like to waste entropy.
Entropy: 1.5849... bits, Coin throws: n, entropy calculation: trivial log2(list_len)

c) You could also say: I combine the possibilities above, but instead of removing an element of the wordlist as in (a), I do exactly one throw more than needed in case of (a) and apply mod(3) to the result. This leads to a mapping like this:
1: one, 2: two, 3: three, 4: one.
Here the wordlist element "one" has double the chance of "two" or "three" to be picked, and this reduces the entropy in a more complicated way. You get better entropy per word than with a) but worse than with b). The number of coin flips (or dice rolls) is foreseeable, but the entropy is much harder to compute. You have to evaluate the whole formula for Shannon entropy, I guess. Here it gives: -((0.5 * log2(0.5)) + 2 * (0.25 * log2(0.25))), which happens to be 1.5.
Entropy: 1.5 bits, Coin throws: 2, entropy calculation: not completely trivial
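The three numbers above can be checked with a small Python sketch. All function names here are hypothetical and only model the three strategies as described in this comment; this is not diceware's actual code. For c) the sketch assumes the smallest number of throws whose outcomes cover the whole list.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def truncated_probs(list_len, sides):
    """a) Shorten the list to the largest power of `sides` that fits."""
    rolls = 0
    while sides ** (rolls + 1) <= list_len:
        rolls += 1
    kept = sides ** rolls
    return [1 / kept] * kept


def rejection_probs(list_len):
    """b) Re-throw on out-of-range results: every element is equally likely."""
    return [1 / list_len] * list_len


def modulo_probs(list_len, sides):
    """c) Fixed number of throws, result taken modulo the list length."""
    rolls = 1
    while sides ** rolls < list_len:
        rolls += 1
    outcomes = sides ** rolls
    counts = Counter(i % list_len for i in range(outcomes))
    return [counts[i] / outcomes for i in range(list_len)]


# Three words, one coin (2 sides):
print(round(shannon_entropy(truncated_probs(3, 2)), 4))  # 1.0
print(round(shannon_entropy(rejection_probs(3)), 4))     # 1.585
print(round(shannon_entropy(modulo_probs(3, 2)), 4))     # 1.5
```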

The per-word entropy of a wordlist can further be reduced by duplicate entries, and the entropy of a generated passphrase (which normally should be the sum of the per-word entropies) can suffer from prefix problems, where different word sequences concatenate to the same string. All this might have to be taken into account when calculating the entropy.
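Both effects can be made visible with a tiny Python sketch. The helper names and the toy wordlists are invented for this illustration; the sketch assumes each list line is picked with equal probability.

```python
import math
from collections import Counter
from itertools import product


def wordlist_entropy(words):
    """Per-word entropy in bits if every list *line* is equally likely.

    Duplicate lines make the duplicated word more probable.
    """
    total = len(words)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(words).values()
    )


def distinct_passphrases(words, num_words, sep=""):
    """Count different strings obtainable from `num_words` joined words."""
    return len({sep.join(combo) for combo in product(words, repeat=num_words)})


print(wordlist_entropy(["one", "two", "three", "four"]))  # 2.0
print(wordlist_entropy(["one", "two", "three", "one"]))   # 1.5

# Prefix problem: "a" + "bb" and "ab" + "b" both give "abb".
toy = ["a", "ab", "b", "bb"]
print(distinct_passphrases(toy, 2))           # 14 instead of 16
print(distinct_passphrases(toy, 2, sep="-"))  # 16
```

In this toy case a separator between words removes the collisions, because none of the words contains the separator itself.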

And this, in too many words, sorry for that, is the reason why there is no entropy calculation yet. I am also considering switching from a) to c) for diceware when it comes to the use of real dice. Or maybe b)? Solution a) can mean a vast (and unnecessary) reduction of entropy, for instance when it comes to 20-sided dice and the like.
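To get a feeling for how large the loss with a) can be, here is a hedged Python sketch for 20-sided dice. The list length of 6**5 = 7776 words is an assumption chosen for the example, and the function names are hypothetical.

```python
import math


def entropy_truncate(list_len, sides):
    """a) Entropy per word when the list is cut to a power of `sides`."""
    rolls = 0
    while sides ** (rolls + 1) <= list_len:
        rolls += 1
    return math.log2(sides ** rolls)


def entropy_modulo(list_len, sides):
    """c) Entropy per word with a fixed number of rolls and a modulo."""
    rolls = 1
    while sides ** rolls < list_len:
        rolls += 1
    outcomes = sides ** rolls
    # `extra` words are hit (base + 1) times, all others `base` times.
    base, extra = divmod(outcomes, list_len)
    probs = [(base + 1) / outcomes] * extra + [base / outcomes] * (list_len - extra)
    return -sum(p * math.log2(p) for p in probs if p > 0)


words, sides = 6**5, 20
print(round(entropy_truncate(words, sides), 2))  # 8.64  (only 400 words used)
print(round(entropy_modulo(words, sides), 2))    # 12.91 (3 rolls, 8000 outcomes)
print(round(math.log2(words), 2))                # 12.92 (upper bound, as in b)
```

Under these assumptions a) gives up more than 4 bits per word, while c) stays very close to the maximum.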

Please note that the above considerations are not a problem for regular system-based passphrases (they use b., as of Python 3.x); they are a problem only for diceware when throwing real dice.
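For illustration, strategy b) with a system random source could look roughly like the following Python sketch. `pick_uniform` is a hypothetical helper, not the code diceware or the standard library actually runs; it only uses `secrets.randbits` from the standard library.

```python
import secrets


def pick_uniform(words):
    """Pick one element with equal probability using rejection sampling.

    Draws just enough random bits to cover the list and retries
    whenever the drawn number falls outside of it (strategy b).
    """
    n = len(words)
    if n == 0:
        raise ValueError("word list must not be empty")
    bits = n.bit_length()
    index = secrets.randbits(bits)
    while index >= n:  # out of range: throw again
        index = secrets.randbits(bits)
    return words[index]


print(pick_uniform(["one", "two", "three"]))
```

The number of retries is not known in advance, but each attempt succeeds with a probability of at least one half, so long runs are very unlikely.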

Please also accept my apologies for not responding earlier!