FriendlyCaptcha/friendly-pow

Clarify variance

ChrisJefferson opened this issue · 6 comments

It would be useful to know exactly how bad the variance can be.

Let's imagine we get popular and.. a billion POWs are done. What's the time the worst out of those billion will take, compared to the average time?

Good question!

Given we use a good hash function (Blake2B), we can expect every try to have the same probability of being succesful. The current default difficulty is 150, which corresponds to a threshold of Math.pow(2, (255.999 - 150)/8.0) >>> 0 = 9741. Every attempt has a probabiltiy of 9741/(2^32-1) = 0.0000022680033003604047. The inverse of that is 440916, which is the average amount of attempts required to find a single solution.

The number of puzzles is set to 12 so the average hashes required total will be 12*440916 which is around 5.3 million hashes required to solve the whole CAPTCHA. Of course both this threshold and this number 12 can be changed.

On a powerful desktop I've seen around 3 million hashes per second, so it takes about two seconds, in a JS fallback the slowest I've seen is 170k hashes per second, but let's halve that for a truly ancient phone: 85k. 5_300_000 million / 85_000 = 62 seconds. That sucks, but maybe a phone that slow doesn't even exist, and if it does they should update their browser and spend 5 to 10 times less time (because they could use WASM instead of JS).

The interesting bit is the variance of course: what if the user gets really unlucky! The probability of not finding a solution after N tries is is 1 - P^N with P the probability of not finding a solution. So here 1 - (1- 0.0000022680033003604047)^N.

Here's that probability distribution:
image
and here's a subset of it looking at 2 million + attempts
image

After 2 million attempts 99% solves will have completed, after 3 million it's 99.9%. If you are the unluckiest 1% in the world 12 times, it would take you 24 million hashes, or about 5 times longer than the average case. I wouldn't be surprised if the variance in time for a normal captcha is much worse!


Anyway, we're doing 12 puzzles and being the 'unluckiest 1% 12 times' seems a bit loosely defined and probably a bit inaccurate. Let's plot it a bit further:
image
Here's a plot of the expected number of solutions found after N hashes total, note that for 5.3 million it's about 50/50 whether the threshold of 12 has been reached (as expected!). We can compute the probability after 5.3 million tries to be done with binom.cdf(12, 5300000, p_per_try), that equals 0.5736.

My math is a bit rusty, so what I did was manually change that number of tries until we find the billionth unluckiest person, here's what I came up with:

> 1 / binom.cdf(12, 24800000, p_per_try)
1011684126611.5438

So that person would have to solve 24.8 million hashes, so like above around 5 times more than the average case.

Now let's say we halve the amount of puzzles required, but double the difficulty:

> 1 / binom.cdf(12/2, 24800000, p_per_try/2)
1893552.9763889806

suddenly that's only the 2 millionth unluckiest person, the variance is much higher.

Perhaps I should increase the number of puzzles required even more, the only downside is that the message sent to the server increases in length (by 8 bytes per solution), that's probably worth it for the decreased variance and a smoother progress bar.

Disclaimer: I may very well have made mistakes in the above calculations and invite anybody to verify my assumptions!

A little bit less extreme example: the probability of taking twice as long as the average time given the number of puzzles:

image

With the current setting of 12 puzzles the odds of being that unlucky is about 0.5%. If I increase the amount of puzzles to 18 it becomes less than 0.01%, is that worth transmitting 64 more ascii characters?

I wrote an article about this that is a bit less all over the place https://guido.io/posts/controlling-variance-in-proof-of-work-algorithms/

Thanks, that is a good answer.

I feel transmitting 64 characters is fine -- in practice it doesn't really matter how big a network packet you send, how many packets is more important, so unless this pushed you over the bounary into another packet it's fine -- and you are probably just logging in here, so will easily fit into a packet.

It's (to me) much more important some users don't get bored enough they leave, or assume the website has crashed. Also, Android / iOS users (particularly ones on slower devices) are going to get pissed if they feel their phone start to get warm, just logging in.

GottZ commented

@gzuidhof

Disclaimer: I may very well have made mistakes in the above calculations and invite anybody to verify my assumptions!

Umm..

* Solve the blake2b hashing problem, re-using the memory between different attempts (which solves up to 50% faster).
*
* This only changes the last 4 bytes of the input array to find a solution. To find multiple solutions
* one could call this function multiple times with the 4 bytes in front of those last 4 bytes varying.

Doesn't the current implementation already reduce required time?

Wouldn't it be more reasonable to xor the first 4 bytes with the nonce, to enforce proper re-hashing?

The entire input buffer is hashed and checked (128 bytes long), that particular routine just searches in the last 4 byte space if that makes sense. Any byte sequence is just as likely as the next to be a valid solution, it doesn't matter in which order the solutions are checked by the solver.

Maybe I'm misunderstanding your concern though! The input buffer contains the puzzle itself (which itself contains lots of randomness)