cathugger/mkp224o

Please explain how to read the statistics (for benchmarking)

Closed this issue · 7 comments

I tried benchmarking various ./configure settings, but the statistics change very little. Binary search or not, intfilter or not, it doesn't really matter. All that matters is the number of filters and whether RegEx is turned on or not.

Thus, I want to make sure that I am reading the stats correctly. What does each of the 3 metrics stand for?

  • calc/sec: typically ~3M (with RegEx) ... does this mean that this many candidate onions are tried per second? That's a lot.
  • succ/sec: typically 0 ... is this the number of matches found per second?
  • rest/sec: typically 1-2 ... the code says it's about "restarts" ... what are restarts?

When benchmarking, am I only interested in calc/sec, or is it really succ/sec that I'm interested in?

Thank you.

EDIT: It would also be great to understand which optimizations make sense for RegEx, and whether they differ from the non-RegEx ones.

for filtering settings, the difference will only really show up with a larger number of filters.
calc/sec - yes
succ/sec - yes
rest/sec - restarts are basically resets of generator seed from random source.. i think? forgot what that one was for

when benchmarking, only calc/sec is important
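To see why succ/sec usually reads 0 even at millions of calc/sec, here is a rough back-of-the-envelope sketch. It assumes plain prefix filters, candidates that are uniformly distributed over the 32-symbol base32 alphabet, and no filter being a prefix of another (otherwise matches get counted twice). The function names are made up for illustration and are not part of mkp224o.

```python
def expected_succ_per_sec(calc_per_sec, filters):
    """Estimate matches per second for a set of prefix filters.

    Assumption: each candidate address is uniformly random, so a prefix
    of length n matches with probability 32**-n (5 bits per character).
    """
    p_match = sum(32.0 ** -len(f) for f in filters)
    return calc_per_sec * p_match


def expected_seconds_to_first_match(calc_per_sec, filters):
    """Mean waiting time until the first match, under the same assumptions."""
    rate = expected_succ_per_sec(calc_per_sec, filters)
    return float("inf") if rate == 0 else 1.0 / rate


if __name__ == "__main__":
    calc = 3_000_000  # the ~3M calc/sec figure quoted in the question
    for filters in (["abcdef"], ["abcdef", "ghijkl"], ["abcdefgh"]):
        rate = expected_succ_per_sec(calc, filters)
        secs = expected_seconds_to_first_match(calc, filters)
        print(filters, f"{rate:.6f} succ/sec,", f"{secs:.0f} s to first match")
```

Under these assumptions succ/sec is just calc/sec scaled by a fixed per-candidate match probability, which is why calc/sec is the number to compare when the filter set stays the same.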

re regex: yes, regex filtering overrides the other filtering optimizations, but it doesn't override generators

OKAY! So does the lack of optimizations mean that it could be much faster to disable regex and instead just enumerate all the possible regex matches?

Also, how do I find the right trade-off between these two things?

a) Higher calcs/sec if fewer and less complex filters are used
b) More matches per calc (on average) if more filters are used

Is this the correct way to think about it?

In other words, does the algorithm just brute force through the same potential keys, no matter how many and which filters are used? Or is it smarter than that?

Oh fuck man, I compiled without RegEx and just enumerated all the filters, and my calc/sec went up by a factor of 7 ....

Sorry rainforest for the past weeks :D
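For anyone wanting to try the same thing, a minimal sketch of how a simple pattern could be enumerated into plain filters is below. It only understands literal characters and `[...]` character classes (no quantifiers, alternation, or ranges), and `expand_pattern` is a hypothetical helper, not something mkp224o provides. Results containing characters outside the base32 alphabet (a-z, 2-7) are dropped, since those can never appear in an onion address.

```python
import itertools

# Characters that can appear in a base32-encoded onion address.
BASE32 = set("abcdefghijklmnopqrstuvwxyz234567")


def expand_pattern(pattern):
    """Expand literals and [...] classes into every concrete string.

    Hypothetical helper: supports only a tiny subset of regex syntax.
    """
    choices = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "[":
            end = pattern.index("]", i)  # raises ValueError if unclosed
            choices.append(sorted(set(pattern[i + 1:end])))
            i = end + 1
        else:
            choices.append([pattern[i]])
            i += 1
    expanded = ("".join(combo) for combo in itertools.product(*choices))
    # Drop strings that use characters outside base32.
    return [s for s in expanded if set(s) <= BASE32]


if __name__ == "__main__":
    # One filter per line, ready to be used as plain (non-regex) filters.
    for f in expand_pattern("t[e3][s5]t"):
        print(f)
```

The number of generated filters is the product of the class sizes, so this only stays practical while the pattern is small enough to enumerate.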

when you get more calc/sec with the same filters, you will also get more succ/sec
more filters will usually result in more succ/sec, because even if calc/sec is somewhat slower, more of these calculations will match a filter
for a low count of filters (10~20 or so), don't bother with binary search; it may actually end up slower because its branches are more complex and the cpu predicts them less efficiently
for more filters it may be more efficient
if you use dictionaries then you definitely want binary search
also intfilter is pretty much always faster regardless of whether you do binary search or not; it's not the default because of its length limitation, but if the words you use fit, use it
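To illustrate the linear scan versus binary search difference, here is a small sketch. It is not mkp224o's implementation (which is C and works on raw key bytes); it just assumes prefix filters stored as strings, grouped by length so that each group can be searched with `bisect`. All names are made up for the example.

```python
import bisect
from collections import defaultdict


def matches_linear(candidate, filters):
    """Check every filter in turn: O(number of filters) per candidate."""
    return any(candidate.startswith(f) for f in filters)


def build_index(filters):
    """Group filters by length and sort each group for binary search."""
    by_len = defaultdict(list)
    for f in filters:
        by_len[len(f)].append(f)
    return {n: sorted(set(fs)) for n, fs in by_len.items()}


def matches_binary(candidate, index):
    """One binary search per distinct filter length."""
    for n, sorted_filters in index.items():
        if len(candidate) < n:
            continue
        head = candidate[:n]
        pos = bisect.bisect_left(sorted_filters, head)
        if pos < len(sorted_filters) and sorted_filters[pos] == head:
            return True
    return False


if __name__ == "__main__":
    filters = ["abc", "abd", "xy", "hello"]
    index = build_index(filters)
    for cand in ("abdzzzz", "abezzzz", "xyzzzzz", "hellozz", "hellzzz"):
        print(cand, matches_linear(cand, filters), matches_binary(cand, index))
```

With a dictionary-sized filter list the binary version does a handful of comparisons per candidate instead of thousands; with only 10~20 filters the linear loop has so little work to do that the extra branching is unlikely to pay off.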

and yes, regex is probably slower than the non-regex stuff; i've just not benchmarked that myself, but your benchmark doesn't sound that surprising

There's no free lunch!

My key learning here is that RegEx should only be used if there's no good way to enumerate the filters.

Thank you.