cathugger/mkp224o

Generating easy-to-remember onion addresses

zazaulola opened this issue · 3 comments

I found 4 projects that generate onion addresses. The fastest algorithm is in mkp224o, but it looks for addresses that contain words from a specified list. I don't need words in the address; I find it easier to remember runs of identical or repeated characters. I wrote specific filtering rules in JavaScript and piped the stdout stream of mkp224o -y into Node.js. This caused a huge performance degradation.

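// Score a 56-character onion hostname: higher means more repetition.
function processAddress(hostname){
  let score = 0;
  let inrow = 0; // bonus for the current run; grows by 2 per repeated character
  for(let i = 1; i < 56; i++){
    if(hostname[i-1] === hostname[i]) {
      // Run continues: a run of n identical characters adds n*(n-1) in total.
      inrow += 2;
      score += inrow;
    } else {
      inrow = 0;
      // Interleaved pattern such as "aba": 1 point.
      if(i > 1 && hostname[i-2] === hostname[i]){
        score += 1;
      }
    }
  }
  return score;
}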

I noticed the support for pcre. But, for unknown reason, I have no luck enabling it.
A suitable regular expression might be, for example, /(?:([a-z2-7])\1{3,}.*){3,}/.
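To see what that pattern accepts, here is a small sketch that runs it with the JavaScript regex engine. The sample strings are made up for illustration and are not real onion addresses; mkp224o's own PCRE matching is not exercised here.

```javascript
// Pattern quoted from the post: at least three runs of 4+ identical
// base32 characters, with anything allowed in between.
const runs = /(?:([a-z2-7])\1{3,}.*){3,}/;

// Hypothetical sample strings (not real onion addresses).
const samples = ["aaaaxbbbbxcccc", "aaaabbbbxyz", "qqqqqqqqqqqq", "abcabcabc"];

for (const s of samples) {
  console.log(s, runs.test(s));
}
```

Note that one long run can satisfy the pattern by itself: twelve identical characters split into three runs of four, so the third sample matches even though it contains only one distinct character.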

What should I do? I tried to understand the C code. But I don't understand any preprocessor directives.

But, for unknown reason, I have no luck enabling it.

"unknown reason" is not helpful when you don't even say what you did.
you need to enable it at configure time. before re-configuring, you should do make clean to clean up old built files.

if you want all-same-char filtering, though, you can easily give mkp224o a list like "aaa bbb ccc ddd ... 666 777" and use the -N flag to specify how many of these "words" you want.
for example, to achieve something like what you want (if i understood correctly), i would do something like this:

git clone https://github.com/cathugger/mkp224o
cd mkp224o
./autogen.sh
# binsearch x intfilter is fast when filters are of the same length, though the usefulness
# of binsearch itself is questionable with this amount of filters and should be benchmarked
./configure --enable-intfilter --enable-binsearch --enable-amd64-64-24k
make -j -s
: > myfilters.txt
for x in a b c d e f g h i j k l m n o p q r s t u v w x y z 2 3 4 5 6 7; do echo $x$x$x >> myfilters.txt; done
./mkp224o -f myfilters.txt -N 2

this would generate addresses like bbbeeepxac4jodbhljmry2mpk7fceglghepchj7pywfheb5ccun7yoyd.onion.
you can ofc tweak it to include longer sequences of characters, possibly using --enable-besort at configure time if you end up with filters that aren't of equal length.

If the script is run in bash with brace expansion enabled (set -B, which is the default), it can be written a little shorter:

[ -f ./filter.txt ] && rm ./filter.txt
for c in {a..z} {2..7}; do echo "$c$c$c" >> ./filter.txt; done

However, neither the regular expression nor the generated dictionary captures the essence of the problem, although either can be a good compromise.

The thing is, it cannot be determined in advance which characters will make up the sequence and how long the sequence is. The idea is that the sequence aaaaa (ax5) is better than the sequence aaabbb.

For example, let's set the difficulty equal to 30 points to pass the filter.

  • The sequence aa (ax2) adds 2 points
  • The sequence aaa (ax3) adds 6 points
  • The sequence aaaa (ax4) adds 12 points
  • The sequence aaaaa (ax5) adds 20 points
  • The sequence aaaaaa (ax6) adds 30 points

In addition, for interleaved characters, we give 1 point each:

  • The sequence aba adds 1 point
  • The sequence dbdb adds 2 points
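The point values listed above can be checked with a short sketch. It uses the same rule as processAddress from the first post, except that it loops over the whole string instead of a fixed 56 characters; the name scoreAddress is made up for this example. A run of n identical characters adds 2 + 4 + ... + 2(n-1) = n(n-1) points.

```javascript
// Same scoring rule as processAddress in the first post,
// generalized to strings of any length (scoreAddress is a hypothetical name).
function scoreAddress(s) {
  let score = 0;
  let inrow = 0; // bonus for the current run, grows by 2 per repeated character
  for (let i = 1; i < s.length; i++) {
    if (s[i - 1] === s[i]) {
      inrow += 2;
      score += inrow;
    } else {
      inrow = 0;
      // Interleaved pattern such as "aba" earns 1 point.
      if (i > 1 && s[i - 2] === s[i]) score += 1;
    }
  }
  return score;
}

for (const s of ["aa", "aaa", "aaaa", "aaaaa", "aaaaaa", "aba", "dbdb", "aaabbb"]) {
  console.log(s, scoreAddress(s));
}
```

Under this rule aaabbb scores 6 + 6 = 12 while aaaaa scores 20, which matches the stated preference for one long run over two shorter ones.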

Perhaps the point coefficients should be reviewed and adjusted; as I understand it, integers are needed for best performance.

This probably cannot be achieved without adding code to the program. I will do my best to solve the problem, but I have hardly ever worked in C. 20 years ago I had to work with CBuilder3 (this IDE is an analogue of Delphi), but that syntax is not like the current one, and the IDE generated more than half of the code automatically, so I didn't have to work with preprocessor directives at all.

you can probably do 2 stage filtering: do the simple ones like aa with mkp224o's filtering, and leave the rest for your own filtering engine. you can use mkp224o's regex support for some of it too, though don't use a lot of regex filters, as they're slower and sorta have to be checked one after another the way it's implemented right now.
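A minimal sketch of what the second stage could look like on the Node.js side, assuming mkp224o has already pre-filtered with a cheap filter list and that every candidate arrives on stdout as a line containing a 56-character name followed by .onion. The names scoreAddress, selectHostnames and runOnStdin are made up for this example, and the exact output format should be checked against what mkp224o actually prints.

```javascript
// Scoring rule from the first post, for strings of any length.
function scoreAddress(s) {
  let score = 0;
  let inrow = 0;
  for (let i = 1; i < s.length; i++) {
    if (s[i - 1] === s[i]) {
      inrow += 2;
      score += inrow;
    } else {
      inrow = 0;
      if (i > 1 && s[i - 2] === s[i]) score += 1;
    }
  }
  return score;
}

// Second stage: keep only hostnames whose score reaches minScore.
// Assumption: each candidate appears on a line as "<56 base32 chars>.onion".
function selectHostnames(lines, minScore) {
  const onion = /([a-z2-7]{56})\.onion/;
  const kept = [];
  for (const line of lines) {
    const m = onion.exec(line);
    if (m && scoreAddress(m[1]) >= minScore) kept.push(m[1] + ".onion");
  }
  return kept;
}

// Optional wiring to stdin; not called here. It prints only the
// hostnames that pass and does nothing with any key material.
function runOnStdin(minScore) {
  const readline = require("node:readline");
  const rl = readline.createInterface({ input: process.stdin });
  rl.on("line", (line) => {
    for (const h of selectHostnames([line], minScore)) console.log(h);
  });
}
```

The idea is that the first stage discards most candidates inside mkp224o, so the slower script only has to score the few lines that survive.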

other than that, i don't really know. it feels like your ideas are sorta opinionated regarding what a "memorable" onion is, and it's likely that everyone else has their own ideas (i personally find some characters substituted with numbers rather memorable, and some random-looking sequences too). so it's kinda not something i'd like to add to mkp224o, as it seems to be both rather complicated to write and probably kinda slow-ish too (computing scores for onions and comparing them sounds harder than the current checks being done, except maybe for regex stuff), and the existing code is sorta a mess already.