cathugger/mkp224o

Is it possible to apply a filter for numbers only?

Internet-viewer opened this issue · 7 comments

A numbers-only combination is easier to remember. I am wondering whether it is possible to add a filter that selects addresses starting with numbers only. Generating an address that starts with just some random string of digits takes significantly less time, even for longer prefixes, but it would be extremely difficult for an adversary to generate an address starting with exactly the same string. A possible application would be to tell users to remember just 15 - 20 numbers in order to verify the genuine address.

<20 numbers + the remaining 36 characters>.onion

Assuming my understanding is correct, and using the letter l for the number 1, the letter o for 0, the letter x for 8, and the letter y for 9:

The calculation for a 20-digit random numbers-only address on a machine benchmarked at 50 M/s, with 0.99 probability, is log(1-0.99)/log(1-3.2^-20)/50M/3,600 = 0.3 hr.

For an adversary, it would take ~ 3.7*10^15 yr to generate the same string under the same conditions.
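
Both estimates can be reproduced with a few lines of Python. This is only a sanity-check sketch: it assumes every character of the prefix is uniformly random over the 32 symbols of the address alphabet, that 10 of those symbols count as "numbers" under the substitution above, and a constant rate of 50 M keys/s. The function and variable names are made up for this example.

```python
import math

def tries_needed(p_hit: float, confidence: float) -> float:
    """Random keys to try so that at least one hit occurs with the given confidence."""
    # log1p(-p) stays accurate when p is far too small for log(1 - p)
    return math.log(1.0 - confidence) / math.log1p(-p_hit)

RATE = 50e6  # keys per second (the benchmark figure from the post)

p_any_digits = (10 / 32) ** 20  # any 20 "number" characters at the start
p_exact = (1 / 32) ** 20        # one specific 20-character prefix

hours_any = tries_needed(p_any_digits, 0.99) / RATE / 3600
years_exact = tries_needed(p_exact, 0.99) / RATE / 3600 / 24 / 365

print(f"any 20 numbers:     {hours_any:.2f} hr")
print(f"one specific string: {years_exact:.2e} yr")
```

Note that (10/32)^20 is the same as 3.2^-20, which is where the 3.2 in the formula comes from.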

I would not agree that numbers are easier to remember. Which of these two equally-sized strings is easier to memorize: ArmchairRunningTrousers or 42373317702160484360385? Of course it's harder to find dictionary words than random numbers. Even compared to random alphanumeric addresses, I would prefer that over a purely numeric address as it is easier to find patterns to remember it by (short words, leetspeak, etc.).

To do the search you are looking for, you can use a version of mkp224o compiled with regex support (pass the --enable-regex flag to ./configure). With that, you can use the filter \d{n,} where you replace n with the minimum number of digits to match at the start. If you also want to include other characters, you can use a filter such as [2-7loxy]{n,}. Regex filtering will be slower, but at least in my experience, not that much slower.
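
To see what the second filter would accept, here is a small Python sketch that applies the same character class to a string and converts the matched prefix back to numbers using the l/o/x/y substitution from the first post. It uses Python's `re` module rather than mkp224o itself, so it says nothing about how mkp224o anchors or evaluates its regex filters; the address in the usage line is a made-up string, not a real onion address.

```python
import re
from typing import Optional

N = 20  # minimum number of "number" characters required at the start

# Same character class as the filter above: real digits plus the four stand-in letters
DIGIT_LIKE = re.compile(r"[2-7loxy]{%d,}" % N)

# Stand-in letters back to the digits they represent (l=1, o=0, x=8, y=9)
TO_DIGITS = str.maketrans("loxy", "1089")

def numeric_prefix(address: str) -> Optional[str]:
    """Return the leading run of number-like characters as digits, or None if it is too short."""
    m = DIGIT_LIKE.match(address)  # re.match only matches at the start of the string
    if m is None:
        return None
    return m.group(0).translate(TO_DIGITS)

# Hypothetical example string, for illustration only
print(numeric_prefix("27lo3x4y56o2l7x3y4o5" + "abcd"))
```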

The calculation for a 20-digit random numbers-only address on a machine benchmarked at 50 M/s, with 0.99 probability, is log(1-0.99)/log(1-3.2^-20)/50M/3,600 = 0.3 hr.

Yes that seems about right. Finding some combination of digits is not that hard compared to finding a specific string because there are 10^n valid solutions rather than 1.

For an adversary, it would take ~ 3.7*10^15 yr to generate the same string under the same conditions.

That figure already corresponds to 99% confidence; for 50% it is shorter, roughly 5.6*10^14 yr. Either way, it would currently be infeasible to generate another key with the same first 20 digits. However, it is possible to generate a key that matches the first and last several characters, which is enough to trick some people.

Thanks for your comment. Yes, dictionary words are much easier to remember. So maybe it is also possible to find a combination of words from a given list? So the address will be something like the Monroe seeds? I feel it would be relatively slow to iterate over a word list. Another approach could be to find words based on digits, e.g. 228 for cat. This is used a lot for hotline numbers because it is easier for customers to remember.

So maybe it is also possible to find a combination of words from a given list? ... I feel it would be relatively slow to iterate over a word list.

Yes, you can use a filter file for this (using the -f <file> argument), which is a newline-separated list of filters. Compiling with --enable-binsearch (and possibly also --enable-intfilter=64 and/or --enable-besort depending on your exact needs) will allow you to use a large list of filters (such as a full dictionary) with relatively little performance impact. A binary search scales logarithmically, which means that if you have 100000 filters and you add another 100000, it will not double the search time, and the more filters you have, the less relative performance impact each of them has.
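
To illustrate why the list size matters so little, here is a minimal Python sketch of prefix matching with a binary search over a sorted word list. It is not how mkp224o implements it internally; it only shows the idea that each lookup costs about log2(number of filters) comparisons per distinct filter length. All names are hypothetical.

```python
from bisect import bisect_left

def build_index(filters):
    """Sort the filters once and record which lengths occur."""
    words = sorted(set(filters))
    lengths = sorted({len(w) for w in words})
    return words, lengths

def matches_any(address, words, lengths):
    """True if the address starts with any filter, using one binary search per length."""
    for n in lengths:
        prefix = address[:n]
        i = bisect_left(words, prefix)  # O(log len(words)) comparisons
        if i < len(words) and words[i] == prefix:
            return True
    return False

words, lengths = build_index(["alpha", "bravo", "charlie", "delta"])
print(matches_any("charlie234xyz", words, lengths))
print(matches_any("echo234xyz", words, lengths))
```

Doubling the number of filters adds roughly one extra comparison per lookup, which is why a full dictionary costs little more than a short list.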

So the address will be something like the Monroe seeds?

I'm just going to guess you mean Monero seeds. These are not vanity addresses but rather a representation of the key using words from a fixed dictionary. You can think of it as a base-1626 number, where the digits are words. You could convert any random onion address to this format, but you would have to convert it back before you could actually use it with the Tor network. See also BIP39 for a different approach.
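
The "digits are words" idea can be sketched in a few lines of Python. This is plain positional notation with a tiny made-up word list, just to show the round trip; the real Monero and BIP39 encodings differ in detail (for example, they include checksums), so treat the function names and the word list as assumptions for illustration.

```python
def to_words(data: bytes, wordlist):
    """Write the bytes as a number in base len(wordlist), one word per digit."""
    base = len(wordlist)
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(wordlist[r])
    return digits[::-1] or [wordlist[0]]

def from_words(words, wordlist, length: int) -> bytes:
    """Inverse of to_words; length is the original byte count (restores leading zeros)."""
    index = {w: i for i, w in enumerate(wordlist)}
    n = 0
    for w in words:
        n = n * len(wordlist) + index[w]
    return n.to_bytes(length, "big")

WORDS = ["alpha", "bravo", "charlie", "delta"]  # toy list, i.e. base 4
key = bytes([1, 2, 3, 255])                     # stand-in for real key material
phrase = to_words(key, WORDS)
print(" ".join(phrase))
print(from_words(phrase, WORDS, len(key)) == key)
```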

Another approach could be to find words based on digits, e.g. 228 for cat. This is used a lot for hotline numbers because it is easier for customers to remember.

An interesting idea. You could again use a dictionary as a filter file, but map each letter to its corresponding number and remove duplicates before running mkp224o. Or you could match any random numbers as previously discussed and then try to find valid words within them after the fact with some external program.
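
A rough Python sketch of the first option, assuming the standard phone keypad layout and lowercase a-z words only. Since 8 and 9 do not occur in onion addresses, the sketch reuses the x-for-8 and y-for-9 substitution proposed in the first post; that choice, and the function names, are assumptions of this example and not anything mkp224o defines.

```python
KEYPAD = {
    "abc": "2", "def": "3", "ghi": "4", "jkl": "5",
    "mno": "6", "pqrs": "7", "tuv": "8", "wxyz": "9",
}
LETTER_TO_DIGIT = {ch: d for letters, d in KEYPAD.items() for ch in letters}

# 8 and 9 are not in the address alphabet, so write them as x and y
ADDRESS_SUBST = str.maketrans("89", "xy")

def word_to_filter(word: str) -> str:
    """Keypad digits for a word, rewritten with characters that can appear in an address."""
    digits = "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())
    return digits.translate(ADDRESS_SUBST)

def build_filters(words):
    """Map each distinct filter to the words it stands for (this also removes duplicates)."""
    by_filter = {}
    for w in words:
        by_filter.setdefault(word_to_filter(w), []).append(w)
    return by_filter

for flt, meanings in build_filters(["cat", "bat", "dog"]).items():
    print(flt, meanings)
```

Writing the dictionary keys one per line would give a filter file, and keeping the word lists around lets you look up what a found address "spells" afterwards.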

Yes, you can use a filter file for this (using the -f <file> argument), which is a newline-separated list of filters. Compiling with --enable-binsearch (and possibly also --enable-intfilter=64 and/or --enable-besort depending on your exact needs) will allow you to use a large list of filters (such as a full dictionary) with relatively little performance impact. A binary search scales logarithmically, which means that if you have 100000 filters and you add another 100000, it will not double the search time, and the more filters you have, the less relative performance impact each of them has.

That is a cool feature, thanks. It gives a bunch of addresses starting with words given in a list. Is there any way to filter for consecutive words? Say, given the wordlist "alpha", "bravo", "charlie", "delta", instead of selecting addresses that start with any one of those words, it would pick out addresses starting with combinations such as "alphabravocharliedelta" and "bravocharliealphadelta". I feel an address like that will be easier to remember.

One issue I encountered during filtering is that the program prints the following message and exits:
set workdir nekokeys/
Aborted

I compiled with the following options: ./configure --enable-amd64-51-24k --enable-intfilter --enable-binsearch --enable-intfilter=64
The command I ran: ./mkp224o -d nekokeys -f wordlist.txt
For testing purposes, I used the wordlist that onionshare uses for its authentication function: https://github.com/onionshare/onionshare/blob/develop/cli/onionshare_cli/resources/wordlist.txt

On my computer, a short list of 20 - 30 words works fine, but that file with over 7k words does not.

Is there any way to filter for consecutive words? Say, given the wordlist "alpha", "bravo", "charlie", "delta", instead of selecting addresses that start with any one of those words, it would pick out addresses starting with combinations such as "alphabravocharliedelta" and "bravocharliealphadelta".

Not that I'm aware of. If you want that you will likely have to add those entries to your filter list. One could probably write a script that creates permutations of words to a certain minimum length.
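
Such a script could look roughly like this in Python. It is only a sketch with made-up names: it joins every ordering of up to max_words distinct words and keeps the results that reach a minimum length. Be careful with large word lists, since the number of permutations grows very quickly, and every extra character in a filter makes a match 32 times less likely.

```python
from itertools import permutations

def combined_filters(words, min_len: int, max_words: int):
    """All concatenations of 1..max_words distinct words that are at least min_len long."""
    out = set()
    for k in range(1, max_words + 1):
        for combo in permutations(words, k):
            joined = "".join(combo)
            if len(joined) >= min_len:
                out.add(joined)
    return sorted(out)

# Two-word combinations of a short list, at least 10 characters long
for f in combined_filters(["alpha", "bravo", "delta"], min_len=10, max_words=2):
    print(f)
```

The printed lines can be written to a file and passed with -f like any other filter list.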

compiled with the following options: ./configure --enable-amd64-51-24k --enable-intfilter --enable-binsearch --enable-intfilter=64

--enable-amd64-51-24k is not an option. You probably want either --enable-amd64-51-30k or --enable-amd64-64-24k. You only need to specify --enable-intfilter once, with the size you want. For a list like that, with variable-length filters, you will want to include --enable-besort; this will likely fix your issue.

Many thanks! I'll have a try.

-N combined with a list of numbers may also help to extend the length of the all-numbers part.