Very poor results - is there a bug or did I do it wrong?
magnumripper opened this issue · 5 comments
Sorry for the lengthy post.
I did some quick tests with this because it looked interesting. I was going to post the results to the john-users mailing list (and consider implementing it natively in JtR), but the results were so poor that I suspect something is wrong: a bug, or user error on my side?
I used the rockyou list (with dupes but ASCII only) to train OMEN (full ASCII alphabet) as well as JtR's incremental and markov modes. I ran them against a set of real-world NT hashes scraped from pastebin.
- Create a rockyou file w/ dupes, ASCII only and length 0-16:
$ perl -ne 's/\r//g; print if /^[\x20-\x7e]{0,16}$/' < rockyou.original.dlst > rockyou_ascii_16.dlst
- For JtR's markov mode (since it's designed to run to end as opposed to produce best candidates early) I adjusted level to produce approx. 100M candidates. At max. length of 16 that was level 190:
$ ./calc_stat -p rockyou_ascii_16.dlst stats
$ ./genmkvpwd stats 0 16
$ rm -f john.pot && ./john scraped.sam -form:nt -v:1 -markov:190 -max-len=16
Loaded 5294 password hashes with no different salts (NT [MD4 128/128 AVX 4x3])
Press 'q' or Ctrl-C to abort, almost any other key for status
MKV start (stats=$JOHN/stats, lvl=190 len=16 pwd=96840868)
1822g 0:00:00:04 100.00% (ETA: 10:27:53) 380.3g/s 20217Kp/s 20217Kc/s 103475MC/s }|..}
- The exact number of candidates produced above (96,840,868) was then used with the (brand new) -max-cand option for incremental mode, to get the same number of candidates tried:
$ sed 's/^/:/' < rockyou_ascii_16.dlst > rock.pot
$ ./john -make-charset=custom.chr -pot=rock.pot
$ rm -f john.pot && ./john scraped.sam -form:nt -max-cand=96840868 -v:1 -inc:custom -max-len=16
Loaded 5294 password hashes with no different salts (NT [MD4 128/128 AVX 4x3])
Press 'q' or Ctrl-C to abort, almost any other key for status
911g 0:00:00:12 73.58g/s 7822Kp/s 7822Kc/s 36739MC/s pwdreg..pwdrik
- The same number of candidates were generated with OMEN:
$ perl -e 'foreach $i (32..126) { print chr($i) }' > alphabet.txt
$ createNG --iPwdList rockyou_ascii_16.dlst -A alphabet.txt -v
$ rm -f john.pot && enumNG -p -m 96840868 | ./john scraped.sam -form:nt -v:1 -stdin
Loaded 5294 password hashes with no different salts (NT [MD4 128/128 AVX 4x3])
Press Ctrl-C to abort, or send SIGUSR1 to john process for status
83g 0:00:03:24 0.4063g/s 474059p/s 474059c/s 2488MC/s 121242:7:..192:2:243
Wow :-( Not only was it way slower, the result was also very poor. Pipe overhead explains part of the low speed, but not most of it. In any case, the poor result is the more discouraging part: 82 of those 83 cracks were passwords consisting of just digits.
Similar tests with 1G candidates show the same poor results. I can't see what I'm doing wrong here.
$ rm -f john.pot && ./john scraped.sam -form:nt -v:1 -markov:213 -max-len=16
Loaded 5294 password hashes with no different salts (NT [MD4 128/128 AVX 4x3])
Press 'q' or Ctrl-C to abort, almost any other key for status
MKV start (stats=$JOHN/stats, lvl=213 len=16 pwd=1010366652)
2227g 0:00:00:47 100.00% (ETA: 10:49:28) 46.92g/s 21288Kp/s 21288Kc/s 88428MC/s }U..}
$ rm -f john.pot && ./john scraped.sam -form:nt -max-cand=1010366652 -v:1 -inc:custom -max-len=16
Loaded 5294 password hashes with no different salts (NT [MD4 128/128 AVX 4x3])
Press 'q' or Ctrl-C to abort, almost any other key for status
1311g 0:00:00:52 25.14g/s 19377Kp/s 19377Kc/s 80804MC/s tmd0185..tmd0190
$ rm -f john.pot && enumNG -p -m 1010366652 | ./john scraped.sam -form:nt -v:1 -stdin
Loaded 5294 password hashes with no different salts (NT [MD4 128/128 AVX 4x3])
Press Ctrl-C to abort, or send SIGUSR1 to john process for status
104g 0:00:34:48 0.04980g/s 483884p/s 483884c/s 2516MC/s tifsnjmfouf234..tifsnjpobtu234
Very slow, and again all but one (of the very few) cracks were digits-only. Note that only 104 correct guesses were generated in total over 35 minutes. Compare that to JtR markov's 47 correct guesses per second.
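To put the two runs side by side, here is a small Python sketch that turns the reported crack counts into a fraction of the 5294 loaded hashes. The counts are copied from the john status lines above; the names crack_rate and RESULTS are illustrative only and not part of JtR or OMEN.

```python
# Crack counts copied from the john status lines above (5294 NT hashes loaded).
TOTAL_HASHES = 5294
RESULTS = {
    "~100M candidates": {"markov": 1822, "incremental": 911, "OMEN": 83},
    "~1G candidates": {"markov": 2227, "incremental": 1311, "OMEN": 104},
}


def crack_rate(cracked, total=TOTAL_HASHES):
    """Fraction of the loaded hashes that were cracked."""
    return cracked / total


if __name__ == "__main__":
    for run, counts in RESULTS.items():
        print(run)
        for mode, cracked in counts.items():
            # Print the raw count next to the percentage of all loaded hashes.
            print(f"  {mode:<12} {cracked:>5}  {crack_rate(cracked):6.1%}")
```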
Here's a completely different test: at which point does each generator produce "123456" or "password", which are among the most common RockYou passwords? JtR's markov mode is not tested here since it's not meant to produce the best candidates first.
$ ../run/john -inc:custom -stdout -max-len=16 | grep -Fxnm1 123456
Press 'q' or Ctrl-C to abort, almost any other key for status
1:123456
$ enumNG -p | grep -Fxnm1 123456
200:123456
$ ../run/john -inc:custom -stdout -max-len=16 -ses:0 | grep -Fxnm1 password
Warning: only 95 characters available
Press 'q' or Ctrl-C to abort, almost any other key for status
409345:password
$ enumNG -p | grep -Fxnm1 password
(I gave up after 25 minutes)
So hundreds of millions of candidates were generated but the 4th most common rockyou password wasn't among them. Something's amiss. I did try a different set of test hashes (a subset of LinkedIn) but the results were equally sad. I also tried the -s option to enumNG (which, for real-world use, would require incorporating it into the JtR codebase) and it didn't help noticeably.
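The grep -Fxnm1 checks above can also be run with a cap on the number of candidates read, so a missing password does not mean waiting indefinitely. A minimal Python sketch follows; the function name first_rank and the script name rank.py are made up for illustration, and it only assumes the generator writes one candidate per line to stdout, as in the commands above.

```python
import sys


def first_rank(lines, target, limit=None):
    """Return the 1-based position of the first line equal to target.

    Returns None if target does not show up within the first `limit`
    lines, or if the input ends without a match.
    """
    for rank, line in enumerate(lines, start=1):
        if line.rstrip("\r\n") == target:
            return rank
        if limit is not None and rank >= limit:
            return None
    return None


if __name__ == "__main__" and len(sys.argv) >= 2:
    # Hypothetical usage: enumNG -p | python3 rank.py password 1000000
    target = sys.argv[1]
    limit = int(sys.argv[2]) if len(sys.argv) > 2 else None
    rank = first_rank(sys.stdin, target, limit)
    print(f"{rank}:{target}" if rank is not None else f"not found: {target}")
```

The output format mirrors grep -n (rank, colon, candidate) so results can be compared directly with the runs above.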
Thanks for trying it out and sharing your thoughts. It would be great if you could share the datasets used (your version of rockyou and the NT hashes). If I find the time, I'll investigate the issue. You can send me an encrypted email via maximilian (-dot-) golla -at-/ rub (-dot-) de
I was able to identify the problem. It is a documentation flaw on our side, related to how you generated the alphabet. OMEN expects the alphabet to be sorted by character frequency, and we forgot to mention that in our ReadMe.md file. I will update the description to make it clear. Sorry.
You generated the alphabet like this:
$ perl -e 'foreach $i (32..126) { print chr($i) }' > alphabet.txt
resulting in an alphabet file with the following content:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
But the alphabet OMEN expects looks like this (note that it has only 94 characters, not 95; it seems some character is not present in our rockyou training file, or we have another bug here):
ae1ionrls20tm39c8dy54hu6b7kgpjvfwzAxEIOLRNSTMqCDB.YH!U_PKGJ-*VF@WZ#/X$,&+Q?\)=(';%<]~[:^`">{}|
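To illustrate what "sorted by character frequency" means, here is a minimal Python sketch. It is not how alphabetCreator is implemented; the function name and the tie-breaking rule (by code point) are assumptions made for this example, so its output on a real corpus may differ from the tool's in the order of equally frequent characters.

```python
from collections import Counter


def frequency_sorted_alphabet(passwords, size=95):
    """Return the `size` most frequent characters, most frequent first."""
    counts = Counter()
    for pw in passwords:
        counts.update(pw)  # counts every character occurrence in pw
    # Assumption of this sketch: ties are broken by code point.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "".join(ch for ch, _ in ranked[:size])


if __name__ == "__main__":
    # Tiny made-up sample; a real run would read the training list instead.
    sample = ["password", "123456", "passw0rd", "sassy"]
    print(frequency_sorted_alphabet(sample))
```

The point is only that the position of a character in the file carries information, which a plain ASCII-order dump does not provide.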
I will show you how to generate it below. The order in which the characters appear in the file is important (and we forgot to write this in our documentation, sorry about that). In OMEN, we included a utility called alphabetCreator.
In your case, you need to do the following:
$ touch empty-alpha-file
$ ./alphabetCreator --pwList rockyou_ascii_16.dlst --size 95 --alphabet empty-alpha-file --output rockyou
$ rm empty-alpha-file
$ cat rockyou.alphabet
ae1ionrls20tm39c8dy54hu6b7kgpjvfwzAxEIOLRNSTMqCDB.YH!U_PKGJ-*VF@WZ#/X$,&+Q?\)=(';%<]~[:^`">{}|
$ ./createNG --iPwdList rockyou_ascii_16.dlst -A rockyou.alphabet -v
<-- START OPTIONAL: Check that everything is correct now, by running -->
$ ./enumNG -p -m 1000 > top1k.txt
$ grep password top1k.txt
password
<-- STOP OPTIONAL: Check that everything is correct now, by running -->
$ rm -f john.pot && ./enumNG -p -m 96840868 | ./john scraped.sam -form:nt -v:1 -stdin
All the best,
Maximilian
P.S.: Please let me know if this fixes your problem, then I will close the issue.
P.P.S.: Something helpful for your JTR Markov workflow (automatically extracting the correct level):
https://github.com/RUB-SysSec/Password-Guessing-Framework/blob/master/src/scripts/JTR_MARKOV.sh
Ah, I suspected it was something silly like that. I did try the alphabetCreator but failed to give it an empty alphabet file so it ended up with an alphabet identical to the perl script.
I'll start over from scratch with my testing, thanks!
Just curious: what is the purpose of the -a option to alphabetCreator? You should probably document it. During my failed attempt (before the original post) I did try alphabetCreator, but since an input alphabet was mandatory I gave it one similar to the perl output (ASCII order), and the alphabet it built was identical to the input.
Well, you can use this option to provide a "base" alphabet, i.e. characters you really want to use in addition to what the training corpus tells you. You can even influence the order. For people who are not experimenting with password modeling, I think there is not much use for it in classical preimage attacks. In that case the best option is to limit the alphabet to the training corpus by providing an empty file.
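A quick way to catch the mistake discussed in this thread is to check whether an alphabet file is merely in code point (ASCII) order, which suggests it was not derived from character frequencies. A minimal sketch in Python; the helper names are hypothetical and the check is only a heuristic, not something OMEN itself provides.

```python
def is_codepoint_sorted(alphabet):
    """True if the characters appear in strictly increasing code point order."""
    return all(a < b for a, b in zip(alphabet, alphabet[1:]))


def read_alphabet(path):
    """Read a one-line alphabet file, dropping only the trailing newline."""
    with open(path, encoding="utf-8") as fh:
        return fh.read().rstrip("\r\n")


if __name__ == "__main__":
    ascii_order = "".join(chr(i) for i in range(32, 127))
    frequency_prefix = "ae1ionrls20tm39c"  # start of the alphabet shown earlier
    for name, alpha in [("ascii order", ascii_order), ("frequency order", frequency_prefix)]:
        print(name, is_codepoint_sorted(alpha))
```

If the check returns True for the file passed to createNG -A, the alphabet is probably worth regenerating from the training corpus.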