philsmd/7z2hashcat

Stacked filters/methods results in false negatives

magnumripper opened this issue · 16 comments

This will create an archive with stacked filters:

$ rm -f test.7z && 7z a test.7z ../run/john -pmagnum -m0=delta:6 -m1=bcj -m2=lzma

It will be Delta + BCJ + LZMA, in that order (I think).

Current 7z2hashcat will not complain about it, yet it outputs a ciphertext with only Delta + LZMA, which is uncrackable.

Note that the compressed file must be something that actually gets changed with the BCJ filter (as in the ../run/john binary used above) when creating samples to assess this. I first tried this with a text file but that was cracked. For a while I thought we were safe, but when I changed it to an actual x86 binary it was no longer crackable.

This is presumably an "extreme use" of 7z. I'm not sure we'll ever need to support this in the crackers (if we want to, we need to change the output format to allow stacked filters), but we need to be aware of it - and maybe at least 7z2hashcat should warn about the situation and inform the user that the hash might well be uncrackable.

OK so I hacked JtR to do BCJ decoding even when ciphertext only said Delta, and now that file cracks.

The order that worked was: AES decrypt, then LZMA decode, then BCJ decode, then Delta decode and then the CRC check. This was expected given the -m indices I used for producing the archive. Further tests confirmed we need to know the order of filters (7z -slt l will show the correct order at file level, not at archive level).
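The effect can be reproduced outside 7-Zip with a rough sketch based on Python's standard `lzma` module, which accepts raw filter chains. This is only an illustration under assumptions: it uses liblzma's Delta and x86 BCJ filters with LZMA2 rather than 7-Zip's own encoder, it skips the AES layer entirely, and the sample bytes are made up so that they start with an x86 CALL opcode that BCJ rewrites.

```python
import lzma
import zlib

# Made-up sample: an x86 CALL opcode (0xE8) with a zero offset, padded with NOPs.
# The leading 0xE8 survives the Delta step, so BCJ rewrites the offset after it.
data = (b"\xe8\x00\x00\x00\x00" + b"\x90" * 11) * 64

# Encode order: Delta (distance 6), then x86 BCJ, then LZMA2.
full_chain = [
    {"id": lzma.FILTER_DELTA, "dist": 6},
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]
# The chain a cracker would assume if the hash only mentioned Delta + LZMA.
short_chain = [full_chain[0], full_chain[2]]

packed = lzma.compress(data, format=lzma.FORMAT_RAW, filters=full_chain)

# liblzma undoes the chain in reverse: LZMA2 decode, BCJ decode, Delta decode.
good = lzma.decompress(packed, format=lzma.FORMAT_RAW, filters=full_chain)
# Same stream, but with the BCJ step missing.
bad = lzma.decompress(packed, format=lzma.FORMAT_RAW, filters=short_chain)

expected_crc = zlib.crc32(data)
print("full chain CRC ok: ", zlib.crc32(good) == expected_crc)
print("short chain CRC ok:", zlib.crc32(bad) == expected_crc)
```

With the BCJ step missing, the decoded bytes differ from the input, so a CRC check against the original is expected to fail even with the right password. That is the false negative described above.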

maybe at least 7z2hashcat should warn about the situation and inform the user that the hash might well be uncrackable.

The thing is, once you've coded that much, you could just as well output an extended format and be done with it (we can still update the crackers or choose not to, or they'll simply reject the new format).

Oh, and it gets worse.

Stacking compression methods is also allowed:

$ rm -f test.7z && 7z a test.7z ../run/john -pmagnum -m0=lzma -m1=lzma2 -m2=deflate

It will produce a file first compressed with LZMA, then with LZMA2 and finally with DEFLATE.

$ 7z -slt l test.7z

7-Zip [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28
p7zip Version 17.04 (locale=utf8,Utf16=on,HugeFiles=on,64 bits,8 CPUs LE)

Scanning the drive for archives:
1 file, 2737634 bytes (2674 KiB)

Listing archive: test.7z

--
Path = test.7z
Type = 7z
Physical Size = 2737634
Headers Size = 162
Method = LZMA2:12m LZMA:12m Deflate 7zAES
Solid = -
Blocks = 1

----------
Path = john
Size = 8644417
Packed Size = 2737472
Modified = 2021-12-07 03:47:10
Attributes = A_ -rwxr-xr-x
CRC = F669E96E
Encrypted = +
Method = LZMA:12m LZMA2:12m Deflate 7zAES:19
Block = 0

That's even more "extreme use" (although a good way to get rid of password crackers), but once we come up with a file format that can just list an arbitrary number of methods and their properties (if any), in the correct order, it's not a problem.

Oh and according to the specification "You can use any number of methods". There are currently 14 of them, 6 compression methods and 8 filters.

I will from now on compress all my encrypted 7z archives using -m0=delta -m1=bcj -m2=bcj2 -m3=arm -m4=armt -m5=ia64 -m6=ppc -m7=sparc -m8=deflate -m9=lzma -m10=lzma2 -m11=ppmd -m12=bzip2. Good luck cracking that with any existing cracker 🤣

Not every combo seems to be accepted by current 7z though, so the above actually doesn't work.

Hey @magnumripper,

yeah, this is a really good github issue and definitely a valid request. It's also important to mention that the changes required here to detect multiple codecs/filters do NOT only make sense in terms of new features (the stacking/combinations etc) that the crackers (john & hashcat etc) could support in the future; it's also important to detect these cases to avoid confusion (or wrong "hashes") and/or false positives etc. So yeah, we need a new hash format...

I have now come up with the following changes (4d3d1c1; somewhat experimental, no final Windows release yet, but it "should work"): an updated hash format specification, "multiple codecs" detection logic, and the code modifications they require. To me it made the most sense to keep the old format almost exactly as is, so that the crackers still support older "hashes", and to change only the meaning of the 2 attributes fields whenever multiple filters/coders are used. The new format requires the crackers (perhaps with small code changes) to look at both the data type indicator and the 2 attributes fields (coder attributes and preprocessor attributes). hashcat, for instance, already "blocks" (refuses to crack) hashes with multiple filters/coders out of the box, because it checks that the attributes fields are hexadecimal. The new format allows commas and colons within these two fields. The format is also explained in the README.md file and in the header of 7z2hashcat.pl itself: with multiple filters or coders, the appropriate field holds a comma-separated list, and each additional entry consists of a type/order indicator (a number that encodes both the codec type and the position at which it needs to be applied), followed by a colon, followed by the attribute value itself in hexadecimal.

Example (unlikely/strange example due to unusual combination of filters/compressors):

$7z$130$...$15,17:5d00006000$00,19:

We need to look at the data type indicator as well as the coder attributes and preprocessor attributes fields.

The important fields that we need to look at have the following meaning:

  • $130 -> Delta Filter (8 << 4) + LZMA2 (2)
  • $15,17:5d00006000 -> because of the comma we know multiple compression algorithms are used as follows: attribute 0x15 for LZMA2 (see data type indicator), attribute 0x5d00006000 for LZMA1 (1 << 4) with priority 1 (therefore: +1)
  • $00,19: -> because of the comma we know multiple preprocessing filters are used as follows: attribute 0x00 for Delta Filter (see data type indicator), no attributes (after colon) for BCJ (1 << 4) with priority 3 (therefore: +3)

Therefore the order is (again, it's unlikely that you would compress/filter files like this, it's just an example):

  • LZMA2 (priority 0, not explicit, but determined, data type indicator)
  • LZMA1 (priority 1)
  • Delta (priority 2, not explicit, but determined, data type indicator + priority 1 for LZMA1 was explicit)
  • BCJ (priority 3)
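The arithmetic behind these numbers can be sketched in a few lines of Python. This is only an illustration of the bit layout as described in this comment (upper bits for the codec type, lower 4 bits for the second value); the helper name `split_nibbles` and the two lookup tables are made up here and only cover the type numbers that appear in this thread.

```python
def split_nibbles(value):
    """Split a number into (value >> 4, value & 0xF)."""
    return value >> 4, value & 0xF

# Names only for the type numbers mentioned in this thread (not a full list).
CODERS = {1: "LZMA1", 2: "LZMA2"}
FILTERS = {1: "BCJ", 8: "Delta"}

# Data type indicator: first filter in the upper bits, first coder in the lower 4.
filter_type, coder_type = split_nibbles(130)
print(FILTERS[filter_type], CODERS[coder_type])   # Delta LZMA2

# Type/order indicator in the coder attributes field: codec type + priority.
coder, prio = split_nibbles(17)
print(CODERS[coder], prio)                        # LZMA1 1

# Type/order indicator in the preprocessor attributes field.
filt, prio = split_nibbles(19)
print(FILTERS[filt], prio)                        # BCJ 3
```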

We also have 2 new variables directly within 7z2hashcat.pl that you can change for 7z2john.pl if you want/need:
$PASSWORD_RECOVERY_TOOL_SUPPORT_MULTIPLE_DECOMPRESSORS and $PASSWORD_RECOVERY_TOOL_SUPPORT_MULTIPLE_PREPROCESSORS.

When set to 0, they trigger an extra warning whenever the cracker does not support the stacked/combined filters/coders.
I've also replaced the $SHOW_LZMA_DECOMPRESS_AFTER_DECRYPT_WARNING variable with $SHOW_UNSUPPORTED_CODER_WARNING, which makes more sense in my opinion.

I think this is a good, but still minimal and backwards-compatible, change that would make it clear within the hash output that multiple filters/coders are used for that 7z archive. The technical details of course are only important to the crackers or advanced users, so I didn't want to add too many extra fields and/or values (therefore the shifting again ;) as we already use within the data type indicator).

Hope this makes sense and works also for you. Thx again

That sounds great, I'll be testing it. Thanks!

@philsmd, if I actually create such a (weird) archive as you describe:

rm -f test.7z && 7z a test.7z ../run/john -pmagnum -m0=lzma2 -m1=lzma -m2=delta:6 -m3=bcj

The order is honored:

$ 7z -slt l test.7z
(...)
Method = LZMA2:12m LZMA:12m Delta:6 BCJ 7zAES:19
(...)

However, the hash output from 7z2hashcat is NOT what you described:

$ ../run/7z2john.pl 2>/dev/null test.7z | fold | headtail -n1
test.7z:$7z$17$19$0$$16$5ba9a3e3925fc4141d7d2482cf9caff5$2073802776$2838896$2838
(... 70972 lines skipped ...)
8fbbf5ba47ab5313b981f0bfa70588786e28$9074200$,18:5d0000c000,35:17$,129:05

I'm not sure I should proceed implementing support in JtR until I understand what's going on here.

(Edited due to typos when creating archive the first time)

Trying a more sensible stack:

$ rm -f test.7z && 7z a test.7z ../run/john -pmagnum -m0=delta:6 -m1=bcj -m2=lzma2 -m3=lzma 

The order is honored:

$ 7z -slt l test.7z
(...)
Method = Delta:6 BCJ LZMA2:12m LZMA:12m 7zAES:19
(...)

The output from 7z2hashcat:

$ ../run/7z2john.pl 2>/dev/null test.7z | fold -w 100 | headtail -n1
test.7z:$7z$17$19$0$$16$8847423c7552ad592b395af5979bb7cd$2073802776$4465664$4465653$96a7611e4e6cec86
(... 89313 lines skipped ...)
6d89ce074030$9074200$5d0000c000,33:17$,131:05

So 17 >> 4 == 1 corresponds to BCJ encoding. 17 & 0xf == 1 corresponds to LZMA.

Am I just daft or is there a problem with the output?

Hey @magnumripper,

to me the fields seem to be perfectly fine. With the same approach/rules as mentioned above (example in #35 (comment)), extracting the correct data and determining the order of the coders/filters works like this:

$7z$17$...$5d0000c000,33:17$,131:05

We need to look at all 3 fields: data type indicator, the coder attributes and preprocessor attributes.

The important fields that we need to look at have the following meaning:

  • $17 -> BCJ Filter (1 << 4) + LZMA1 (1), these are always the first 2 (maximum 2, could also be just 1, or none) of their type (filter and coder)
  • $5d0000c000,33:17 -> because of the comma we know multiple compression algorithms ("coders") are used as follows: attribute 0x5d0000c000 for LZMA1 (see data type indicator, the $17$ above); and attribute 0x17 (after colon) for LZMA2 (2 << 4) with priority 1 (therefore: +1)
  • $,131:05 -> because of the comma we know multiple preprocessing filters are used as follows: no attribute for BCJ Filter (see data type indicator), attribute 0x05 (after colon, note that Delta has a minimum value of +1 so it's actually 5 + 1 = 6... but this is just some inner workings of the Delta filter function) for Delta filter (8 << 4) with priority 3 (therefore: +3)

therefore, the order is:

  • LZMA1 (priority 0, not explicit, but determined, data type indicator)
  • LZMA2 (priority 1, explicit)
  • BCJ (priority 2, not explicit, but determined, from data type indicator we know it's the "first filter" + priority 1 for LZMA2 was explicit)
  • Delta (priority 3, explicit)

This is exactly what Delta:6 BCJ LZMA2:12m LZMA:12m 7zAES:19 also tells us to do (the only step that we do NOT explicitly mention here is of course the 7zAES decryption step).
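As a rough sketch, the steps above could be automated like this in Python. It rests on assumptions that are not confirmed beyond the two commented examples in this thread: the implicit coder (from the data type indicator) is placed at priority 0, and the implicit filter takes the lowest priority that no explicit entry uses. The function names and lookup tables are made up for illustration, and cases with no coder or no filter are not handled.

```python
CODERS = {1: "LZMA1", 2: "LZMA2"}     # only the types seen in this thread
FILTERS = {1: "BCJ", 8: "Delta"}

def parse_field(field, names, first_type):
    """Return (implicit_name, [(priority, name, attr_hex), ...]) for one field."""
    first, *rest = field.split(",")
    explicit = []
    for item in rest:
        ident, _, attr = item.partition(":")
        ident = int(ident)                      # decimal type/order indicator
        explicit.append((ident & 0xF, names[ident >> 4], attr))
    # 'first' is the attribute of the implicit codec; it is not needed for ordering.
    return names[first_type], explicit

def decode_order(data_type, coder_field, filter_field):
    coder0, coders = parse_field(coder_field, CODERS, data_type & 0xF)
    filter0, filters = parse_field(filter_field, FILTERS, data_type >> 4)
    slots = {prio: name for prio, name, _ in coders + filters}
    slots[0] = coder0                           # assumption: implicit coder is 0
    free = min(p for p in range(16) if p not in slots)
    slots[free] = filter0                       # assumption: lowest unused slot
    return [slots[p] for p in sorted(slots)]

print(decode_order(17, "5d0000c000,33:17", ",131:05"))
# ['LZMA1', 'LZMA2', 'BCJ', 'Delta']
print(decode_order(130, "15,17:5d00006000", "00,19:"))
# ['LZMA2', 'LZMA1', 'Delta', 'BCJ']
```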

Does it make more sense now ?

Cheers

therefore, the order is:

  • LZMA1 (priority 0, not explicit, but determined, data type indicator)
  • LZMA2 (priority 1, explicit)
  • BCJ (priority 2, not explicit, but determined, from data type indicator we know it's the "first filter" + priority 1 for LZMA2 was explicit)
  • Delta (priority 3, explicit)

This is exactly what Delta:6 BCJ LZMA2:12m LZMA:12m 7zAES:19 also tells us to do

Does it really? My understanding is that the Delta:6 BCJ LZMA2:12m LZMA:12m 7zAES:19 output (and my command line), had the order as:

  • Delta (prio 0)
  • BCJ (prio 1)
  • LZMA2 (prio 2)
  • LZMA (prio 3)

Are you saying higher prio come first?

Are you saying higher prio come first?

That seems to be the case, and then everything fits. I was confused because I'd say "priority one" for first/highest priority and since this is zero-based I assumed "priority 0" would be first. That, and the -m0 .. -m3 options to 7z reflect my view of it.

But if prio 0 is actually the last/lowest priority and they come in reverse order, I think I'm set - thanks!
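A small Python sketch of that reading: take the file-level `Method =` value from `7z -slt l`, treat it as encode order (an assumption based on the tests described earlier in this thread), and reverse it to get the steps a cracker has to apply. The function name `decode_steps` is made up for illustration.

```python
def decode_steps(method_line):
    """Turn a 7z 'Method =' value (assumed encode order) into decode order."""
    # Drop per-method parameters such as ':6' or ':12m', keep only the names.
    names = [item.split(":")[0] for item in method_line.split()]
    return list(reversed(names))

print(decode_steps("Delta:6 BCJ LZMA2:12m LZMA:12m 7zAES:19"))
# ['7zAES', 'LZMA', 'LZMA2', 'BCJ', 'Delta']
```

After the 7zAES step, the remaining order (LZMA, LZMA2, BCJ, Delta) lines up with priorities 0 to 3 from the hash line in the example above.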

yeah @magnumripper , you are totally right.

Maybe we should be even clearer about this in the documentation and add some further comments. But I think the examples are quite understandable (we now have at least 2 complete examples with comments, very good). In my opinion the problem with the naming of "order"/"priority" is simply that archive generation (7-Zip "file compression") does things in the opposite direction (the AES step comes last), while we need to go the other way around (AES decryption comes first), so the naming can always be confusing. 7z2hashcat just takes the order it gets from the archive's metadata and puts it into the hash line (no changes/manipulation here).

I will think about improving the documentation a little bit, but in the end it won't matter too much for the user. It's more of a technical detail about the hash format that only 7z2john, 7z2hashcat and the crackers need to know (what is really needed for processing the data correctly).

btw: did you notice that 7z2hashcat/7z2john will also tell you what needs to be done when the crackers (john/hashcat) do not support it (that is, when $PASSWORD_RECOVERY_TOOL_SUPPORT_MULTIPLE_DECOMPRESSORS and $PASSWORD_RECOVERY_TOOL_SUPPORT_MULTIPLE_PREPROCESSORS are set to 0, and maybe the supported coder/filter lists @PASSWORD_RECOVERY_TOOL_SUPPORTED_PREPROCESSORS and @PASSWORD_RECOVERY_TOOL_SUPPORTED_DECOMPRESSORS are empty as well)? It gives a nice explanation of what the crackers are not yet able to do and what exactly the order is!

Yeah, with this confirmed and the existing examples, we're fine. And you're totally right, a cracker needs to reverse the prios anyway.

I'll continue implementing it in JtR soon. Thanks!

Ouch, I just realized a huge caveat: we can't have : within ciphertexts in JtR because it is the very delimiter between the ciphertext and other fields (metadata such as file names, which frequently contain password hints). It's possible to work around it but it's more or less unacceptable because it ruins the chance of including valuable metadata. Would it be possible to change this before the ciphertext format settles in hashcat?

I will change it to _. i.e. _ instead of :
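To illustrate why the separator matters, here is a simplified Python sketch. The line layout (label, ciphertext, then extra fields, all separated by ":") and the trailing `hint.txt` field are hypothetical stand-ins, not JtR's actual parser or input format; the `...` inside the hash is the same elision used in the examples above.

```python
def split_john_line(line):
    """Naive split of 'label:ciphertext:extra...' on ':' (hypothetical layout)."""
    label, ciphertext, *extra = line.split(":")
    return label, ciphertext, extra

with_colon = "test.7z:$7z$17$...$5d0000c000,33:17$,131:05:hint.txt"
with_underscore = "test.7z:$7z$17$...$5d0000c000,33_17$,131_05:hint.txt"

# The colon variant cuts the ciphertext short at the first inner ':'.
print(split_john_line(with_colon))
# The underscore variant keeps the ciphertext and the metadata field intact.
print(split_john_line(with_underscore))
```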