Handling of truncated hashes (padding attack)
magnumripper opened this issue · 10 comments
In my opinion, the truncation should happen in the cracker, not in 7z2hashcat. This way we can get rid of false positives completely:
- 7z2hashcat outputs full data regardless of padding size.
- Cracker loads it and uses the padding attack when applicable.
- Only when the padding attack passes does the cracker continue with the slower decompression and CRC of the full data.
I kind of agree with you that the cracker could detect this and should decide what would be best to crack those hashes.
On the other hand, the idea was that we shouldn't output huge "hashes" (yes, some 7z files can be extremely large and therefore also the data needed by the cracker) whenever we know that we don't need those large blobs at all. Hashcat even enforces a strict limit on the maximum length of a 7z "hash" line.
Maybe a compromise would be to say that 7z2hashcat should try to "truncate" the data only if SEVEN_ZIP_MAX_DATA (the variable is actually called $SEVEN_ZIP_HASHCAT_MAX_DATA) is reached.
Since 7z2john.pl doesn't use this variable at all (or maybe will set it to a much higher limit; some limit is always good, I think), it wouldn't affect the output for JtR too much. Do you agree?
So my suggestion is to rename SEVEN_ZIP_HASHCAT_MAX_DATA to SEVEN_ZIP_MAX_DATA and use this variable to decide if we "can and must" truncate or not (and maybe whenever the truncation is done it should show an info/warning).
What do you think about this suggestion?
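A minimal sketch of that "can and must" decision, written in Python rather than the script's own Perl. The function name, the limit value and the assumption that truncation is only possible when the data length is not a multiple of the 16-byte AES block are all hypothetical illustrations, not taken from 7z2hashcat:

```python
AES_BLOCK_SIZE = 16

# Hypothetical placeholder; the real limit used by the script is not stated here.
SEVEN_ZIP_MAX_DATA = 768


def should_truncate(data_len: int, max_data: int = SEVEN_ZIP_MAX_DATA) -> bool:
    """Return True only when truncation is both possible and necessary."""
    # "can": there is at least one byte of padding to check against.
    can = data_len % AES_BLOCK_SIZE != 0
    # "must": the full data would exceed the configured limit.
    must = data_len > max_data
    return can and must
```

What the script should do when the limit is exceeded but no padding exists is left open in this sketch.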
+1. I'm definitely a fan of crackers directly supporting partial hashes in their various common forms.
https://roycebits.blogspot.com/2015/10/hash-filtering-more-than-vanity.html
So my suggestion is to rename SEVEN_ZIP_HASHCAT_MAX_DATA to SEVEN_ZIP_MAX_DATA and use this variable to decide if we "can and must" truncate or not (and maybe whenever the truncation is done it should show an info/warning).
What do you think about this suggestion?
Fair enough. But there's another detail involved. We need to use another flag saying it's truncated. So $7z$4 (actually just bit 2 in the flag) could say "truncated hash" in order for the cracker to know it can't inflate LZMA nor verify CRC after the padding check passed.
This boils down to the following situations for the cracker:
- A truncated hash given as input. Cracker can only check padding. We may get false positives due to incorrectly decrypted data randomly ending in the correct number of padding bytes.
- A full hash given as input, where cracker can use the padding trick. Cracker should inflate (if applicable) and verify CRC whenever padding matched. In most cases there will be no false positives, e.g. with 4 bytes of padding plus 4 bytes of CRC, the chance of an FP should be really slim. Also, the LZMA inflate (when used) will very likely fail for incorrect data and reduce FPs.
- A full hash where cracker cannot use the padding trick (i.e. block size happened to be a multiple of 16). Cracker must inflate (if applicable) and verify CRC. We can only get a false positive due to CRC collisions (one in 2^32, about 4 billion), but again the LZMA inflate (when used) will very likely fail for incorrect data and reduce FPs.
This way, for case 2 (a full hash with padding), the cracker can and should use the "padding trick" as long as there was at least one byte of padding. This can be a significant speedup so it's a good thing.
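The three situations above can be sketched roughly as follows. This is only an illustration in Python operating on already-decrypted data: the function name and parameters are hypothetical, the padding bytes are assumed to be zeros, and `zlib.crc32` is assumed to match the CRC stored in the archive. The `inflate` callable stands in for whatever decompressor applies.

```python
import zlib
from typing import Callable, Optional


def verify_candidate(
    plaintext: bytes,
    pad_len: int,
    expected_crc: int,
    truncated: bool,
    inflate: Optional[Callable[[bytes], bytes]] = None,
) -> bool:
    """Check one decrypted candidate; all names here are hypothetical."""
    # Cheap check first: the trailing pad_len bytes must all be zero
    # (assumption: the padding consists of zero bytes).
    if pad_len > 0 and plaintext[-pad_len:] != bytes(pad_len):
        return False
    if truncated:
        # Case 1: nothing more can be verified, so a false positive is possible.
        return True
    # Cases 2 and 3: strip padding, decompress if needed, then verify the CRC.
    payload = plaintext[: len(plaintext) - pad_len]
    if inflate is not None:
        try:
            payload = inflate(payload)
        except Exception:
            # Decompression failure is an early rejection.
            return False
    return zlib.crc32(payload) == expected_crc
```

With `pad_len == 0` (case 3) the padding check is skipped and only decompression and the CRC can reject a candidate.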
Thanks @magnumripper. Very good points and good explanation of the steps involved.
I'm just not sure if we should use bit 2 for this flag. In theory, the 7z format allows some more compression algorithms (see http://www.7-zip.org/7z.html) even if they are very, very rarely used in the wild, and I am not sure if header compression is also supported with all of these compression algorithms.
I would therefore suggest, to allow space for future use, using the following values:
- 0: no truncation, no compression
- 1: no truncation, LZMA1 compression
- 2: no truncation, LZMA2 compression
- 128: truncated, no compression
- 129: truncated, LZMA1 compression
- 130: truncated, LZMA2 compression
(of course this leaves some space for future use, and checking for truncation should be as simple as a bitwise AND with 0x80, e.g. value & 0x80)
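As a rough sketch of how a cracker might decode such a value, assuming bit 7 marks truncation and the low 7 bits carry the compression type as in the list above (the function and table names are made up for illustration):

```python
from typing import Tuple

TRUNCATED_FLAG = 0x80  # bit 7, as proposed in the list above

# Only the values listed above are known; anything else is reported as unknown.
CODECS = {0: "none", 1: "LZMA1", 2: "LZMA2"}


def parse_type(value: int) -> Tuple[bool, str]:
    """Split the type field into (truncated, codec name)."""
    truncated = bool(value & TRUNCATED_FLAG)
    codec = value & 0x7F
    return truncated, CODECS.get(codec, f"unknown({codec})")
```

For example, `parse_type(129)` yields `(True, "LZMA1")` and `parse_type(2)` yields `(False, "LZMA2")`.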
What do you think about this slightly different numbering mechanism?
Good idea, use bit 7. Another question is whether to print the type field as decimal or hex. We seem to use decimal for the others so perhaps just go with that.
I'm not sure whether we should use it as a flag or not. 129 would mean "truncation + LZMA" and then we'd need to add the extra fields. But in that case the "LZMA" is not applicable anyway so the extra fields are of no use. So we could just as well say "128" and not include them. Since we also can't mix compression types, the field could be a non-flag "type" as opposed to "type flags" (and e.g. handled with switch/case in the cracker).
None of this is critical, just decide whatever you prefer. I have all code for JtR ready to push (after testing) once you implement this.
It would be great if you could help me verify whether the newest version, and especially commit 8ef8f21, fixed this GitHub issue too.
Thank you very much @magnumripper
Excellent, everything seems to work fine now. I have committed LZMA and LZMA2 support to Jumbo now and I could probably start implementing support for more decoders right away with no more changes to 7z2john.
BTW JtR uses $PASSWORD_RECOVERY_TOOL_DATA_LIMIT = 0x80000000 (as in 2 GB). It does the "last 16 bytes AES" trick by itself first (if applicable) and if that passes it still won't AES-decrypt more than crc_length * 1.1 (this should be more than the worst case of incompressible data) before LZMA or LZMA2 decompression. This is sometimes just a FRACTION of the total data. The LZMA decompression gives us another chance of semi-early rejection.
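The decryption cap described above could be computed along these lines. This is a sketch, not JtR's code: the function name is hypothetical, and rounding the cap up to a whole 16-byte AES block is an added assumption.

```python
AES_BLOCK_SIZE = 16


def decrypt_budget(packed_len: int, crc_len: int) -> int:
    """Upper bound on bytes to AES-decrypt before decompression."""
    # crc_length * 1.1, rounded up, using integer math to avoid float error.
    limit = (crc_len * 11 + 9) // 10
    # Assumption: round up to a whole AES block so decryption stays aligned.
    limit = (limit + AES_BLOCK_SIZE - 1) // AES_BLOCK_SIZE * AES_BLOCK_SIZE
    # Never decrypt more than the data that is actually present.
    return min(packed_len, limit)
```

Here `crc_len` is taken to mean the length of the data covered by the CRC, so for a large packed stream with a small `crc_len` only a small prefix would be decrypted.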
All in all, for some archives we've had quite an increase in speed (not to mention getting rid of false positives AND false negatives 😵 )
Good to hear that. I think we solved all of the known problems with these changes and it's good that you can already confirm that.
Thanks again for the help and good collaboration in finding a solution, both for fixing/improving 7z2hashcat.pl/7z2john.pl and for discussing how best to implement this with hashcat/JtR (e.g. the LzmaDecode() and Lzma2Decode() calls).