philsmd/7z2hashcat

Extraction doesn't work on some 7z files

Closed this issue · 6 comments

I think I found a bug in 7z2hashcat (the behaviour with 7z2john is the same).
Here are the commands to reproduce the problem:

7z a -pPassword encrypted.7z somefile.dat
    # encrypted.7z = 1.6 MB

7z2hashcat.pl encrypted.7z > 7z.hash
    # 7z.hash = 3.3 MB -- this is too large

hashcat --identify 7z.hash
    No hash-mode matches the structure of the input hash.

hashcat -a3 -m11600 7z.hash Password
    Hashfile '7z.hash' on line 1 ($7z$2$...3c58886c8fcc9bba4982$14942208$18): Salt-value exception
    No hashes loaded.

However, if the 7z command is issued this way, it works:

7z a -pPassword -mx=9 -mhe=on -t7z encrypted2.7z SYSTEM

7z2hashcat.pl encrypted2.7z > 7z2.hash
    # 7z2.hash = 291 Bytes

hashcat --identify 7z2.hash
    The following hash-mode match the structure of your input hash:
        11600 | 7-Zip

hashcat -a3 -m11600 7z2.hash Password
    Status...........: Cracked

The 7z options mean:

-mx=9: Set compression level to maximum (I think it's optional)
-mhe=on: Enable header encryption
-t7z: Set the archive type to 7z

Are these options mandatory for 7z2hashcat to work correctly?

@mfrade
Oh this seems to be an interesting case.

BTW: I've pushed a new version today that fixes a problem with some indices (but that is rather related to multiple files, not just a single encrypted file).

I think you triggered a case where actually hashcat (not 7z2hashcat) is somehow confused by the input and therefore refuses to run (we had some other similar cases where some filters/rules/sanitizing checks were just too strict in hashcat).

The problem is that without having the same file you have, this is just a guess on my part.

The best thing would be if you could share the file, or create a similar file that behaves exactly the same way.

Another thing you could do is to play around with some 7z parameters while compressing (yeah, the compression option might be the most important one for triggering this problem, not the header encryption with -mhe=on, because that always creates a small output/hash) and with smaller input data, i.e. make the file shorter and shorter until it works, etc.

Yet another option would be to play around with the hashcat code to see where it rejects the hash, around this line: https://github.com/hashcat/hashcat/blob/fafb277e0736a45775fbcadc1ca5caf0db07a308/src/modules/module_11600.c#L596 . Of course you need to make sure you recompile hashcat and the -m 11600 module again and again (and correctly) if you want to debug like this, i.e. find out with some printf ("HERE\n"); statements how far it gets into that code before rejecting the whole hash.

I hope this helps you @mfrade and we will find a solution for your problem.

Anyway thanks for reporting (but without much more info on how to reproduce and, better yet, all the raw input files and the final 7z file, I can't do much now).

Thank you for your reply, @philsmd.
I can't share the file that triggered the error, so I tried replicating the issue with small random files (1 and 2 MB), but the bug didn't show up. However, if the random file has the same size as my original file, the problem appears:

dd if=/dev/urandom of=somefile.dat bs=14942208 count=1

I tried this procedure 3 times, always with the same result:

hashcat --identify 7z.hash                 
No hash-mode matches the structure of the input hash.

hmm, yeah your case is very specific...

of course with random data it's not impossible that we hit some limits (it can't be compressed like "usual" files can).

That said, I tried to run the commands that you suggested, but 7z2hashcat always failed to generate a hash with this message:

WARNING: the file 'a.7z' unfortunately can't be used with hashcat since the data length
in this particular case is too long (14943008 of the maximum allowed 8388608 bytes).

Are you always using -mx=9 with the 7z command? (It's also debatable whether this -mx option will help a lot here, because random data can't really be compressed (much) better just by specifying more and more command line options.)

I will continue to see if I can reproduce it, but with the dd command and 7z a -mx=9 (and also without it), I always get an error message that the data is too large.

Are you sure you are using the latest version of 7z2hashcat and not a modified version that increases the limits? (I think 7z2john has slightly larger limits, simply because john doesn't have the exact same limits in its code as the hashcat cracker.)

I.e. please double check that, if you use a very large file, 7z2hashcat shows you exactly this string:

of the maximum allowed 8388608 bytes

Thanks

Are you always using -mx=9 with the 7z command? (It's also debatable whether this -mx option will help a lot here, because random data can't really be compressed (much) better just by specifying more and more command line options.)

No, I didn't use the -mx=9 option. Here are all the steps, now with a file with only zeros:

# create a file with zeros (to be reproducible):
dd if=/dev/zero of=somefile.dat bs=14942208 count=1

# compress with password:
7z a -pPassword encrypted.7z somefile.dat

# extract hash:
7z2hashcat.pl encrypted.7z > 7z.hash
ATTENTION: the hashes might contain sensitive encrypted data. Be careful when sharing or posting these hashes

# identify:
hashcat --identify 7z.hash     
No hash mode matches the structure of the input hash.

# try hashcat with the right mode:
hashcat -a3 -m11600 7z.hash Password
Hashfile '7z.hash' on line 1 ($7z$2$...04c25219dc4f73f37cda$14942208$18): Salt-value exception
No hashes loaded.

When I reported the error, I was using version 1.9, but I got the same results with the new version of 7z2hashcat:

# version:
# 2.0

# date last updated:
# April 10 2024

Additional details:

  • OS: kubuntu 22.04
  • Kernel: 6.5.0-27-generic
  • Python: 3.10.12
  • hashcat: v6.2.6, compiled from source code (commit = 5def8a3)

Hey @mfrade,

I don't think you can achieve such a good compression rate with 15MB of random data; probably you have used 7z2john.pl or modified the variable $PASSWORD_RECOVERY_TOOL_DATA_LIMIT in the 7z2hashcat.pl source code (see https://github.com/philsmd/7z2hashcat/blob/643939c79ce19bcef589f0ab698a3cd5ada7e975/7z2hashcat.pl#L155 versus
https://github.com/openwall/john/blob/c042fa3e31217a96160dd9f35762dbbea57bdc56/run/7z2john.pl#L155, i.e. 16MiB vs 2GiB in hexadecimal form and 8MiB vs 1GiB of binary data).
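
To make these numbers a bit more concrete, here is a minimal standalone sketch (not the actual 7z2hashcat.pl code; only the variable name $PASSWORD_RECOVERY_TOOL_DATA_LIMIT is taken from the script, the rest is illustrative) of how the 16MiB limit on the hexadecimal data maps to the 8388608 raw bytes reported in the warning above:

#!/usr/bin/env perl
# Minimal sketch, not the real 7z2hashcat.pl check: it only illustrates how
# the 16MiB hexadecimal limit corresponds to 8MiB (8388608 bytes) of raw
# encrypted data, as reported in the WARNING above.

use strict;
use warnings;

my $PASSWORD_RECOVERY_TOOL_DATA_LIMIT = 16 * 1024 * 1024;  # limit on the hex-encoded data

my $data_len = 14943008;  # raw encrypted data length from the warning above

# every raw byte turns into two hex characters in the hash line,
# so the raw data may be at most half of the limit
my $max_raw_len = $PASSWORD_RECOVERY_TOOL_DATA_LIMIT / 2;  # 8388608 bytes

if ($data_len > $max_raw_len)
{
  printf ("data length is too long (%d of the maximum allowed %d bytes)\n",
    $data_len, $max_raw_len);
}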
Update: I didn't even see that you have now changed it to /dev/zero; well, that is of course highly compressible. It's not very realistic and doesn't represent normal use cases, I think, but maybe it proves that you are decompressing a data stream similar to the one in a problem I've discovered myself, mentioned hereafter:


I've tested a lot with some variations of your command and I've found out that it's difficult to reproduce a problem with random data (e.g. dd if=/dev/urandom of=somefile.dat bs=$((8000 * 1024)) count=1, which should be close to the 8MB limit that is also used as max for hash lines in hashcat, 8MiB = 8192KB).

On the other hand, there is a small chance that you are triggering a different problem (but of course I can't be sure if what I've discovered here is the same issue that you are experiencing) with both very large and highly compressed data, with a variation of your command (but of course by using at least one compression algorithm/coder and, most importantly, *highly compressible* data):

  1. head -c 10000000 ~/dicts/rockyou.txt > /tmp/t.txt
  2. rm -f /tmp/a.7z; 7z a -p1 -m0=LZMA /tmp/a.7z /tmp/t.txt
  3. ./7z2hashcat.pl /tmp/a.7z > /tmp/hash.txt
  4. tail -c 32 /tmp/hash.txt (display the crc_len field and coder attributes)
  5. hashcat -m 11600 -a 3 --potfile-disable -o /dev/null /tmp/hash.txt ?d

Note: instead of byte values for head (i.e. -c 10000000) you could also use line values (i.e. -n 1200000) in command 1:
head -n 1200000 ~/dicts/rockyou.txt > /tmp/t.txt

Note: in general, in command 2 we could also use LZMA2 (-m0=LZMA2) or just let 7z itself decide which compression algorithm/coder is the most suitable/best one (i.e. removing the -m0=LZMA argument), but I've chosen to always use the same fixed compression algorithm here, only to make the problem even easier to reproduce.

This triggers a sanity check in hashcat that is responsible for making sure the unpack buffers (i.e. the decompressed data buffers) don't get too huge and overflow (yeah, compressed vs decompressed lengths can vary A LOT with LZMA/LZMA2 compression/coders). The current limit in the hashcat code is 9999999 bytes (i.e. about 10MB); that number consists of seven "9" digits (you will soon understand why this fact is important).

It's because we have the rejection check in hashcat for crc_len_len > 7 (i.e. the field crc_len must not be longer than 7 digits; the variable is called crc_len_len, i.e. the length of the crc length. I know it sounds a little bit strange, but hashcat always deals with hash format fields separated by either $ or * or similar, and it has to do a lot of sanity checks on these "fields" of the input hash line; in this particular case the field is already called "crc_len", and now we double check the length of that particular field).

Therefore my conclusion, if this is really the same problem that you are dealing with, is that it's important that you show this for your original hash file:
tail -c 32 hash.txt

(I think in your case it was 14942208 bytes, therefore more than 7 digits long, i.e. almost 15MB, and this could only be accepted by 7z2hashcat.pl because of the huge compression rate: less than 8 MB of compressed data gets expanded to 15 MB)
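
If you want to check this quickly without scrolling through the whole (huge) hash line, here is a tiny helper sketch (not part of 7z2hashcat.pl; it assumes that, for compressed data, the hash line ends with the crc_len field followed by the coder attributes, as in the examples above):

#!/usr/bin/env perl
# Tiny helper sketch (not part of 7z2hashcat.pl): print the last two
# '$'-separated fields of a -m 11600 hash line, which for compressed data
# should be the crc_len and the coder attributes, and warn if the crc_len
# has more than 7 digits (the condition that hashcat rejects).

use strict;
use warnings;

my $line = <STDIN>;

chomp ($line);

my @fields = split (/\$/, $line);

my $coder_attributes = $fields[-1];
my $crc_len          = $fields[-2];

printf ("crc_len: %s (%d digits), coder attributes: %s\n",
  $crc_len, length ($crc_len), $coder_attributes);

print "this hash would be rejected by hashcat (crc_len has more than 7 digits)\n"
  if (length ($crc_len) > 7);

You could run it for instance as perl check_tail.pl < 7z.hash (the script name is of course just an example).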

You could also just run 7z l encrypted.7z (on your original/important 7-Zip file) and see which compression type (decoders) is used, e.g. LZMA/LZMA2 etc (of course this only works if header encryption -mhe was not used while generating the archive, otherwise you would need to know the password; but on the other hand that would also imply that the hashes would be much smaller, because in that case we would crack the "file list" encryption and not the data encryption itself).

In theory, you could also just show the first bytes of the original hash.txt (generated by 7z2hashcat with the original/important 7-Zip file as input), e.g.:
head -c 32 hash.txt

to see which coders (LZMA/LZMA2 etc) are involved in unpacking the compressed and encrypted data (either post the result here and/or compare it with the README.md or the explanation in 7z2hashcat.pl, search for data type indicator).

If you are triggering this sanity check with your specific hash file, we could assume that the compressed and encrypted data length is within the limit, but the decrypted and decompressed output length hits the max unpack size limit in hashcat:
UNPSIZE
and (as we have already seen above)
crc_len_len (the length of the crc_len field, i.e. how many digits that field has)

i.e. this in the 7z2hashcat.pl source code:
[length of data for CRC32] # the length of the first "file" needed to verify the CRC32 checksum

So the solution would be to increase buffers in hashcat, such that you can continue to crack large hashes like the ones you are dealing with, with a patch similar to this one:

diff:

diff --git a/src/modules/module_11600.c b/src/modules/module_11600.c
index 7205f6b7e..503a3796c 100644
--- a/src/modules/module_11600.c
+++ b/src/modules/module_11600.c
@@ -133,7 +133,7 @@ bool module_hook_extra_param_init (MAYBE_UNUSED const hashconfig_t *hashconfig,
   seven_zip_hook_extra_t *seven_zip_hook_extra = (seven_zip_hook_extra_t *) hook_extra_param;
 
   #define AESSIZE 8 * 1024 * 1024
-  #define UNPSIZE 9999999
+  #define UNPSIZE 9999999 * 4
 
   seven_zip_hook_extra->aes = hccalloc (backend_ctx->backend_devices_cnt, sizeof (void *));
 
@@ -599,7 +599,9 @@ int module_hash_decode (MAYBE_UNUSED const hashconfig_t *hashconfig, MAYBE_UNUSE
 
   if (is_compressed == true)
   {
-    if (crc_len_len > 7) return (PARSER_SALT_VALUE);
+    if (crc_len_len > 8) return (PARSER_SALT_VALUE);
+
+    if (crc_len > 9999999 * 4) return (PARSER_SALT_VALUE);
 
     if (coder_attributes_len > 10) return (PARSER_SALT_VALUE);
 

i.e. increase UNPSIZE (for instance quadruple it, x = x * 4) in hashcat's -m 11600 module hook-setup phase (function module_hook_extra_param_init ()) and also increase the hash line parsing limits (crc_len and crc_len_len) in the module_hash_decode () function, also in src/modules/module_11600.c (after changing this file you need to run make).

My observation is therefore that the compressed data is already very close to the data limit, and on top of this data length we also have to add the extra memory space (RAM) for the difference between decompressed and compressed data (i.e. the data is highly compressed and therefore expands a lot).

Our test patch tries this by increasing it to almost 4 * 10MB = 40MB (per thread!), i.e. on a 32-core processor system that is 32 * 40MB = 1280MB = 1.3GB for the "unpack buffer" alone (and we have other buffers too, like the decrypted data buffers for AES decryption etc).

Could you test it with these source code changes in hashcat?

What does this mean for 7z2hashcat and should we change anything here? I would say that, yeah, there is always room for improvement... we could add an additional check in 7z2hashcat.pl that crc_len can't be more than 9999999 and always keep this in sync with the fixed value in the hashcat source code (e.g. introduce in 7z2hashcat.pl a new variable like PASSWORD_RECOVERY_TOOL_UNPACK_LIMIT = 9999999, which currently does not exist).
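
A rough sketch of what I mean (not actual 7z2hashcat.pl code; the variable name is just the one proposed above and does not exist yet, and $crc_len stands for the "length of data for CRC32" value that the script already computes for compressed streams):

#!/usr/bin/env perl
# Rough sketch of the proposed check, not actual 7z2hashcat.pl code:
# warn/reject whenever the decompressed length (crc_len) exceeds the
# fixed unpack limit of the cracking tool (UNPSIZE in module_11600.c).

use strict;
use warnings;

my $PASSWORD_RECOVERY_TOOL_UNPACK_LIMIT = 9999999;  # keep in sync with UNPSIZE in hashcat

sub check_unpack_limit
{
  my ($crc_len) = @_;

  if ($crc_len > $PASSWORD_RECOVERY_TOOL_UNPACK_LIMIT)
  {
    print STDERR "WARNING: the decompressed data length ($crc_len bytes) is larger than " .
      "the unpack limit of the cracking tool ($PASSWORD_RECOVERY_TOOL_UNPACK_LIMIT bytes)\n";

    return 0;
  }

  return 1;
}

check_unpack_limit (14942208);  # the value from your hash line: this would be rejected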

There is also another limit that we could introduce, which I accidentally discovered during this investigation: the overall output length should be limited too. I.e. not only should the data not exceed $PASSWORD_RECOVERY_TOOL_DATA_LIMIT = 16 * 1024 * 1024, but the whole line, called hash_buf (e.g. the return $hash_buf; in the 7z2hashcat.pl code), shouldn't be larger than the buffer limit that hashcat has set in include/common.h with the define HCBUFSIZ_LARGE, i.e. #define HCBUFSIZ_LARGE 0x1000000.

This HCBUFSIZ_LARGE is basically a 16MiB limit that is used to restrict all hash lines that are used as input lines for hashcat.

In theory, a very specific problem could arise without having this additional check in the code: 7z2hashcat.pl could output a data length slightly below the $PASSWORD_RECOVERY_TOOL_DATA_LIMIT (which is 8MiB in raw and 16MiB in hexadecimal/non-binary representation, x = x * 2), but with all the metadata and coder attributes and lengths etc that 7z2hashcat.pl adds to the hexadecimal data (which of course is still the largest part of all the long hashes that it generates, the data part will almost always be the most significant one in terms of field length), we could hit the other limit and get errors like this in hashcat:
Oversized line detected! Truncated 100 bytes
or similar messages (the "100 bytes" output could of course change, depending on the whole line length).
Update: I've now changed the 7z2hashcat.pl source code to at least not trigger that specific hash line length problem, see b3b9c4a.
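
Just to make clear what kind of guard I mean, here is a sketch only (the actual change in b3b9c4a may look different): compare the length of the complete hash line against hashcat's line buffer limit before returning it:

#!/usr/bin/env perl
# Sketch only (the real change is in commit b3b9c4a and may differ): make sure
# the complete hash line ($hash_buf in 7z2hashcat.pl) stays below hashcat's
# per-line input limit HCBUFSIZ_LARGE from include/common.h.

use strict;
use warnings;

my $HCBUFSIZ_LARGE = 0x1000000;  # 16 MiB

my $hash_buf = '$7z$2$...';  # stands for the full hash line built by the script

if (length ($hash_buf) >= $HCBUFSIZ_LARGE)
{
  print STDERR "WARNING: the hash line is " . length ($hash_buf) . " bytes long, " .
    "but hashcat only accepts lines shorter than $HCBUFSIZ_LARGE bytes\n";
}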

What do you think? Could you test if what we found out here is at least a related problem?
It's of course difficult to proceed here without asking you to please help find out if this is the problem you are dealing with, by changing the source code of hashcat (yeah, hashcat changes, not 7z2hashcat.pl changes). Otherwise we do NOT know for sure whether your hash is getting rejected by exactly this sanity check or by a completely different one.

Do you think you are able to make the source code changes with the diff that I've provided above (it should be easy with just git apply patch.diff or a patch command)?

At least we should start with the output of the head, tail and 7z l commands from above, run on the original file (coders e.g. LZMA vs LZMA2 or none, and compressed vs uncompressed data lengths are important here).

Thanks

As you can see in commit 26537e6, I've now introduced this upper limit (the unpack size restriction of the cracking tool, in this case hashcat), so that already when running 7z2hashcat it is made clear that some 3rd party tools (like hashcat) impose an upper byte size limit.

To increase the limit in hashcat, you would need to open an issue on hashcat's GitHub page and suggest this change/improvement.

Thank you for making me aware of this border case where 7z2hashcat just generated the hashes even though hashcat then reports a "Salt-value exception" problem.

Best,
Phil