Add support for self-extracting 7-Zip archive files
kholia opened this issue · 9 comments
http://openwall.info/wiki/_media/john/test-openwall-sfx.tar
The password for this SFX sample is openwall.
Simply extracting the contents starting with the magic string 7z into a separate file does not work.
This problem remains with our new 7z2john.pl. If we fix it, we should submit the fix upstream to 7z2hashcat.
It is actually quite easy to extract the .7z from a self-extracting archive.
Here are the steps:
$ # download it:
$ wget http://openwall.info/wiki/_media/john/test-openwall-sfx.tar
$ # untar it:
$ tar xf test-openwall-sfx.tar
$ # use binwalk to get offsets (or use -e to extract it, sometimes -e just doesn't seem to work)
$ binwalk test-openwall-sfx.exe
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
...
190976 0x2EA00 7-zip archive data, version 0.4
$ # extract the .7z by skipping the start of the executable wrapper (the offset above):
$ dd if=test-openwall-sfx.exe of=extracted.7z bs=1 skip=190976
$ # gives the same result, but faster (it skips one block of 190976 bytes instead of 190976 one-byte blocks): dd if=test-openwall-sfx.exe of=extracted.7z bs=190976 skip=1
$ # extract the "hash":
$ 7z2john.pl extracted.7z > hash.txt
$ # just verify that the file is not empty (i.e. that the "hash" was correctly extracted):
$ cat hash.txt
extracted:$7z$0$19$0$$8$9221bf8fe1f814c90000000000000000$1731540556$112$106$0df4207f74f81459f9c714e0dd6252ce7e1d193957c92ed3fd424b7177295ed93a666b289f12b179281f9ace5b7ea44fe9039a78b3431e4558daf8dabf028e07645ffa112aa709fcf319c80f1dd16716075da6bba7c95b07f0c13f4a71df54acb239bc3b765a4117213b717aca3dbd36
$ # crack it with jtr:
$ john hash.txt --mask=openwall
...
openwall (extracted)
...
$ # extra step (not required) if you really want to:
$ # (this step can be skipped only because of how .7z files are structured/read,
$ # i.e. any trailing bytes after the actual archive are simply ignored)
$ xxd -g 1 extracted.7z | head -n 3
0000000: 37 7a bc af 27 1c 00 04 5d ab 97 52 90 00 00 00 7z..'...]..R....
0000010: 00 00 00 00 26 00 00 00 00 00 00 00 a1 ae b2 a4 ....&...........
0000020: 40 11 5a a7 23 fc cd e2 8d cf 8d c6 82 01 f1 08 @.Z.#...........
$ # from the 7z signature header (the first 32 bytes of the dump above) we know:
$ # 1. length of the 7z signature header: 32 bytes
$ # 2. offset of the end header, counted from the end of the signature header: 0x0000000000000090 (144 bytes)
$ # 3. size of that end header: 0x0000000000000026 (38 bytes)
$ # so the total .7z file size is: 32 + 144 + 38 = 214 bytes
$ # note: you could just use 7z2john.pl to extract the header info by changing/debugging it like this:
$ # my $signature = read_seven_zip_signature_header ($fp);
$ # + print Dumper ($signature);
$ # extract the .7z with correct length:
$ dd if=test-openwall-sfx.exe of=extracted.7z bs=1 skip=190976 count=$((32 + 144 + 38))
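The size calculation from the extra step can also be done programmatically. The following is only a minimal Python sketch, not part of 7z2john.pl or 7z2hashcat: the function name `seven_zip_total_size` is made up for illustration, and the field layout (6-byte magic, 2-byte version, 4-byte CRC, 8-byte offset, 8-byte size, 4-byte CRC, all little-endian) is the one read off the hex dump above.

```python
import struct

SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"
SIGNATURE_HEADER_SIZE = 32


def seven_zip_total_size(header: bytes) -> int:
    """Return the archive length implied by a 32-byte 7z signature header.

    Hypothetical helper for illustration only.
    """
    if len(header) < SIGNATURE_HEADER_SIZE:
        raise ValueError("need at least 32 bytes")
    if header[:6] != SEVEN_ZIP_MAGIC:
        raise ValueError("missing 7z magic")
    # After the 6-byte magic: major version, minor version, start header CRC,
    # next header offset, next header size, next header CRC (little-endian).
    _major, _minor, _start_crc, next_offset, next_size, _next_crc = struct.unpack_from(
        "<BBIQQI", header, 6
    )
    return SIGNATURE_HEADER_SIZE + next_offset + next_size


# The first 32 bytes of extracted.7z, copied from the xxd output above.
header = bytes.fromhex(
    "377abcaf271c0004" "5dab975290000000"
    "0000000026000000" "00000000a1aeb2a4"
)
print(seven_zip_total_size(header))  # 32 + 144 + 38 = 214
```

The result is the value that would go into `count=` in the last dd command.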
This could of course all be automated. The only difficult part would probably be finding the offset and checking that it is not a false positive, i.e. implementing the logic that binwalk already provides.
Perhaps 7z2john (7z2hashcat) could simply scan for the 7z magic '7', 'z', 0xBC, 0xAF, 0x27, 0x1C (377abcaf271c). The chance of a false positive is small, and a false positive could be detected when trying to parse the 7z data.
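A scan of this kind might look roughly like the sketch below. It is not how 7z2john.pl or 7z2hashcat actually do it; `find_7z_offsets` and `make_header` are hypothetical names. It also assumes, based on the 7z format description, that the 4-byte CRC at offset 8 is a standard CRC32 over the 20 bytes that follow it, which gives a cheap way to reject false positives before any real parsing.

```python
import struct
import zlib

SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"


def find_7z_offsets(data: bytes):
    """Yield every offset where a plausible 7z signature header starts."""
    pos = data.find(SEVEN_ZIP_MAGIC)
    while pos != -1:
        header = data[pos:pos + 32]
        if len(header) == 32:
            (stored_crc,) = struct.unpack_from("<I", header, 8)
            # Assumption: the stored CRC covers header bytes 12..31
            # (next header offset, next header size, next header CRC).
            if zlib.crc32(header[12:32]) == stored_crc:
                yield pos
        pos = data.find(SEVEN_ZIP_MAGIC, pos + 1)


def make_header(next_offset: int, next_size: int, next_crc: int) -> bytes:
    """Build a synthetic signature header for the demo (hypothetical helper)."""
    start = struct.pack("<QQI", next_offset, next_size, next_crc)
    return SEVEN_ZIP_MAGIC + b"\x00\x04" + struct.pack("<I", zlib.crc32(start)) + start


# Fake "executable" stub (96 bytes) containing a bogus magic hit at offset 64,
# followed by a header whose CRC is consistent.
stub = b"MZ" + b"\x00" * 62 + SEVEN_ZIP_MAGIC + b"\x00" * 26
blob = stub + make_header(144, 38, 0) + b"\x00" * 182
print(list(find_7z_offsets(blob)))  # [96]
```

The bogus hit at offset 64 is dropped because its CRC does not match; only the offset of the consistent header remains.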
Yeah, I agree that those 6 bytes are "long enough" to keep the number of false positives low, and the parsing in 7z2john.pl could in theory already be enough to reject any remaining invalid 7z archives. This should be pretty easy to test and confirm.
Perhaps false positives can be eliminated even further in case the 7z segment is always aligned to some boundary (in your example it was 0x2EA00)? I'm not really acquainted with Windows executables.
Yeah.
Me neither. I can only confirm that what the code of p7zip and 7-Zip does is also pretty straightforward (or let's say not very advanced, naive, stupid). See https://github.com/jbdemonte/p7zip/blob/master/CPP/7zip/UI/Common/OpenArchive.cpp#L2496 . They loop over the byte buffer (buf) one byte at a time, looking up every pair of 2 bytes in a hash table to see if that pair is the start of a known signature. If so, some further tests are performed and eventually the corresponding Open () function is called with the correct offset into the buffer.
To be fair, p7zip of course has the disadvantage of having to deal with many more signatures/file formats, and therefore a naive/general approach that works for all the supported file extensions/signatures was preferred.
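To make the idea concrete, here is a small Python sketch of a two-byte prefix lookup. It is only loosely modelled on the description above and is not p7zip's actual code; the signature list, `PREFIX_TABLE` and `scan_signatures` are chosen for illustration.

```python
# A few well-known signatures; the real list in p7zip is much longer.
SIGNATURES = {
    "7z": b"7z\xbc\xaf\x27\x1c",
    "zip": b"PK\x03\x04",
    "gzip": b"\x1f\x8b\x08",
}

# Index the signatures by their first two bytes.
PREFIX_TABLE = {}
for name, sig in SIGNATURES.items():
    PREFIX_TABLE.setdefault(sig[:2], []).append((name, sig))


def scan_signatures(buf: bytes):
    """Return (offset, format name) for every full signature found in buf."""
    hits = []
    for pos in range(len(buf) - 1):
        # Cheap first test: is this byte pair the start of any known signature?
        for name, sig in PREFIX_TABLE.get(buf[pos:pos + 2], ()):
            # "Further test": compare the complete signature.
            if buf.startswith(sig, pos):
                hits.append((pos, name))
    return hits


# "7zip" at offset 2 passes the two-byte test but fails the full comparison.
buf = b"MZ" + b"7zip" + b"PK\x03\x04" + b"7z\xbc\xaf\x27\x1c"
print(scan_signatures(buf))  # [(6, 'zip'), (10, '7z')]
```

For a tool that only cares about 7z, searching for the 6-byte magic directly is simpler than such a table.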
I would say in this case we can't really rely on what p7zip and 7z do. Anyway, I think it was worth the time (well, it didn't take that long at all) to see if they have a more advanced "scanning technique" to find the start of .7z.
I agree: looking into how the PE (Windows executable) structure can help us reduce false positives and allow a faster search might be a good next step.
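One possible way to use the PE structure is sketched below in Python. It reads the section table and returns the end of the last section's raw data, which is where appended data (the "overlay") starts. Whether an SFX module always places the archive exactly there is an assumption: a signed executable or an installer-style SFX may contain other data before the archive, so the 7z magic should still be checked at or after that offset. The name `pe_overlay_offset` and the synthetic test image are made up for illustration. The offset 0x2EA00 from the binwalk output is a multiple of 0x200 (512), a common PE file alignment, which fits this picture.

```python
import struct


def pe_overlay_offset(data: bytes) -> int:
    """Return the file offset just past the last section's raw data."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF file header follows the 4-byte PE signature.
    (num_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (opt_header_size,) = struct.unpack_from("<H", data, pe_offset + 20)
    section_table = pe_offset + 24 + opt_header_size
    end = 0
    for i in range(num_sections):
        # SizeOfRawData and PointerToRawData sit at offsets 16 and 20
        # of each 40-byte section header.
        raw_size, raw_ptr = struct.unpack_from("<II", data, section_table + 40 * i + 16)
        end = max(end, raw_ptr + raw_size)
    return end


# Synthetic image for the demo: two sections followed by an appended 7z magic.
dos = bytearray(0x40)
dos[:2] = b"MZ"
struct.pack_into("<I", dos, 0x3C, 0x40)
coff = struct.pack("<HHIIIHH", 0x14C, 2, 0, 0, 0, 0, 0)
sections = b"".join(
    struct.pack("<8sIIIIIIHHI", name, 0, 0, raw_size, raw_ptr, 0, 0, 0, 0, 0)
    for name, raw_size, raw_ptr in [(b".rsrc", 0x200, 0x600), (b".text", 0x400, 0x200)]
)
headers = bytes(dos) + b"PE\x00\x00" + coff + sections
image = headers.ljust(0x800, b"\x00") + b"7z\xbc\xaf\x27\x1c"
print(hex(pe_overlay_offset(image)))  # 0x800
```

Starting the magic search at this offset would skip the executable code and resources entirely, which should both speed up the scan and remove most false positives.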
update: I'm also still not sure if reading sfx files should be supported by 7z2hashcat itself. It's something special (and maybe not the most important new/requested feature) and furthermore it's actually a different format (and therefore maybe something like "sfx2hashcat"/"sfx2john" would be more meaningful as a name for the tool).
Well, "sfx2hashcat" is also not the perfect name since self-extracting executables could in theory contain any other information too (also very different from 7z files) ...
"I'm also still not sure if reading sfx files should be supported by 7z2hashcat itself."
If it can be done with ease, I see no reason not to do it. If it turns out to be hard or hairy, I'm 100% fine with simply documenting that binwalk exercise and leaving it at that.
Like you say, it's special. Actually now that I think about it: While I've seen a whole lot of self-extracting archives I'm pretty sure I have NEVER seen one that also had encryption!
The newest version of 7z2hashcat adds support for .sfx files.
Cool, thanks!