Incorrect Hash Outputted on Certain Files
tworeimage opened this issue · 4 comments
Thanks for the great work on this library! When I run this implementation on a malicious file, I get a slightly different result from the one VirusTotal reports. I've also compiled the official SSDEEP implementation, and it shows the same result as VT.
This Implementation: 96:o8kUse54dWD+Kmu2+GOWemu2+GOWemu2+GOWemuDJvNSt+pV2NLiOw4GdlopXh1:o45AgJUEpV2NLW4GdlakpZ8Oda
Virustotal: 96:o8kUse54dWD+Kmu2+GOWemu2+GOWemu2+GOWemuDJvNSt+pV2NLiOw4GdlopXh1r:o45AgJUEpV2NLW4GdlakpZ8Oda
The subtle difference is that the first part of the hash is missing an 'r' at the end of it. I have been debugging this for about two hours, but I can't see any obvious bug occurring, so I won't be able to submit a PR at this time.
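To make the mismatch easy to check mechanically, here is a minimal Python sketch that splits each digest into its three colon-separated fields (block size, first part, second part) and reports where the first parts diverge. The helper names `split_ssdeep` and `first_difference` are made up for illustration and are not part of this library or of the official SSDEEP tool.

```python
def split_ssdeep(digest):
    """Split an ssdeep digest into (block_size, first_part, second_part)."""
    block_size, first_part, second_part = digest.split(":", 2)
    return int(block_size), first_part, second_part


def first_difference(a, b):
    """Return the index where a and b first differ, or None if they are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One string is a strict prefix of the other.
        return min(len(a), len(b))
    return None


# The two digests quoted above.
ours = "96:o8kUse54dWD+Kmu2+GOWemu2+GOWemu2+GOWemuDJvNSt+pV2NLiOw4GdlopXh1:o45AgJUEpV2NLW4GdlakpZ8Oda"
theirs = "96:o8kUse54dWD+Kmu2+GOWemu2+GOWemu2+GOWemuDJvNSt+pV2NLiOw4GdlopXh1r:o45AgJUEpV2NLW4GdlakpZ8Oda"

ours_bs, ours_first, ours_second = split_ssdeep(ours)
theirs_bs, theirs_first, theirs_second = split_ssdeep(theirs)

print("block sizes match:", ours_bs == theirs_bs)
print("second parts match:", ours_second == theirs_second)
print("first parts diverge at index:", first_difference(ours_first, theirs_first))
print("first part lengths:", len(ours_first), "vs", len(theirs_first))
```

On these two digests the block size and second part agree, and the first part from this implementation is a strict prefix of the reference one: only the final character is missing. The reference first part is 64 characters long, which as far as I know is the maximum signature length in the official implementation, so the problem may be limited to inputs whose first part reaches that limit.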
I suspect that it might be the way the blockSize variable is calculated, but that's just a hunch. I tried several changes to see whether I could fix it, but none of them worked.
Attached Zip with password of "infected"
5403252175699968.zip This is a malicious file so please do not execute it. (Malicious VBA script)
Thanks for bringing this up. A mismatch in the signature is definitely a bug. IIRC I have seen issues like this before. It could be related to how the remaining data that doesn't fit into a block is handled 🤔
@tworeimage if you want to get started on solving this issue, create a test for this case (which should obviously fail for now). Then look into how we decide when to finish the hash, and compare that to the original SSDEEP implementation.
@tworeimage did you have a chance to give this a try?