feature: automatically identify code removed previously for being malicious.
AngeloD2022 commented
A couple of ideas for approaching this (just spitballing, better solutions may well exist):
- taking a cryptographic hash of a file (language agnostic but inflexible to minor code changes)
- computing a locality-sensitive hash of the malicious file from its opcode disassembly or AST features (Python-specific)
- measuring the similarity of another file to a known malicious file as the Levenshtein distance between the two files' hashes (this is only meaningful for the locality-sensitive hash, since cryptographic hashes of near-identical files are unrelated)
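
To make the trade-off between these ideas concrete, here is a rough sketch. It is an assumption-heavy illustration, not a proposed implementation: instead of a real locality-sensitive hash it uses the plain sequence of AST node type names as the fingerprint, and every function name here (`exact_fingerprint`, `ast_node_sequence`, `levenshtein`, `similarity`) is hypothetical.

```python
import ast
import hashlib


def exact_fingerprint(content: bytes) -> str:
    """SHA-256 of the raw bytes: language agnostic, but any change alters it."""
    return hashlib.sha256(content).hexdigest()


def ast_node_sequence(source: str) -> list:
    """Node type names in ast.walk order (Python-specific).

    Identifiers, literal values, comments, and whitespace are not part of
    the sequence, so cosmetic edits leave it unchanged.
    """
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def levenshtein(a, b) -> int:
    """Edit distance between two sequences (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(source_a: str, source_b: str) -> float:
    """1.0 means identical node sequences; lower means structurally further apart."""
    seq_a = ast_node_sequence(source_a)
    seq_b = ast_node_sequence(source_b)
    longest = max(len(seq_a), len(seq_b)) or 1
    return 1.0 - levenshtein(seq_a, seq_b) / longest
```

With this sketch, a re-upload that only changes a string literal or adds a comment gets a different SHA-256 but still scores 1.0 against the known file. A real version would need a threshold chosen against false positives, since short benign files can share the same structure.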
This would obviously require a database of some sort, with malicious file hashes committed to it in response to reports.
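
As a rough idea of what that database could look like, here is a minimal sketch using SQLite from the Python standard library. The table name `malicious_hashes`, its columns, and the helper functions are all hypothetical, and it only covers the exact-match (cryptographic hash) case.

```python
import hashlib
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hash store; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS malicious_hashes ("
        "sha256 TEXT PRIMARY KEY, report TEXT NOT NULL)"
    )
    return conn


def record_malicious(conn: sqlite3.Connection, content: bytes, report: str) -> str:
    """Store the hash of a file that was removed in response to a report."""
    digest = hashlib.sha256(content).hexdigest()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO malicious_hashes (sha256, report) VALUES (?, ?)",
            (digest, report),
        )
    return digest


def is_known_malicious(conn: sqlite3.Connection, content: bytes) -> bool:
    """Exact-match lookup of a file's hash against the store."""
    digest = hashlib.sha256(content).hexdigest()
    row = conn.execute(
        "SELECT 1 FROM malicious_hashes WHERE sha256 = ?", (digest,)
    ).fetchone()
    return row is not None
```

A fuzzy-matching variant would store the similarity fingerprint alongside the SHA-256 and compare incoming files against each stored fingerprint, which gets slow as the table grows and would probably need some form of indexing or bucketing.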