pypi/inspector

feature: automatically identify code removed previously for being malicious.


A couple of ideas for approaching this (just spitballing; better solutions may well exist):

  • taking a cryptographic hash of each file (language-agnostic, but any change to the file, however minor, produces a different hash)
  • computing a locality-sensitive hash of the malicious file from its opcode disassembly or AST features (Python-specific)
    • the similarity of another file to a known malicious file could then be measured as the Levenshtein distance between the two files' hashes.
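To make the second idea concrete, here is a rough sketch using only the standard library. It is not a real locality-sensitive hash (something like ssdeep or TLSH would fill that role); as a stand-in it uses the sequence of AST node type names as the "fingerprint" and compares two fingerprints with Levenshtein distance. The function names `ast_fingerprint`, `levenshtein`, and `similarity` are made up for illustration and are not part of inspector.

```python
import ast


def ast_fingerprint(source: str) -> list:
    """Return the AST node type names of `source`, in ast.walk order.

    Identifiers and literal values are ignored, so renaming a variable or
    changing a string constant leaves the fingerprint unchanged.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def levenshtein(a, b) -> int:
    """Edit distance between two sequences (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(source_a: str, source_b: str) -> float:
    """Score in [0, 1]; 1.0 means the two fingerprints are identical."""
    fp_a = ast_fingerprint(source_a)
    fp_b = ast_fingerprint(source_b)
    longest = max(len(fp_a), len(fp_b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(fp_a, fp_b) / longest
```

A file would be flagged when its `similarity` to any known malicious sample exceeds some threshold; what that threshold should be is an open question and would need tuning against real reports.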

This would obviously require a database of some sort, with the hashes of malicious files added to it in response to reports.
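For the simpler cryptographic-hash variant, the lookup could be as small as the sketch below. The in-memory set stands in for whatever database is eventually chosen, and `report_malicious` / `is_known_malicious` are hypothetical names, not existing inspector functions.

```python
import hashlib

# Hypothetical stand-in for the real database of reported hashes.
KNOWN_MALICIOUS_SHA256 = set()


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def report_malicious(data: bytes) -> None:
    """Record a file's hash after it has been reported and removed."""
    KNOWN_MALICIOUS_SHA256.add(sha256_hex(data))


def is_known_malicious(data: bytes) -> bool:
    """True only if these exact bytes were reported before."""
    return sha256_hex(data) in KNOWN_MALICIOUS_SHA256
```

This only catches byte-for-byte re-uploads: a single added space changes the digest, which is the inflexibility mentioned above.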