sgrep, for syntactical (and occasionally semantic) grep, is a tool to help find bugs by specifying code patterns using a familiar syntax. The idea is to mix the convenience of grep with the correctness and precision of a compiler frontend.
| Pattern | Will match code like |
| --- | --- |
| $X == $X | if (node.id == node.id): ... |
| foo(kwd1=1, kwd2=2, ...) | foo(kwd2=2, kwd1=1, kwd3=3) |
| subprocess.open(...) | import subprocess as s; s.open(['foo']) |
| JavaScript | Python | Go | Java | C | Ruby | Scala |
| --- | --- | --- | --- | --- | --- | --- |
| ✅ | ✅ | ✅ | ✅ | ✅ | coming | coming |
Want to learn more about sgrep? Check out these slides from the r2c February meetup.
sgrep is packaged in a Docker container, so installing it is as easy as installing Docker.
docker pull returntocorp/sgrep
cd /path/to/repo
# generate a template config file
docker run --rm -v $(pwd):/home/repo returntocorp/sgrep --generate-config
# look for findings
docker run --rm -v $(pwd):/home/repo returntocorp/sgrep
To iterate quickly on a single pattern, you can test it against a single file or folder. For example:
docker run --rm -v $(pwd):/home/repo returntocorp/sgrep -e '$X == $X' path/to/file.py
Here, sgrep searches the target with the pattern $X == $X (a useless self-comparison, which usually indicates a bug) and prints the results to stdout. This also works for directories; files that fail to parse are skipped. You can specify the language of the pattern with --lang, for example --lang javascript.
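As a concrete illustration, a hypothetical path/to/file.py might contain the kind of bug that $X == $X is meant to flag. The `Node` class and both functions below are invented for this example.

```python
class Node:
    """A hypothetical node type with an integer identifier."""

    def __init__(self, node_id: int) -> None:
        self.id = node_id


def same_node(node: Node, other: Node) -> bool:
    # Bug: both operands are `node.id`, so for integer ids this is always True.
    # This is exactly the shape the pattern $X == $X describes.
    return node.id == node.id


def same_node_fixed(node: Node, other: Node) -> bool:
    # Intended behavior: compare the ids of the two different nodes.
    return node.id == other.id
```

Only the comparison in `same_node` has identical operands on both sides, so that is the one the pattern targets; `same_node_fixed` compares two different expressions.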
To see more options:
docker run --rm -v $(pwd):/home/repo returntocorp/sgrep --help
See config.md for example configuration files and details on the syntax.
r2c provides a registry of config files tuned using our analysis platform on thousands of repositories. To use it:
sgrep --config r2c
Default configs are loaded from .sgrep.yml or from multiple files matching .sgrep/**/*.yml, and can be overridden with --config <file|folder|yaml_url|tarball_url|registry_name>.
sgrep's design philosophy emphasizes simplicity and making a single pattern as expressive as possible:
- Use concrete code syntax: easy to learn
- Metavariables ($X): abstract away code
- '...' operator: abstract away sequences
- Knows about code equivalences: one pattern can match many equivalent variations on the code
- Less is more: abstract away additional details
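To make the equivalence and "less is more" points concrete: in Python, keyword-argument order does not change a call, which is why the pattern foo(kwd1=1, kwd2=2, ...) shown earlier can match foo(kwd2=2, kwd1=1, kwd3=3). The following is a minimal sketch using Python's standard `ast` module (Python 3.9+ for `ast.unparse`), not sgrep's implementation; `kwargs_match` and `_parse_call` are hypothetical helpers that treat the pattern's keyword arguments as an unordered set, with a trailing `...` allowing extra ones.

```python
import ast


def _parse_call(src: str) -> tuple[str, list[str], dict[str, str]]:
    """Split a call expression into (callee, positional args, keyword args) as source strings."""
    call = ast.parse(src, mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError(f"expected a call expression, got {src!r}")
    args = [ast.unparse(a) for a in call.args]
    # `**kwargs` entries have kw.arg == None; this sketch ignores them.
    kwargs = {kw.arg: ast.unparse(kw.value) for kw in call.keywords if kw.arg is not None}
    return ast.unparse(call.func), args, kwargs


def kwargs_match(pattern: str, code: str) -> bool:
    """Match a call pattern against a call, ignoring keyword-argument order."""
    # Python rejects `...` after keyword arguments, so a trailing ", ...)" is
    # stripped here and recorded as "extra keyword arguments are allowed".
    allow_extra = pattern.replace(" ", "").endswith(",...)")
    if allow_extra:
        pattern = pattern[: pattern.rindex(",")] + ")"
    p_func, p_args, p_kwargs = _parse_call(pattern)
    c_func, c_args, c_kwargs = _parse_call(code)
    if p_func != c_func or p_args != c_args:
        return False
    if allow_extra:
        # Every keyword in the pattern must appear in the call with the same value.
        return all(c_kwargs.get(name) == value for name, value in p_kwargs.items())
    # Without `...`, the keyword sets must be identical (order is still ignored).
    return p_kwargs == c_kwargs
```

Comparing keyword arguments as a mapping rather than a list is what makes the order irrelevant; the `...` flag is what lets the pattern stay shorter than the code it matches.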
Patterns are snippets of code containing metavariables and other operators. They are parsed into an AST for the target language, which is then used to search for matching code. See patterns.md for full documentation.
$X, $FOO, and $RETURN_CODE are all examples of metavariables. You can reference a metavariable again later in your pattern, and sgrep will ensure that every occurrence matches the same code.
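The rule that repeated metavariables must match the same code can be sketched with Python's standard `ast` module. This is a toy model, not sgrep's implementation: the `MV_` renaming trick and the helpers `parse_pattern`, `match`, and `search` are all hypothetical, and only single-expression Python patterns are handled.

```python
import ast
import re


def parse_pattern(pattern: str) -> ast.expr:
    # "$X" is not valid Python, so rewrite each metavariable to a plain name
    # with a hypothetical "MV_" prefix before parsing.
    src = re.sub(r"\$([A-Z_][A-Z0-9_]*)", r"MV_\1", pattern)
    return ast.parse(src, mode="eval").body


def match(pat: object, node: object, env: dict[str, ast.AST]) -> bool:
    """Structurally match a pattern AST against a code AST, binding metavariables in env."""
    if isinstance(pat, ast.Name) and pat.id.startswith("MV_"):
        if not isinstance(node, ast.AST):
            return False
        if pat.id in env:
            # Already bound: this occurrence must be the same code as before.
            return ast.dump(env[pat.id]) == ast.dump(node)
        env[pat.id] = node
        return True
    if isinstance(pat, list):
        return (
            isinstance(node, list)
            and len(pat) == len(node)
            and all(match(p, n, env) for p, n in zip(pat, node))
        )
    if isinstance(pat, ast.AST):
        # Same node type, and every field matches recursively.
        return type(pat) is type(node) and all(
            match(getattr(pat, f), getattr(node, f, None), env) for f in pat._fields
        )
    # Plain values such as identifiers and constants must be equal.
    return pat == node


def search(pattern: str, source: str) -> list[int]:
    """Return the line numbers of expressions in source that match pattern."""
    pat = parse_pattern(pattern)
    return [
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.expr) and match(pat, node, {})
    ]
```

For instance, `search("$X == $X", "if node.id == node.id:\n    pass\n")` returns `[1]`, while `a == b` does not match, because `$X` is first bound to `a` and then compared against `b`. Each candidate node gets a fresh binding table, so bindings never leak between matches.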
... is the primary "match anything" operator.
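One rough way to picture ... over a sequence such as call arguments: each ... may absorb zero or more consecutive items. The sketch below works on argument source strings rather than real ASTs; `call_args` and `match_seq` are hypothetical helpers, and only positional call arguments are covered.

```python
import ast


def call_args(src: str) -> list[str]:
    """Return the positional arguments of a single call expression as source strings."""
    call = ast.parse(src, mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError(f"expected a call expression, got {src!r}")
    return [ast.unparse(arg) for arg in call.args]  # ast.unparse needs Python 3.9+


def match_seq(pattern: list[str], items: list[str]) -> bool:
    """Return True if items matches pattern, where "..." stands for zero or more items."""
    if not pattern:
        # An exhausted pattern only matches an exhausted sequence.
        return not items
    head, rest = pattern[0], pattern[1:]
    if head == "...":
        # Let "..." absorb 0, 1, ..., len(items) items and try to match the rest.
        return any(match_seq(rest, items[i:]) for i in range(len(items) + 1))
    # A concrete element must equal the next item exactly.
    return bool(items) and items[0] == head and match_seq(rest, items[1:])
```

Here `match_seq(["..."], call_args("foo(1, 2, 3)"))` plays the role of `foo(...)` and is True, while `["1", "..."]` additionally requires the first argument to be `1`. This naive backtracking can get slow with many ... in one pattern; memoizing on the two list positions would keep it polynomial.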
sgrep automatically searches for code that is semantically equivalent. For example, the pattern subprocess.open(...) will match
from subprocess import open as sub_open
result = sub_open("ls")
and other semantically equivalent variations.
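Matching this example requires resolving import aliases. The sketch below, built on Python's `ast` module, maps each called name back to a fully qualified form so that `sub_open(...)` and `s.open(...)` are both seen as `subprocess.open(...)`. It is a simplified model, not sgrep's implementation: `dotted` and `resolve_calls` are hypothetical, and the alias table ignores scoping and shadowing.

```python
import ast
from typing import Optional


def dotted(expr: ast.expr) -> Optional[str]:
    """Render a Name/Attribute chain such as `s.open` as "s.open"; return None otherwise."""
    if isinstance(expr, ast.Name):
        return expr.id
    if isinstance(expr, ast.Attribute):
        base = dotted(expr.value)
        return None if base is None else f"{base}.{expr.attr}"
    return None


def resolve_calls(source: str) -> list[str]:
    """Return the fully qualified name of every simple call in source."""
    tree = ast.parse(source)
    # First pass: build a module-wide alias table (no scoping or shadowing).
    aliases: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    aliases[alias.asname] = alias.name
                else:
                    top = alias.name.split(".")[0]  # `import a.b` binds `a`
                    aliases[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    # Second pass: rewrite the first component of each callee through the table.
    calls = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = dotted(node.func)
            if name is not None:
                head, dot, tail = name.partition(".")
                calls.append(aliases.get(head, head) + dot + tail)
    return calls
```

With this, both the aliased import above and `import subprocess as s; s.open(['foo'])` resolve to `["subprocess.open"]`, while a locally defined `open` stays plain `open`.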
For integrations, see integrations.md.
Bug reports are welcome! Please open a GitHub issue on this project.
sgrep is LGPL-licensed, and we would love your contributions. See docs/development.md.