
polyglot AST pattern matching

Primary language: OCaml. License: GNU Lesser General Public License v2.1 (LGPL-2.1).

sgrep

r2c community slack

sgrep, for syntactical (and occasionally semantic) grep, is a tool that helps find bugs by letting you specify code patterns in a familiar syntax. The idea is to combine the convenience of grep with the correctness and precision of a compiler frontend.

Quick Examples

pattern                     will match code like
$X == $X                    if (node.id == node.id): ...
foo(kwd1=1, kwd2=2, ...)    foo(kwd2=2, kwd1=1, kwd3=3)
subprocess.open(...)        import subprocess as s; s.open(['foo'])

Supported Languages

JavaScript  Python     Go         Java       C          Ruby    Scala
supported   supported  supported  supported  supported  coming  coming

Meetups

Want to learn more about sgrep? Check out these slides from the r2c February meetup

Installation

Docker

sgrep is packaged as a Docker image, so installing it is as easy as installing Docker.

Quickstart

docker pull returntocorp/sgrep

cd /path/to/repo
# generate a template config file
docker run --rm -v $(pwd):/home/repo returntocorp/sgrep --generate-config

# look for findings
docker run --rm -v $(pwd):/home/repo returntocorp/sgrep

Usage

Rule Development

To rapidly iterate on a single pattern, you can test on a single file or folder. For example,

docker run --rm -v $(pwd):/home/repo returntocorp/sgrep -e '$X == $X' path/to/file.py

Here, sgrep searches the target with the pattern $X == $X (a pointless comparison of an expression with itself) and prints the results to stdout. The same command works on directories; any file that fails to parse is skipped. You can specify the language of the pattern with --lang, for example --lang javascript.

To see more options:

docker run --rm -v $(pwd):/home/repo returntocorp/sgrep --help

Config Files

Format

See config.md for example configuration files and details on the syntax.

sgrep Registry

r2c provides a registry of config files tuned using our analysis platform on thousands of repositories. To use:

sgrep --config r2c

Default

Default configs are loaded from .sgrep.yml or from multiple files matching .sgrep/**/*.yml. They can be overridden with --config <file|folder|yaml_url|tarball_url|registry_name>.
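As a rough illustration of the default lookup described above, here is a minimal Python sketch. It is not sgrep's actual loader. The function name discover_default_configs is hypothetical, and the precedence (a top-level .sgrep.yml wins over the .sgrep/ folder) is an assumption that the text does not confirm.

```python
from pathlib import Path


def discover_default_configs(repo_root):
    """Return the default config files found under repo_root (sketch only)."""
    root = Path(repo_root)
    single = root / ".sgrep.yml"
    # Assumption: a top-level .sgrep.yml takes precedence over the folder.
    if single.is_file():
        return [single]
    folder = root / ".sgrep"
    if not folder.is_dir():
        return []
    # Recursive glob mirroring the .sgrep/**/*.yml pattern from the text.
    return sorted(folder.glob("**/*.yml"))
```

Calling discover_default_configs(".") from the repository root returns a list of pathlib.Path objects, which is empty when neither location exists.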

Design

sgrep's design philosophy emphasizes simplicity and making a single pattern as expressive as possible:

  1. Use concrete code syntax: easy to learn
  2. Metavariables ($X): abstract away code
  3. '...' operator: abstract away sequences
  4. Knows about code equivalences: one pattern can match many equivalent variations on the code
  5. Less is more: abstract away additional details

Patterns

Patterns are snippets of code, with metavariables and other operators, that are parsed into an AST for the target language and then used to search for matching code. See patterns.md for full documentation.

Metavariables

$X, $FOO, and $RETURN_CODE are all examples of metavariables. You can reference a metavariable again later in the same pattern, and sgrep will ensure that every occurrence matches the same code.
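To make the binding rule concrete, here is a small Python sketch built on the standard ast module. It only illustrates the idea and is not how sgrep (which is written in OCaml) is implemented. Since $X is not valid Python, the sketch rewrites each metavariable to a placeholder identifier first. The METAVAR_ prefix and the names parse_pattern, match, and find_matches are all hypothetical. It handles single-expression patterns only and needs Python 3.9+ for ast.unparse.

```python
import ast
import re

PREFIX = "METAVAR_"  # hypothetical placeholder, since `$X` is not valid Python


def parse_pattern(pattern):
    """Parse a one-expression pattern after rewriting $NAME to PREFIX + NAME."""
    rewritten = re.sub(r"\$([A-Z_][A-Z0-9_]*)", PREFIX + r"\1", pattern)
    return ast.parse(rewritten, mode="eval").body


def match(pattern, node, bindings):
    """Return True if node matches pattern, recording metavariable bindings."""
    if isinstance(pattern, ast.Name) and pattern.id.startswith(PREFIX):
        if not isinstance(node, ast.expr):
            return False
        # First use binds the metavariable; later uses must match the same code.
        bound = bindings.setdefault(pattern.id, node)
        return ast.dump(bound) == ast.dump(node)
    if isinstance(pattern, ast.AST):
        if type(pattern) is not type(node):
            return False
        return all(
            match(getattr(pattern, f, None), getattr(node, f, None), bindings)
            for f in pattern._fields
        )
    if isinstance(pattern, list):
        return (
            isinstance(node, list)
            and len(pattern) == len(node)
            and all(match(p, n, bindings) for p, n in zip(pattern, node))
        )
    return pattern == node  # plain values such as attribute names or constants


def find_matches(pattern_src, code):
    """Return one {metavariable: source text} dict per matching node in code."""
    pattern = parse_pattern(pattern_src)
    results = []
    for node in ast.walk(ast.parse(code)):
        bindings = {}
        if match(pattern, node, bindings):
            results.append(
                {k[len(PREFIX):]: ast.unparse(v) for k, v in bindings.items()}
            )
    return results


print(find_matches("$X == $X", "if node.id == node.id: pass"))
# prints [{'X': 'node.id'}]
```

With the same pattern, a == b produces no match, because $X is already bound to a when the right-hand side is checked.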

Operators

... is the primary "match anything" operator
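The following Python sketch shows one way a "match anything" operator can work for call arguments. It illustrates the concept and does not describe sgrep's own matcher. The names ANY, seq_match, and call_matches are hypothetical. One caveat: Python's parser rejects a positional ... placed after keyword arguments, so the pattern foo(kwd1=1, kwd2=2, ...) from the Quick Examples has to be written as foo(..., kwd1=1, kwd2=2) for this sketch.

```python
import ast

ANY = object()  # sentinel standing in for a `...` argument


def seq_match(pattern, items):
    """Match a list against a pattern in which ANY absorbs zero or more items."""
    if not pattern:
        return not items
    head, rest = pattern[0], pattern[1:]
    if head is ANY:
        return any(seq_match(rest, items[i:]) for i in range(len(items) + 1))
    return bool(items) and head == items[0] and seq_match(rest, items[1:])


def call_matches(pattern_src, target_src):
    """Check one call expression against a call pattern that may contain `...`."""
    pat = ast.parse(pattern_src, mode="eval").body
    tgt = ast.parse(target_src, mode="eval").body
    if not (isinstance(pat, ast.Call) and isinstance(tgt, ast.Call)):
        return False
    if ast.dump(pat.func) != ast.dump(tgt.func):
        return False
    # Python parses `...` as the constant Ellipsis, so it appears as an argument.
    pat_args = [
        ANY if isinstance(a, ast.Constant) and a.value is Ellipsis else ast.dump(a)
        for a in pat.args
    ]
    tgt_args = [ast.dump(a) for a in tgt.args]
    if not seq_match(pat_args, tgt_args):
        return False
    # Keywords are compared by name, so their order does not matter.
    pat_kw = {k.arg: ast.dump(k.value) for k in pat.keywords}
    tgt_kw = {k.arg: ast.dump(k.value) for k in tgt.keywords}
    if ANY in pat_args:
        # With `...` present, extra keywords in the target are allowed.
        return all(tgt_kw.get(name) == value for name, value in pat_kw.items())
    return pat_kw == tgt_kw


print(call_matches("subprocess.open(...)", "subprocess.open(['foo'])"))  # True
print(call_matches("foo(..., kwd1=1, kwd2=2)", "foo(kwd2=2, kwd1=1, kwd3=3)"))  # True
print(call_matches("foo(1)", "foo(1, x=2)"))  # False
```

This sketch compares the called name literally, so it does not see through import aliases.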

Equivalences

sgrep automatically searches for code that is semantically equivalent. For example, a pattern for

subprocess.open(...)

will match

from subprocess import open as sub_open
result = sub_open("ls")

and other semantically equivalent configurations.
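One way to picture this kind of equivalence is to resolve import aliases before comparing names. The Python sketch below does that with the standard ast module. It is an illustration under simplifying assumptions (module-level imports only, no scoping, no relative imports) and is not sgrep's actual algorithm. The function name resolve_calls is hypothetical.

```python
import ast


def resolve_calls(source):
    """Return the fully qualified dotted name of each call in Python source."""
    tree = ast.parse(source)
    aliases = {}  # local name -> fully qualified name
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for a in node.names:
                if a.asname:
                    aliases[a.asname] = a.name
                else:
                    # `import a.b` binds only the top-level name `a`.
                    top = a.name.split(".")[0]
                    aliases[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module:
            for a in node.names:
                aliases[a.asname or a.name] = f"{node.module}.{a.name}"
    resolved = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            parts = []
            func = node.func
            # Unwind attribute access such as s.open into ["open"] plus base `s`.
            while isinstance(func, ast.Attribute):
                parts.append(func.attr)
                func = func.value
            if isinstance(func, ast.Name):
                parts.append(aliases.get(func.id, func.id))
                resolved.append(".".join(reversed(parts)))
    return resolved


code = 'from subprocess import open as sub_open\nresult = sub_open("ls")\n'
print(resolve_calls(code))  # prints ['subprocess.open']
```

Once both the snippet above and import subprocess as s; s.open(['foo']) resolve to subprocess.open, a single pattern for subprocess.open(...) can be compared against either form.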

Integrations

See integrations.md

Bug Reports

Reports are welcome! Please open a GitHub issue on this project.

Contributions

sgrep is LGPL-licensed and we would love your contributions. See docs/development.md