/megagrep

Megagrep helps beginning a code review by searching for keywords in the code using "grep". It does not search for vulnerabilities directly but for places where you could manually find some.

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

-------------------------------------------------------------------------------
                  ,__ __                                      
                 /|  |  |                                     
                  |  |  |   _   __,  __,   __,  ,_    _    _  
                  |  |  |  |/  /  | /  |  /  | /  |  |/  |/ \_
                  |  |  |_/|__/\_/|/\_/|_/\_/|/   |_/|__/|__/ 
                                 /|         /|          /|    
                                 \|         \|          \|    

-------------------------------------------------------------------------------

Megagrep helps beginning a code review in (almost) any language by searching for patterns in the code using "grep". It does not search for vulnerabilities directly but for places where you could manually find some.

Megagrep searches for patterns in the code that require to be investigated manually for security issues. It uses several search modes (keyword-based, strings or comments extraction) and outputs either detailed results or global information to discover the code and locate interesting pieces of code (most frequent keywords, files with the most results, etc.).

This is not really a security-focused static analysis tool. Patterns are intentionally not too restrictive and will probably trigger false positives, but to figure it out you have to... review all suspicious pieces of code :). In other words, Megagrep will give you locations (where you could find vulnerabilities), not vulnerabilities directly.

If you want a grep-based static analysis tool for direct vulnerability research, you can check Graudit.

GitHub release (latest by date) GitHub Help wanted Useful statistics

Demo GIF

Getting Started

Clone repository:

https://github.com/claire-lex/megagrep

(Optional) Install requirements to improve terminal output (nice text and colors):

pip install -r requirements.txt

Run with standard options and default dictionary on a directory:

python megagrep.py /path/to/my/code

Store results to a file:

python megagrep.py my_directory -f megagrep.out

Print help:

python megagrep.py -h

How to begin a code review with Megagrep

Tree and code discovery

When starting a code review, you often have a huge amount of code in a deep and messy directory tree, and you don't know where to start. You can use Megagrep to have a general idea with statistics (-S) of what the code contains, what you may find and where:

$> python -S -x "*.js"
[...]
--------------------------- Most frequent keywords ----------------------------
  1. login (21)
  2. passw*d (11)
  3. session (9)
  4. sql (8)
  5. auth* (4)
  6. upload (3)
  7. sha*1 (3)
  8. download (3)
  9. md5 (2)
 10. exec (2)
--------------------------- Files with most results ---------------------------
  1. /path/to/index.php (20)
  2. /path/to/classes/Login.php (14)
  3. /path/to/templates/fonts/fontawesome-webfont.svg (12)
  4. /path/to/templates/loginform.php (8)
  5. /path/to/config.php (3)
  6. /path/to/templates/logged.php (1)
  7. /path/to/templates/include/header.php (1)
  8. /path/to/templates/edit.php (1)
[...]

The ls mode (-L) helps you locate directories you may want to browse:

python megagrep -L
[...]
|-- .
|-- config.php
|   <  2 results | Top: sql, login >
|-- index.php
|   <  20 results | Top: login, session, auth* >
|-- classes
|   | -- Login.php
|   |    <  12 results | Top: sql, login, exec >
|-- templates
|   | -- logged.php
|   |    <  1 results | Top: md5 >
|   | -- loginform.php
|   |    <  4 results | Top: login, auth* >
|   | -- css
|   | -- fonts
|   | -- images
|   | -- include
|   |    | -- header.php
|   |    |    <  1 results | Top: query >
|   | -- js
[...]

This mode can be combined (though not necessarily relevant) with comment and string mode.

Keywords-based search

By default, Megagrep outputs a list of lines containing keywords from its own default dictionary.

$> python megagrep.py .
[...]
classes/Login.php:51: public static function checkAuth($bank_id, $password) { (auth*, passw*d)
classes/Login.php:52: $conn = new PDO(DB_DSN, DB_USERNAME, DB_PASSWORD); (passw*d)
classes/Login.php:53: $sql = "SELECT * FROM users WHERE bank_id='".$bank_id; (sql)
classes/Login.php:54: $sql = $sql."' AND password='".md5($password)."';"; (sql, md5, passw*d)
classes/Login.php:56: $st = $conn->prepare($sql); (sql)
classes/Login.php:57: $st->execute(); (exec)
[...]

The extended output option -e prints lines before and after matching line:

$> python megagrep.py -e .
[...]
-------------------------------------------------------------------------------
classes/Login.php:52: $conn = new PDO(DB_DSN, DB_USERNAME, DB_PASSWORD);
classes/Login.php:53: $sql = "SELECT * FROM users WHERE bank_id='".$bank_id; (sql)
classes/Login.php:54: $sql = $sql."' AND password='".md5($password)."';";
-------------------------------------------------------------------------------
classes/Login.php:53: $sql = "SELECT * FROM users WHERE bank_id='".$bank_id;
classes/Login.php:54: $sql = $sql."' AND password='".md5($password)."';"; (md5, passw*d, sql)
classes/Login.php:55: 
-------------------------------------------------------------------------------
classes/Login.php:55: 
classes/Login.php:56: $st = $conn->prepare($sql); (sql)
classes/Login.php:57: $st->execute();
-------------------------------------------------------------------------------
[...]

You can also search for keywords only in file names with option -N:

$>python megagrep.py -N
[...]
classes/Login.php:0: Login.php - /path/to/classes/Login.php (login)
templates/loginform.php:0: loginform.php - /path/to/templates/loginform.php (login)

To search in specific files, the include (-i) and exclude (-x) options can be used.

To search for custom keywords, option -w can be used (alone or combined with a dictionary):

$> python megagrep.py -w bad,wrong # Search for words "bad" and "wrong" only
$> python megagrep.py -w bad -d my_dict # Search for "bad" and the content of my_dict

Finally, you can use your own dictionary file with -d (see below for syntax examples) or use part of a dictionary with -l to select a category (ex: authentication).

Other search modes

Megagrep includes modes to extract strings (-T) and comments (-C) from the code.

$> python megagrep.py -T -i "*.php"
[...]
config.php:6: define("DB_DSN", "mysql:host=localhost;dbname=testo"); (mysql:host=localhost;dbname=testo, DB_DSN)
config.php:7: define("DB_USERNAME", "root"); (DB_USERNAME, root)
config.php:8: define("DB_PASSWORD", "P@$$w0rd"); (DB_PASSWORD, P@$$w0rd)
config.php:9: define("CLASS_PATH", "classes"); (classes, CLASS_PATH)
config.php:10: define("TEMPLATE_PATH", "templates"); (TEMPLATE_PATH, templates)
[...]

Comments mode will search for one-line comments starting with // and # and for C-style comments (/* ... */, but only on one line so far). One can also choose a custom tag to use with option -t.

$> python megagrep.py -C -t % -i "*.sty"
[...]
mybeamertheme.sty:3: % Requirement (Requirement)
mybeamertheme.sty:10: % TODO (TODO)
[...]

Print and save results

Megagrep outputs results as colored text to stdout if termcolor is installed, or as raw text. Results can also be printed as CSV with option -c.

$> python megagrep.py -c -i "*.php"
Filename,Line number,Line,Found,Status,Walkthrough,Full path
[...]
Login.php,51,public static function checkAuth($bank_id, $password) {,auth*,,,/path/to/classes/Login.php
Login.php,53,$sql = "SELECT * FROM users WHERE bank_id='".$bank_id;,sql,,,/path/to/classes/Login.php
Login.php,54,$sql = $sql."' AND password='".md5($password)."';";,sql|md5,,,/path/to/classes/Login.php
Login.php,56,$st = $conn->prepare($sql);,sql,,,/path/to/classes/Login.php
Login.php,57,$st->execute();,exec,,,/path/to/classes/Login.php
[...]

The "Status" and "Walkthrough" columns are empty, because they are meant to be used by the reviewer. For now you have to edit the code to remove them.

The option -f can be used to store results to a file (-c and -f can be combined).

Improve Megagrep results

Use and write dictionaries

You can use default dictionaries in dicts/ and also use your own:

python megagrep.py -d path/to/dictionary

A dictionary has the following format:

# Comment line

[list_name]
keyword1
keyword2 # Comment at the end
matching_keyw*d

Example:

# Global keywords

[authentication]
auth* # authentication and authorization stuff
login
passw*d
pwd
session
admin*

Notes on supported languages

Megagrep works for any type of language with a syntax and coding conventions that make use of words in a human language (mostly in english): the keywords used in Megagrep, even custom ones that users are likely to use, probably rely on objects' names (user-defined or built in languages, libraries and frameworks). For instance, searching for password is only relevant when the language or the developer has the ability to use such word in her code.

For languages that do not allow that (older or low-level languages), Megagrep default dictionaries are currently not suitable. However, it may work on some of these languages with appropriate custom dictionaries (for instance, by searching for dangerous sequences of language-defined keywords). Please share your experience with me if you find yourself in such a situation!

Coming someday

  • Add better dictionaries (help welcome!)
  • Improve "stat" mode content (ideas welcome!)
  • Detect multi-line C-style comments with option -C
  • Add direct regex support in dictionaries with prefix regex:
  • Export results as HTML or other formats (ideas welcome!)