/ast_js_fuzzing

astjsfuzz

Primary LanguagePython

* Summary
Language fuzzing is '''hard'''.  To hit the runtime effectively, language fuzzing must create complicated files that are both correct in terms of their syntax and semantics.  Mutation fuzzing doesn't work well for this and, grammar fuzzing takes lots of effort.  This project proposes to build a language fuzzer that works on the AST (abstract syntax trees) in an mutation style.  Example language will be JavaScript.  JavaScript is loosely typed and has lots of interesting uses.  There are 5 board stages envisioned.

Stage 1, parse JS files
Get corpus of input files (mozilla/webkit) and build ASTs of each.
Found a python module called slimit that provides ASTs of JavaScript.  It's not that great, but will totally work for us.

Stage 2, mate files
Swap AST elements between trees that of similar types (i.e. exprs <=> exprs, decls <=> decls, cond <=> cond).

Stage 3, mutate files
Walk new ASTs and replace strings/numbers from bad sets.

Stage 4, semantic fixup
Walk new ASTs and add global vars where missing.  Loop back to stage 1 as necessary.

Stage 5, Emit code from ASTs
Throw and instrument

* To do:
1) The AST representation is not complete.  Some JS files will not parse.
2) The semantic fixups need some work.  Sometimes get an undefined symbol during fuzzed runtime.
3) Sometimes the emit code will run into infinite recursion.

* Example
There is a simple sample program to get you started.  It will lex, parse, walk and print AST and emit code for the simple.js program in samples.  Just run:
  python ./src/printJSast.py ./samples/small.js

* Rhino example, outputs will be in ast_js_fuzzing/src/t
cd astJSFuzz/src
mkdir t
./astJSFuzz.py -s 5 -n 3 -f ../samples/jses/vanilla/ -d 0.05 -o t -m -v

* References
http://packages.python.org/slimit/