Incredibly Formal And In Depth Research Project Completed Under The Direction Of The Illustrious James Mickens In Order To Further Our Universal Understanding Of Dynamic Taint Analysis. Copyright CS 263: Security Systems at Harvard College. All Rights Reserved.
brew install llvm
- Install: pip install virtualenv
- Create: virtualenv venv
- Activate: source venv/bin/activate
- Deactivate: deactivate
pip install -r requirements.txt
sh gen_riscv.sh file.c
If you get the error "llc: command not found" on OS X, you can probably fix it by adding /usr/local/opt/llvm/bin/ to your PATH:
echo 'export PATH="/usr/local/opt/llvm/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile
python interpreter.py riscv_file program_args
Where riscv_file is the generated RISC-V assembly file and program_args are the arguments to pass to the program.
Pickle files are generated automatically in the folder pickle_cabinet.
parser.py
Parses the generated RISC-V file. Tokenizes instruction-bearing lines. Stores the file's lines in '_data' and the tokenized instructions in '_instructions'.
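To make "tokenizes instruction-bearing lines" concrete, here is a minimal sketch of how a single assembly line might be split into tokens. The function name tokenize_line and the rules for skipping comments, directives, and labels are assumptions made for illustration; they are not taken from parser.py.

```python
def tokenize_line(line):
    """Return [mnemonic, operand, ...] for an instruction line, else None.

    Hypothetical helper for illustration; not the project's parser.py.
    """
    # Drop a trailing '#' comment and surrounding whitespace.
    code = line.split("#", 1)[0].strip()
    # Blank lines, assembler directives (".text") and labels ("main:")
    # carry no instruction, so they are skipped.
    if not code or code.startswith(".") or code.endswith(":"):
        return None
    # The mnemonic is the first word; operands are comma-separated.
    parts = code.split(None, 1)
    operands = []
    if len(parts) > 1:
        operands = [op.strip() for op in parts[1].split(",")]
    return [parts[0]] + operands
```

For example, tokenize_line("lw a0, 8(sp)") yields ["lw", "a0", "8(sp)"], while labels and directives yield None.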
interpreter.py
Parses a RISC-V file and executes it, tracking taint according to the dynamic policy (the default policy is in policy.py).
- main - handles arguments, sets up pickling, initializes interpreter, sets up policy
- class RiscvInterpreter - Runs the program instruction by instruction, taking snapshots regularly.
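The "run instruction by instruction, taking snapshots regularly" loop can be sketched with the standard pickle module. Everything named here (run_with_snapshots, the every parameter, the snapshot file names, and the dictionary standing in for interpreter state) is hypothetical and only illustrates the idea; the real RiscvInterpreter may organize this differently.

```python
import os
import pickle
import tempfile


def run_with_snapshots(steps, state, snapshot_dir, every=2):
    """Apply each step to `state`, pickling the state every `every` steps.

    Hypothetical sketch; `steps` stands in for decoded instructions.
    """
    paths = []
    for count, step in enumerate(steps, start=1):
        step(state)
        if count % every == 0:
            path = os.path.join(snapshot_dir, "snapshot_%d.pkl" % count)
            with open(path, "wb") as handle:
                pickle.dump(state, handle)
            paths.append(path)
    return paths


def bump(state):
    # Stand-in for executing one 4-byte instruction.
    state["pc"] += 4


snapshot_dir = tempfile.mkdtemp()
state = {"pc": 0}
paths = run_with_snapshots([bump] * 5, state, snapshot_dir, every=2)
```

Each snapshot can later be restored with pickle.load, which is the mechanism a backtracking tool would build on.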
state.py
Holds registers and memory. Converts the instructions in the blocks dictionary into instruction objects.
- ABI_TO_REGISTER_IDX - Maps RISC-V ABI register names to register indices.
- class RiscvState - State metadata, an array for memory, and a dictionary for register state.
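As a rough illustration of this layout, the sketch below builds a name-to-index table from the standard RISC-V ABI register names (zero, ra, sp, ..., t6 for x0 through x31) and pairs it with a small state object. It assumes the table maps ABI register names such as sp and a0 to indices, as the name ABI_TO_REGISTER_IDX suggests. The names ABI_TO_IDX and SketchState, the 32-bit register width, and the memory size are assumptions, not the contents of state.py.

```python
# Standard RISC-V ABI names in index order x0..x31.
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1"]
    + ["a%d" % i for i in range(8)]       # a0-a7  -> x10-x17
    + ["s%d" % i for i in range(2, 12)]   # s2-s11 -> x18-x27
    + ["t%d" % i for i in range(3, 7)]    # t3-t6  -> x28-x31
)
ABI_TO_IDX = {name: idx for idx, name in enumerate(ABI_NAMES)}
ABI_TO_IDX["fp"] = ABI_TO_IDX["s0"]  # fp is an alias for s0 (x8)


class SketchState:
    """Hypothetical stand-in for a state object: pc, memory, registers."""

    def __init__(self, mem_size=1024):
        self.pc = 0
        self.memory = bytearray(mem_size)
        self.registers = {idx: 0 for idx in range(32)}
        # Assumption: the stack starts at the top of memory.
        self.registers[ABI_TO_IDX["sp"]] = mem_size

    def read_reg(self, name):
        return self.registers[ABI_TO_IDX[name]]

    def write_reg(self, name, value):
        idx = ABI_TO_IDX[name]
        if idx != 0:  # x0 is hard-wired to zero
            self.registers[idx] = value & 0xFFFFFFFF  # assumes 32-bit regs
```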
instruction.py
Defines an object representation for RISC-V instructions.
- class RiscvInstr - Parses tokens and creates an object for executing instructions.
- class RiscvOperand - Represents an abstract right-hand side operand. Stores information relevant to classifying operand types (mem refs, consts, regs).
- class MemoryReference - Abstraction for representing mem reference operands.
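One possible way to classify operand tokens into the three kinds mentioned above (memory references, constants, registers) is sketched here. The class SketchMemoryReference and the function classify_operand are hypothetical names, and the offset(base) syntax handled is the common RISC-V assembly form such as 8(sp); the project's own classes may differ.

```python
import re

# Matches "8(sp)", "-4(s0)" and "(a0)": optional offset, base register.
_MEM_REF = re.compile(r"^(-?\d+)?\((\w+)\)$")


class SketchMemoryReference:
    """Hypothetical stand-in for a memory reference operand."""

    def __init__(self, offset, base):
        self.offset = offset
        self.base = base


def classify_operand(token):
    """Return ("mem", ref), ("const", int) or ("reg", name)."""
    match = _MEM_REF.match(token)
    if match:
        offset = int(match.group(1) or 0)
        return ("mem", SketchMemoryReference(offset, match.group(2)))
    try:
        # Base 0 accepts decimal and 0x-prefixed literals.
        return ("const", int(token, 0))
    except ValueError:
        # Anything else is treated as a register name in this sketch.
        return ("reg", token)
```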
taint.py
Defines object representations necessary to track and propagate taint.
- class TaintTracker - Provides taint tracking abstractions for instruction-level tracking. For each instruction encountered, propagates taint based on the user-provided taint policy. Maintains shadow memory and shadow registers, which correspond to the registers and memory in the interpreter state.
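The shadow-state idea can be sketched as follows: taint is kept beside the real registers and memory, and each instruction is dispatched to a policy handler that updates it. This is a simplified illustration under stated assumptions (byte-granular taint stored in sets, operands passed as a plain tuple, a made-up load_word handler); it is not the TaintTracker class itself.

```python
class SketchTaintTracker:
    """Hypothetical tracker: sets of tainted registers and byte addresses."""

    def __init__(self, policy):
        self.policy = policy        # mnemonic -> handler
        self.tainted_regs = set()   # shadow registers
        self.tainted_mem = set()    # shadow memory, one entry per byte

    def taint_memory(self, start, length):
        self.tainted_mem.update(range(start, start + length))

    def is_mem_tainted(self, start, length):
        return any(a in self.tainted_mem for a in range(start, start + length))

    def set_reg_taint(self, reg, tainted):
        if tainted:
            self.tainted_regs.add(reg)
        else:
            self.tainted_regs.discard(reg)

    def step(self, mnemonic, state, operands):
        """Run the policy handler for one instruction, if one exists."""
        handler = self.policy.get(mnemonic)
        if handler is None:
            return False
        handler(self, state, operands)
        return True


def load_word(tracker, state, operands):
    # Illustrative handler: a loaded register inherits the taint of the
    # four bytes it reads. `operands` is (destination, address) here.
    rd, address = operands
    tracker.set_reg_taint(rd, tracker.is_mem_tainted(address, 4))


tracker = SketchTaintTracker({"lw": load_word})
tracker.taint_memory(0x100, 8)               # e.g. bytes read from input
tracker.step("lw", None, ("a0", 0x104))      # reads tainted bytes
tracker.step("lw", None, ("a1", 0x200))      # reads clean bytes
```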
backtrack.py
A proof of concept showing that our taint tracking interpreter can load snapshots and resume execution from them easily. This feature is essential for a dynamic taint tracking system, since it makes it possible to tighten or loosen the taint policy in the middle of program execution.
- Example Execution: python backtrack.py --pickle_path=
policy.py
Defines the developer's taint propagation policy. A policy is a mapping from RISC-V instruction names (as strings) to handlers.
A handler is a function of three arguments: the taint tracker (defined in taint.py), the state of the interpreter (defined in state.py), and the operands object (defined in instruction.py).
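A policy of this shape might look like the sketch below: a dictionary from mnemonic strings to three-argument handlers. The handler names, the choice of rules (a destination is tainted if either source is; loading an immediate clears taint), the tuple used in place of the operands object, and the tiny _DemoTracker class are all assumptions for illustration, not the default policy in policy.py.

```python
def propagate_binary(tracker, state, operands):
    """rd becomes tainted iff either source register is tainted."""
    rd, rs1, rs2 = operands
    if rs1 in tracker.tainted or rs2 in tracker.tainted:
        tracker.tainted.add(rd)
    else:
        tracker.tainted.discard(rd)


def clear_destination(tracker, state, operands):
    """Loading a constant overwrites rd with untainted data."""
    tracker.tainted.discard(operands[0])


# Hypothetical policy: instruction name -> handler(tracker, state, operands).
SKETCH_POLICY = {
    "add": propagate_binary,
    "sub": propagate_binary,
    "xor": propagate_binary,
    "li": clear_destination,
}


class _DemoTracker:
    """Minimal stand-in exposing only a set of tainted register names."""

    def __init__(self):
        self.tainted = set()


demo = _DemoTracker()
demo.tainted.add("a1")
SKETCH_POLICY["add"](demo, None, ("a0", "a1", "a2"))   # a0 picks up taint
after_add = "a0" in demo.tainted
SKETCH_POLICY["li"](demo, None, ("a0", "5"))           # a0 is cleared
after_li = "a0" in demo.tainted
```

Keeping the policy as plain data means a different dictionary can be swapped in without touching the interpreter loop.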
analyze.py
Provides abstractions for plotting the change in register/memory taint across the execution of the program. Outputs the generated graphs to the directory <pickle_jar_path>/data/. The --memory_graph and --register_graph flags determine which graphs to generate.
- class Analyzer - Loads the snapshotted state from the specified pickle_jar.
- Example Execution: analyzer.py --pickle_jar=<pickle_jar_path> --memory_graph --register_graph