Synthesizing black boxed library functions via symbolic execution and component based synthesis. This has been tested on Ubuntu 20.04 LTS.
In this project we provide a tool which synthesizes functions which can be called from a binary blob.
It takes the following input
- A binary
- Input examples for a specific library function call
- A wrapper to call our library function
And produces the following output
- Synthesized C functions in the form of SSA components
Suppose we wish to synthesize the following C function modify_stuff
, it also uses another function modify_stuff2
which the user is unaware of.
int modify_stuff2(int a){
return a + 1;
}
int modify_stuff(int a, int b){
a = modify_stuff2(a);
return 2 * modify_stuff2(a) + b;
}
The user will then generate input output examples by calling modify_stuff
on their own.
The user is assumed to know the input types and return types of the library function modify_stuff
An example text file will look like the following.
int32,52,int32,86,
int32,27,int32,95,
int32,3,int32,21,
...
This is a CSV where all even columns represent the input type, and the odd columns will represent the value.
After that the user will write an additional wrapper function for our synthesis engine to analyze. The input variables to the library call do not need to be initialized our tool will do that for the user during analysis.
#include "modify_stuff.h"
int main(){
int a;
int b;
return modify_stuff(a, b);
}
After compiling this wrapper, run main.py
.
It will prompt the user for the location of the examples, and the binary to analyze.
Synthesized functions will be placed in synth_engine/components.cpp
.
int synthed_4(int in_1){
int out_0 = int_add(1, in_1);
return out_0;
}
int synthed_5(int in_1,int in_2){
int out_0 = int_add(4, in_2);
int out_1 = int_add(in_1, out_0);
int out_2 = int_add(in_1, out_1);
return out_2;
}
We use the symbolic engine Triton (https://github.com/JonathanSalwan/Triton
).
Please follow the install instructions to install it on your system.
Additionally, please make sure to install the newest version of Capstone (https://github.com/aquynh/capstone
) which Triton depends on.
The version found on package managers may be out of date (apt's is outdated Ubuntu 20.04 LTS)
To load binaries into Triton we also use LIEF (https://github.com/lief-project/LIEF
).
Please follow the install instructions to install it on your system.
main.py
is the wrapper to all the components of the system.
This folder contains files relating to the equivalence checking of 2 binaries.
equivalence/equivalence_test.py
This folder contains files relating to the generation of examples from the binary we wish to synthesize functions from.
gen_ex/execution_info.py
: Stores the information of each function during executiongen_ex/func_extraction.py
: Contains the FunctionExtractor classgen_ex/io_example.py
: Contains a class representing an IO example for the binary we wish to synthesize functions from
This folder contains the files for the synth_engine
synth_engine/component_state.cpp/hpp
: Contains the ComponentState class which stores the different permuations of a component we wish to testsynth_engine/components.cpp/hpp
: Components and constants relating to how we treat componentssynth_engine/engine.cpp/hpp
: Contains the Engine class which is the synthesis engine and other functions such as example parsing.synth_engine/main.cpp
: The main level call to the Enginesynth_engine/synth_state.cpp/hpp
: Contains the SynthState class which represents a choice of components to use. This is then further used to execute the components and determine if they match the input output examples.
This folder contains some tests that were used in development. One can think of them as examples.