llvm2vec is an extension of the asm2vec
model (official paper here) that builds upon the unofficial asm2vec
Python implementation by Lancern to add support for LLVM IR.
Installation with Docker is preferred and easiest:
# build the docker image
docker build . -t llvm2vec
# start docker image with interactive shell
docker run -it -v $PWD:/home llvm2vec /bin/bash
# start nginx container to serve cfgs
docker run --name nginx \
-v $PWD/cfgs/:/www/data/ \
-v $PWD/nginx.conf:/etc/nginx/nginx.conf \
-p 8080:80 nginx
*Assumes you are currently in the root directory of the repo.
For a manual install, first install the following:
- Python3
- libgraphviz-dev
Afterwards, install the following Python dependencies:
# --install-option(s) may differ depending on system
# (pip 23.1+ removed --install-option; use --config-settings there instead)
pip3 install pygraphviz --install-option="--include-path=/usr/include/graphviz" --install-option="--library-path=/usr/lib/x86_64-linux-gnu/graphviz"
pip3 install llvmlite networkx numpy scikit-learn seaborn matplotlib
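# download the code and required submodules
git clone https://github.com/markgllin/llvm2vec.git
cd llvm2vec
# `git submodule init` alone only registers submodules; `update --init` also fetches them
git submodule update --init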
# add asm2vec to python path
export PYTHONPATH=$PWD/asm2vec:$PYTHONPATH
# run the code
python3 main.py
python3 -m tensorboard.main --logdir=/home/projections/ --bind_all
Lots of things still to do:
- improve t-SNE plotting (e.g. add labels/colors)
- database to persist vectorized functions
- determining function similarity via cosine similarity
- proper pipeline for disassembling binaries into LLVM IR w/ retdec and passing the result into llvm2vec (probably in a new repo)
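Two of the items above (persisting vectorized functions and comparing them via cosine similarity) could be prototyped together. The following is only a rough sketch, not code from this repo: the table layout, the JSON encoding of vectors, and the helper names `open_store`, `save_vector`, and `most_similar` are all assumptions made for illustration, and it uses only the Python standard library so it does not depend on how llvm2vec actually represents its vectors.

```python
import json
import math
import sqlite3


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def open_store(path=":memory:"):
    """Open (or create) a SQLite database holding one row per function.

    The schema is hypothetical: function name -> JSON-encoded vector.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS functions ("
        "name TEXT PRIMARY KEY, vector TEXT NOT NULL)"
    )
    return conn


def save_vector(conn, name, vector):
    """Insert or overwrite the vector stored under a function name."""
    conn.execute(
        "INSERT OR REPLACE INTO functions (name, vector) VALUES (?, ?)",
        (name, json.dumps([float(x) for x in vector])),
    )
    conn.commit()


def most_similar(conn, query, top_k=3):
    """Return up to top_k (name, similarity) pairs, best match first."""
    rows = conn.execute("SELECT name, vector FROM functions").fetchall()
    scored = [
        (name, cosine_similarity(query, json.loads(encoded)))
        for name, encoded in rows
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy vectors stand in for real function embeddings.
    store = open_store()
    save_vector(store, "func_a", [1.0, 0.0, 1.0])
    save_vector(store, "func_b", [0.0, 1.0, 0.0])
    for name, score in most_similar(store, [1.0, 0.1, 0.9], top_k=2):
        print(f"{name}: {score:.3f}")
```

A linear scan like this is fine for a few thousand functions; with numpy already among the dependencies, the similarity step could be vectorized once the stored vectors are loaded into a single array.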
Nice to haves:
- install python dependencies the proper way
- use venv(?)
- gui