PyBitShred is a Python reimplementation of BitShred, a tool for large-scale malware similarity analysis and clustering.
This is my semester project of fall 2023 at EURECOM, France. The project is supervised by Antonino Vitale and Prof. Simone Aonzo. We implemented the tool in Python to make it more accessible and easier to use. We also added two new modes to the tool: the "all section" mode and the "raw file" mode.
Check the presentation slides here.
pip install pybitshred
Use the following command to see the available options:
pybitshred -h
There are three stages in the BitShred pipeline: update, compare, and cluster. The update stage is used to update the database with new samples. The compare stage is used to compare the samples in the database. The cluster stage is used to cluster the samples in the database.
Check the original paper for more details.
@im-overlord04 wrote a Dockerfile
to run the original BitShred tool in a container. This is useful as a reference to compare the results of the original tool with the results of the reimplemented tool.
FROM ubuntu:14.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install --no-install-recommends -y \
wget git automake libtool make cmake gcc g++ pkg-config libmagic-dev \
tar unzip libglib2.0-0 libssl-dev libdb-dev
RUN ld -v
ARG GIT_SSL_NO_VERIFY=1
RUN apt-get install -y automake1.11 binutils-dev
COPY bitshred/bitshred_single_steps bitshred_single_steps
RUN cd bitshred_single_steps/ && ./configure && make
COPY bitshred/bitshred_single bitshred_single
RUN cd bitshred_single/ && ./configure && make
COPY bitshred/bitshred_openmp bitshred_openmp
RUN cd bitshred_openmp/ && make
WORKDIR /
docker build . -t nino:bitshred
docker run --rm --volume <input_folder>:/input:ro --volume <output_folder>:/db --entrypoint <entrypoint>/src/bitshred nino:bitshred <options>
- input_folder: folder containing the samples
- output_folder: folder where the results of the tool will be placed in
- entrypoint: version of the tool to run: only supported either
bitshred_single
orbitshred_single_steps
orbitshred_openmp
- options: command line options required by the tool, specified between double quotes. NB: if you have to specify the input folder, you must use
/input
(any other path is not valid) and it corresponds to the value specified ininput_folder
. Same reasoning for the output folder, whose path to use is/db
Example:
docker run --rm --volume "/home/nino/samples/":/input:ro --volume "/home/nino/output":/db --entrypoint /bitshred_single/src/bitshred nino:bitshred "-b" "/input/" "-t" "0.60"