This repository contains the code for the NDSS 2024 paper "Information-Based Heavy Hitters for Real-Time DNS Data Exfiltration Detection".
If you use the code in this repository for your work, please cite the paper:
@article{ozeryinformation,
title={Information-Based Heavy Hitters for Real-Time DNS Data Exfiltration Detection},
author={Ozery, Yarin and Nadler, Asaf and Shabtai, Asaf}
}
WeightedHyperLogLog
- contains an implementation of HyperLogLog, adapted for the task of weighted cardinality estimation.
data
- contains the datasets needed to run this code.
allow_lists
global_allow_list.csv
- global popularity-based allow-list (you need to provide this).
pt.csv
- peace time allow list (generated by the code).
dataset.csv
- The original dataset, provided by you. Note that the code assumes the Ziza et al. dataset. The following two columns are expected:
timestamp
- the timestamp of the DNS request (Unix time, milliseconds).
request
- the DNS request itself.
config.py
- contains configuration values. Note that some are left empty and need to be filled in before running the code.
preprocess_dataset.py
- preprocesses the dataset (keeps only the columns we are interested in).
split_dataset.py
- splits the dataset into a threshold-tuning dataset ("training"), a peacetime generation dataset and a wartime ("test") dataset.
main_tune.py
- tunes the detection threshold, based on the acceptable FPR value.
- The following file is generated by this:
detection_threshold.txt
- The detection threshold to obtain the appropriate acceptable FPR.
main_pt.py
- generates the PT allow-list.
main_wt.py
- runs the ibHH in WT (enforcing) mode, includes allow-listing based on global allow-list and the PT allow-list.
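Before running anything, it can help to confirm that dataset.csv has the two expected columns described above. The following is a minimal sketch, not part of the repository: the function name check_dataset and the sample rows are hypothetical, and it assumes a comma-separated file with a header row and numeric timestamps.

```python
import csv
import io
from datetime import datetime, timezone

# Column names taken from the dataset.csv description in this README.
REQUIRED_COLUMNS = ("timestamp", "request")


def to_utc(timestamp_ms):
    """Convert a Unix timestamp in milliseconds to an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


def check_dataset(handle):
    """Validate the header and every row; return the number of data rows.

    Hypothetical helper: raises ValueError on the first problem found.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = 0
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            to_utc(float(row["timestamp"]))
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: bad timestamp") from None
        if not row["request"]:
            raise ValueError(f"line {line_no}: empty request")
        rows += 1
    return rows


# Made-up sample data, used here instead of a real dataset.csv.
sample = io.StringIO(
    "timestamp,request\n"
    "1700000000000,www.example.com\n"
    "1700000000500,mail.example.org\n"
)
print(check_dataset(sample))  # prints 2
```

To check a real file, pass an open file object instead, for example `check_dataset(open("data/dataset.csv", newline=""))`; the path is an assumption based on the layout listed above.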
Note: these instructions assume you have a dataset of DNS queries. The code has been tested on Ubuntu Linux and macOS, with Python 3.10 and 3.11.
- Install the requirements:
pip install -r requirements.txt
- Install the ibHH package:
cd WeightedHyperLogLog && pip install . && cd ..
- Preprocess the dataset (required for the Ziza et al. dataset, but might be optional for other datasets):
python3 preprocess_dataset.py
- Split the dataset into tune (threshold tuning), pt generation and wt (test):
python3 split_dataset.py
- Tune the ibHH threshold:
python3 main_tune.py
- Generate peacetime allow-list:
python3 main_pt.py
- Run the ibHH in wartime mode:
python3 main_wt.py
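For intuition about the threshold-tuning step, one generic way to turn an acceptable FPR into a detection threshold is to take a quantile of scores observed on benign traffic. The sketch below illustrates that general idea only; it is not taken from main_tune.py, and the function name tune_threshold, the notion of a per-domain "score" and the sample values are all assumptions made for illustration.

```python
import math


def tune_threshold(benign_scores, acceptable_fpr):
    """Return the smallest observed score T such that the fraction of
    benign scores strictly greater than T is at most acceptable_fpr.

    Hypothetical helper: a generic quantile rule, not the repository's code.
    """
    if not benign_scores:
        raise ValueError("benign_scores must not be empty")
    if not 0 <= acceptable_fpr < 1:
        raise ValueError("acceptable_fpr must be in [0, 1)")

    ordered = sorted(benign_scores)
    n = len(ordered)
    # Number of benign items that may exceed the threshold (false positives).
    allowed = math.floor(acceptable_fpr * n)
    return ordered[n - allowed - 1]


# Made-up benign scores; in practice these would come from the tuning split.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
threshold = tune_threshold(scores, 0.25)
false_positives = sum(score > threshold for score in scores)
print(threshold, false_positives / len(scores))
```

With this rule, anything scoring strictly above the returned value would be flagged, so at most the acceptable fraction of the benign tuning data is misclassified.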