Information-based-Heavy-Hitters-for-Real-Time-DNS-Exfiltration-Detection

This repository contains the code for the NDSS 2024 paper "Information-Based Heavy Hitters for Real-Time DNS Data Exfiltration Detection".

Citing the paper

If you use the code in this repository for your work, please cite the paper:

@article{ozeryinformation,
  title={Information-Based Heavy Hitters for Real-Time DNS Data Exfiltration Detection},
  author={Ozery, Yarin and Nadler, Asaf and Shabtai, Asaf}
}

Repository Description

  • WeightedHyperLogLog - contains an implementation of HyperLogLog, adapted for weighted cardinality estimation.
  • data - contains datasets needed to run this code
    • allow_lists
      • global_allow_list.csv - global popularity-based allow-list (you need to provide this).
      • pt.csv - peacetime (PT) allow-list (generated by the code).
    • dataset.csv - the original dataset, provided by you. Note that the code assumes the Ziza et al. dataset. The following two columns are expected:
      • timestamp - the timestamp of the DNS request (Unix time, milliseconds).
      • request - the DNS request itself.
  • config.py - contains configuration values. Note that some are left empty and must be filled in before running the code.
  • preprocess_dataset.py - preprocesses the dataset (keeps only the columns we are interested in).
  • split_dataset.py - splits the dataset into threshold tuning dataset ("training"), peace-time generation dataset and wartime ("test") dataset.
  • main_tune.py - tunes the detection threshold, based on the acceptable false-positive rate (FPR).
    • This script generates the following file:
    • detection_threshold.txt - the detection threshold that yields the acceptable FPR.
  • main_pt.py - generates the peacetime (PT) allow-list.
  • main_wt.py - runs the ibHH in wartime (WT, enforcing) mode, with allow-listing based on both the global allow-list and the PT allow-list.
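
The dataset layout described above (a timestamp column in Unix milliseconds and a request column) can be checked before running anything. The following is a minimal sketch, not part of the repository: the function name check_dataset and the sample rows are hypothetical, and it assumes a comma-separated file with a header row and numeric timestamps.

```python
import csv
import io

# Column names taken from the dataset.csv description above.
EXPECTED_COLUMNS = ("timestamp", "request")


def check_dataset(csv_file):
    """Return the number of data rows; raise ValueError if the file is malformed.

    Hypothetical helper: assumes a header row and numeric timestamps.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = 0
    for line_no, row in enumerate(reader, start=2):
        try:
            float(row["timestamp"])  # Unix time in milliseconds
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: bad timestamp {row['timestamp']!r}")
        if not row["request"]:
            raise ValueError(f"line {line_no}: empty request")
        rows += 1
    return rows


# Made-up in-memory sample standing in for data/dataset.csv.
sample = io.StringIO(
    "timestamp,request\n"
    "1700000000000,www.example.com\n"
    "1700000000250,a1b2c3.tunnel.example.org\n"
)
n_rows = check_dataset(sample)
print(n_rows)
```

To check a real file, the same function could be called as check_dataset(open("data/dataset.csv", newline="")), assuming the file lives at that path.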

Instructions

Note: these instructions assume you have a DNS query dataset. The code has been tested on Ubuntu Linux and macOS, with Python 3.10 and 3.11.

  1. Install the requirements: pip install -r requirements.txt
  2. Install the ibHH package: cd WeightedHyperLogLog && pip install . && cd ..
  3. Preprocess the dataset (required for the Ziza et al. dataset, but might be optional for other datasets): python3 preprocess_dataset.py
  4. Split the dataset into tune (threshold tuning), pt generation and wt (test): python3 split_dataset.py
  5. Tune the ibHH threshold: python3 main_tune.py
  6. Generate peacetime allow-list: python3 main_pt.py
  7. Run the ibHH in wartime mode: python3 main_wt.py
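
To illustrate what tuning a threshold against an acceptable FPR (step 5) can mean, here is a minimal sketch of one generic approach: choose the smallest observed score such that the fraction of benign scores strictly above it does not exceed the target. This is an assumption for illustration only. The function tune_threshold, the score values, and the "flag when score exceeds threshold" rule are hypothetical, and main_tune.py may work differently.

```python
import math


def tune_threshold(benign_scores, acceptable_fpr):
    """Return the smallest observed score t such that the fraction of
    benign scores strictly greater than t is at most acceptable_fpr.

    Hypothetical helper; assumes a domain is flagged when its score
    exceeds the threshold.
    """
    if not benign_scores:
        raise ValueError("benign_scores must not be empty")
    if not 0.0 <= acceptable_fpr < 1.0:
        raise ValueError("acceptable_fpr must be in [0, 1)")

    ordered = sorted(benign_scores)
    n = len(ordered)
    # Number of benign entries that may exceed the threshold.
    allowed = math.floor(acceptable_fpr * n)
    return ordered[n - allowed - 1]


# Made-up per-domain scores observed on benign (tuning) traffic.
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0]
threshold = tune_threshold(scores, acceptable_fpr=0.1)
print(threshold)
```

With these ten made-up scores and a target of 0.1, one benign score is allowed above the threshold, so the outlier at 50.0 would be the only false positive.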