By Victor Le Pochat, Tim Van hamme, Sourena Maroofi, Tom Van Goethem, Davy Preuveneers, Andrzej Duda, Wouter Joosen, and Maciej Korczyński
This repository contains the source code and models of our NDSS 2020 paper "A Practical Approach for Taking Down Avalanche Botnets Under Real-World Constraints".
- `feature_generation` contains the code for parsing raw input data, extracting feature values and ground truth, and exporting them to input files for the machine learning classifier.
- `evaluation_code_and_models` contains the code for the evaluation and the models that were trained during it. The evaluation procedure that is followed can be found in the `paper.sh` bash script; it consists of the following steps (a sketch of the overall pipeline is given after this list):
  - train the models within one year by using `production_train.py`; do this for all dataset combinations and for both the 2017 and 2018 iterations
  - evaluate the performance of every iteration and every dataset combination by using `experiment.py`; this also finds the thresholds for the work-reduced metric
  - do the above evaluation for the full ensemble by calling `ensemble_evaluation.py`
  - evaluate ensemble performance when trained on one iteration and tested on another by calling `incremental_learning_evaluation.py`
  - evaluate the extended model trained on the 2017 data plus a part of the 2018 data by calling `incremental_learning_evaluation.py`
  - the dataset impact evaluations for both the extended and base models are found in `dataset_impact_evaluation_extended.py` and `dataset_impact_evaluation.py`
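The authoritative pipeline is `paper.sh`; the sketch below only illustrates how the per-year, per-dataset-combination loop described above could be driven. The dataset names and the command-line flags passed to the scripts are assumptions for illustration, not the scripts' actual interface.

```python
# Illustrative driver only: the real procedure lives in paper.sh, and the
# script arguments below are hypothetical placeholders.
import itertools
import subprocess

DATASETS = ["dataset_a", "dataset_b", "dataset_c"]  # hypothetical dataset identifiers
ITERATIONS = ["2017", "2018"]

# Train one model per dataset combination and per yearly iteration.
for year in ITERATIONS:
    for r in range(1, len(DATASETS) + 1):
        for combo in itertools.combinations(DATASETS, r):
            subprocess.run(
                ["python", "production_train.py",
                 "--year", year, "--datasets", ",".join(combo)],  # flags are assumptions
                check=True,
            )

# Evaluate every iteration and dataset combination; experiment.py also
# derives the thresholds for the work-reduced metric.
for year in ITERATIONS:
    subprocess.run(["python", "experiment.py", "--year", year], check=True)
```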
- The evaluation code depends on scikit-learn for training the models. To obtain the equal error rate evaluation metric, we rely on the bob suite. Other packages used: numpy and pandas.
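For context, the equal error rate (EER) is the operating point at which the false positive rate equals the false negative rate. The repository computes it with the bob suite; the snippet below is only a minimal illustration of the same metric using scikit-learn and NumPy on made-up scores.

```python
# Toy EER computation: find the score threshold where FPR and FNR meet.
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])                      # toy ground-truth labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.7, 0.5])   # toy classifier scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)
fnr = 1 - tpr
idx = np.nanargmin(np.abs(fpr - fnr))        # ROC point where FPR is closest to FNR
eer = (fpr[idx] + fnr[idx]) / 2
print(f"EER = {eer:.3f} at threshold {thresholds[idx]:.3f}")
```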
Due to the sensitivity of the ground truth provided by law enforcement and the commercial agreements covering the third-party data sets, we cannot share the raw input data.