Botspot++: A Hierarchical Deep Ensemble Model for Bots Install Fraud Detection in Mobile Advertising (https://doi.org/10.1145/3476107)
To evaluate our proposed model more comprehensively, we built three datasets covering different time periods, which are available from https://drive.google.com/drive/folders/1CBIOxCtI5Ztx-E5Ua7nO0UjdEabJM2nC?usp=sharing. The statistics of the three offline datasets are detailed below.
Dataset | #Dev | #Chan-Camp | #Normal Install(Train, Test) | #Bots Install(Train, Test) |
---|---|---|---|---|
dataset-1 | 1676101 | 1347 | 1245650, 162960 | 270815, 20560 |
dataset-2 | 1313073 | 1190 | 1049610, 195792 | 139349, 9596 |
dataset-3 | 1299895 | 1139 | 1153705, 181437 | 77708, 12016 |
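
The class imbalance implied by the table can be checked with a few lines of Python. This is a minimal sketch: the counts are copied from the table above, while the `STATS` dictionary and the `bot_ratio` and `split_ratios` helpers are illustrative names, not part of the repository.

```python
# Install counts copied from the statistics table: (train, test) pairs.
STATS = {
    "dataset-1": {"normal": (1245650, 162960), "bots": (270815, 20560)},
    "dataset-2": {"normal": (1049610, 195792), "bots": (139349, 9596)},
    "dataset-3": {"normal": (1153705, 181437), "bots": (77708, 12016)},
}


def bot_ratio(normal, bots):
    """Fraction of installs labeled as bots in one split."""
    return bots / (normal + bots)


def split_ratios(stats):
    """Return {dataset: (train_bot_ratio, test_bot_ratio)}."""
    ratios = {}
    for name, counts in stats.items():
        n_train, n_test = counts["normal"]
        b_train, b_test = counts["bots"]
        ratios[name] = (bot_ratio(n_train, b_train), bot_ratio(n_test, b_test))
    return ratios


if __name__ == "__main__":
    for name, (train, test) in split_ratios(STATS).items():
        print("{}: train bot ratio {:.3f}, test bot ratio {:.3f}".format(name, train, test))
```

In every split, bot installs are the minority class, and the share differs between datasets and between the train and test portions.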
- PyTorch 1.6.0
- LightGBM 3.0.0
- Python 3.6
- scikit-learn 0.23.2
- NumPy 1.19.1
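
Before training, it may help to confirm that the installed packages match the versions listed above. The following is a sketch only: `EXPECTED`, `installed_version`, and `check_environment` are hypothetical names introduced here, and the script is not shipped with the repository. It relies on each package exposing a `__version__` attribute under its import name (`torch`, `lightgbm`, `sklearn`, `numpy`).

```python
import importlib

# Versions listed in the requirements above; keys are import names
# (scikit-learn is imported as `sklearn`, PyTorch as `torch`).
EXPECTED = {
    "torch": "1.6.0",
    "lightgbm": "3.0.0",
    "sklearn": "0.23.2",
    "numpy": "1.19.1",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it is not importable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_environment(expected):
    """Map each module name to (expected, found, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        # Local build tags such as "1.6.0+cu101" still count as a match.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_environment(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print("{:<10} expected {:<8} found {} {}".format(name, wanted, found, status))
```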
git clone https://github.com/mobvistaresearch/BotSpot-Plus.git
cd BotSpot-Plus
- Download datasets
Download them from this link (https://drive.google.com/drive/folders/1CBIOxCtI5Ztx-E5Ua7nO0UjdEabJM2nC?usp=sharing) and put the datasets folder in the root folder of the project.
- Model training
LightGBM:
cd ML
# set which dataset is used for training and the parameters of LightGBM
python main.py --dataset dataset1 --num_trees 500 --max_depth 5
MLP:
cd DL/MLP
# set which dataset to use for training
python main.py --dataset dataset1
GAT:
cd DL/GAT
# set which dataset and which gpu device to use for training
python main.py --dataset dataset1 --device_num 0
GraphConsis:
cd DL/GraphConsis
# set which dataset and which gpu device to use for training
python main.py --dataset dataset1 --device_num 0
GraphSAGE, BotSpot, BotSpot++:
cd DL/BotSpot
--dataset: the dataset to use, e.g., dataset1, dataset2, etc.
--use_gbm: whether to use the GBM model for global context; takes True or False.
--use_stratified: whether to apply stratification during message passing; takes True or False.
--use_botspot_plus: whether to use BotSpot++; takes True or False.
--use_self_attn: whether to use self-attention for leaf embeddings; takes True or False.
--device_num: which GPU device to use for training.
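
Boolean flags passed as strings (`true`/`false`) need explicit parsing, because argparse's `type=bool` treats any non-empty string, including `"false"`, as `True`. The sketch below shows one way the flags listed above could be parsed. It is an assumption for illustration, not the repository's actual `main.py`; `str2bool`, `build_parser`, and all default values are hypothetical.

```python
import argparse


def str2bool(value):
    """Parse 'true'/'false' (any case) into a bool for argparse."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected true or false, got {!r}".format(value))


def build_parser():
    """Hypothetical parser mirroring the flags listed above (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="BotSpot flag parsing sketch")
    parser.add_argument("--dataset", type=str, default="dataset1")
    for flag in ("use_gbm", "use_stratified", "use_botspot_plus", "use_self_attn"):
        parser.add_argument("--" + flag, type=str2bool, default=False)
    parser.add_argument("--device_num", type=int, default=0)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

With this approach, `--use_gbm True` and `--use_gbm true` are both accepted, and a value such as `--use_gbm maybe` is rejected with an error message instead of silently becoming `True`.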
GraphSAGE usage:
python main.py --dataset dataset1 --use_gbm false --use_stratified false --use_botspot_plus false --use_self_attn false --device_num 0
BotSpot usage:
python main.py --dataset dataset1 --use_gbm true --use_stratified true --use_botspot_plus false --use_self_attn false --device_num 1
BotSpot++ usage:
python main.py --dataset dataset1 --use_gbm true --use_stratified true --use_botspot_plus true --use_self_attn true --device_num 2
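
The three variants differ only in their flag combinations, which can be summarized as a small table in code. This is a sketch for convenience: `VARIANTS` and `build_command` are hypothetical helpers, and the flag values are taken from the three usage examples above.

```python
# Flag combinations taken from the three usage examples above.
VARIANTS = {
    "GraphSAGE": dict(use_gbm=False, use_stratified=False,
                      use_botspot_plus=False, use_self_attn=False),
    "BotSpot": dict(use_gbm=True, use_stratified=True,
                    use_botspot_plus=False, use_self_attn=False),
    "BotSpot++": dict(use_gbm=True, use_stratified=True,
                      use_botspot_plus=True, use_self_attn=True),
}


def build_command(variant, dataset="dataset1", device_num=0):
    """Return the argument list for `python main.py` for one model variant."""
    cmd = ["python", "main.py", "--dataset", dataset]
    for flag, enabled in VARIANTS[variant].items():
        cmd += ["--" + flag, "true" if enabled else "false"]
    cmd += ["--device_num", str(device_num)]
    return cmd


if __name__ == "__main__":
    for name in VARIANTS:
        print(name + ": " + " ".join(build_command(name)))
```

The returned list can be passed to `subprocess.run(..., check=True)` from inside `DL/BotSpot` to launch a run, for example `build_command("BotSpot++", device_num=2)`.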