A Library for Advanced DL-based Website Fingerprinting Attacks.

Primary language: Python. License: MIT.

Website-Fingerprinting-Library (WFlib)



WFlib is a PyTorch-based open-source library for website fingerprinting (WF) attacks, intended for research purposes only.

We provide a neat code base to evaluate 11 advanced DL-based WF attacks on multiple datasets. This library is derived from our ACM CCS 2024 paper. If you find this repo useful, please cite our paper.

@inproceedings{deng2024wflib,
  title={Robust and Reliable Early-Stage Website Fingerprinting Attacks via Spatial-Temporal Distribution Analysis},
  author={Deng, Xinhao and Li, Qi and Xu, Ke},
  booktitle={Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security},
  year={2024}
}

Contributions via pull requests are welcome and appreciated.

WFlib Overview

The code library includes 11 DL-based website fingerprinting attacks.

| Attacks | Conference | Paper | Code |
| --- | --- | --- | --- |
| AWF | NDSS 2018 | Automated Website Fingerprinting through Deep Learning | DLWF |
| DF | CCS 2018 | Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning | df |
| Tik-Tok | PETS 2019 | Tik-Tok: The Utility of Packet Timing in Website Fingerprinting Attacks | Tik_Tok |
| Var-CNN | PETS 2019 | Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning | Var-CNN |
| TF | CCS 2019 | Triplet Fingerprinting: More Practical and Portable Website Fingerprinting with N-shot Learning | tf |
| BAPM | ACSAC 2021 | BAPM: Block Attention Profiling Model for Multi-tab Website Fingerprinting Attacks on Tor | None |
| ARES | S&P 2023 | Robust Multi-tab Website Fingerprinting Attacks in the Wild | Multitab-WF-Datasets |
| RF | USENIX Security 2023 | Subverting Website Fingerprinting Defenses with Robust Traffic Representation | RF |
| NetCLR | CCS 2023 | Realistic Website Fingerprinting By Augmenting Network Traces | Realistic-Website-Fingerprinting-By-Augmenting-Network-Traces |
| TMWF | CCS 2023 | Transformer-based Model for Multi-tab Website Fingerprinting Attack | TMWF |
| Holmes | CCS 2024 | Robust and Reliable Early-Stage Website Fingerprinting Attacks via Spatial-Temporal Distribution Analysis | WFlib |

We implemented all attacks in the same framework (PyTorch) with a consistent coding style, so researchers can evaluate and compare existing attacks easily.

Usage

Install

git clone git@github.com:Xinhao-Deng/Website-Fingerprinting-Library.git
cd Website-Fingerprinting-Library
pip install --user .

Note

  • Python 3.8 is required.

Datasets

mkdir datasets
  • Download the datasets (link) and place them in the folder ./datasets

| Datasets | # of monitored websites | # of instances | Intro |
| --- | --- | --- | --- |
| CW.npz | 95 | 105730 | Closed-world dataset. Details |
| OW.npz | 95 | 146446 | Open-world dataset. Details |
| WTF-PAD.npz | 95 | 105730 | Dataset with WTF-PAD defense. Details |
| Front.npz | 95 | 95000 | Dataset with Front defense. Details |
| Walkie-Talkie.npz | 100 | 90000 | Dataset with Walkie-Talkie defense. Details |
| TrafficSliver.npz | 95 | 95000 | Dataset with TrafficSliver defense. Details |
| NCDrift_sup.npz | 93 | 21430 | Network condition drift dataset, including superior traces. Details |
| NCDrift_inf.npz | 93 | 6882 | Network condition drift dataset, including inferior traces. Details |
| Closed_2tab.npz | 100 | 58000 | 2-tab dataset in the closed-world scenario. Details |
| Closed_3tab.npz | 100 | 58000 | 3-tab dataset in the closed-world scenario. Details |
| Closed_4tab.npz | 100 | 58000 | 4-tab dataset in the closed-world scenario. Details |
| Closed_5tab.npz | 100 | 58000 | 5-tab dataset in the closed-world scenario. Details |
| Open_2tab.npz | 100 | 64000 | 2-tab dataset in the open-world scenario. Details |
| Open_3tab.npz | 100 | 64000 | 3-tab dataset in the open-world scenario. Details |
| Open_4tab.npz | 100 | 64000 | 4-tab dataset in the open-world scenario. Details |
| Open_5tab.npz | 100 | 64000 | 5-tab dataset in the open-world scenario. Details |

  • Each extracted dataset is an npz file containing two arrays: X and y. X holds the cell sequences, where each value is the cell's direction (e.g., 1 or -1) multiplied by its timestamp. y holds the labels. Note that in some datasets X consists only of direction sequences.

  • Divide the dataset into training, validation, and test sets.

# For single-tab datasets
python exp/dataset_process/dataset_split.py --dataset CW
# For multi-tab datasets
python exp/dataset_process/dataset_split.py --dataset Closed_2tab --use_stratify False
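
To make the X / y layout described above concrete, the following is a minimal sketch of reading such an npz file and separating directions from timestamps. It uses a tiny synthetic archive held in memory so that it runs without the real data. The zero padding at the end of each trace is an assumption for illustration, not something confirmed by the dataset description.

```python
import io

import numpy as np

# Build a tiny synthetic archive in memory so the sketch runs without the real data.
# Each entry of X is direction (+1 / -1) multiplied by a timestamp.
# Assumption: traces are padded with zeros up to a fixed length.
X_demo = np.array(
    [[0.10, -0.25, -0.31, 0.48, 0.0, 0.0],
     [0.05, 0.12, -0.40, 0.0, 0.0, 0.0]],
    dtype=np.float32,
)
y_demo = np.array([3, 7])

buffer = io.BytesIO()
np.savez(buffer, X=X_demo, y=y_demo)
buffer.seek(0)

# With a real file this would be, for example, np.load("datasets/CW.npz").
data = np.load(buffer)
X, y = data["X"], data["y"]

directions = np.sign(X).astype(np.int8)  # +1, -1, or 0 where the value is zero
timestamps = np.abs(X)                   # the magnitude carries the time
lengths = (X != 0).sum(axis=1)           # non-zero cells per trace

print(directions)
print(lengths)
```

Under this encoding a cell with a timestamp of exactly 0 cannot be told apart from padding, so check how a given dataset handles the first cell before relying on `lengths`.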

Training & Evaluation

We provide all experiment scripts for WF attacks in the folder ./scripts/. For example, you can reproduce the DF attack on the CW dataset by executing the following command.

bash scripts/DF.sh

The ./scripts/DF.sh file contains the commands for model training and evaluation.

dataset=CW

python -u exp/train.py \
  --dataset ${dataset} \
  --model DF \
  --device cuda:1 \
  --feature DIR \
  --seq_len 5000 \
  --train_epochs 30 \
  --batch_size 128 \
  --learning_rate 2e-3 \
  --optimizer Adamax \
  --eval_metrics Accuracy Precision Recall F1-score \
  --save_metric F1-score \
  --save_name max_f1

python -u exp/test.py \
  --dataset ${dataset} \
  --model DF \
  --device cuda:1 \
  --feature DIR \
  --seq_len 5000 \
  --batch_size 256 \
  --eval_metrics Accuracy Precision Recall F1-score \
  --load_name max_f1
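
As a rough illustration of what `--feature DIR` combined with `--seq_len 5000` could mean, the sketch below keeps only the direction of each cell and truncates or zero-pads every trace to a fixed length. The helper `to_direction_feature` is hypothetical and is not part of WFlib; the library's actual preprocessing is defined in its source and may differ.

```python
import numpy as np


def to_direction_feature(X, seq_len):
    """Hypothetical helper: keep only cell directions and fix the sequence length."""
    X = np.asarray(X, dtype=np.float32)
    directions = np.sign(X)  # +1 / -1 per cell, 0 where the value is zero
    n, cur_len = directions.shape
    if cur_len >= seq_len:
        # Longer traces are cut off after seq_len cells.
        return directions[:, :seq_len]
    # Shorter traces are padded with zeros on the right.
    out = np.zeros((n, seq_len), dtype=np.float32)
    out[:, :cur_len] = directions
    return out


traces = np.array([[0.1, -0.2, -0.3, 0.4], [0.5, -0.6, 0.0, 0.0]])
short = to_direction_feature(traces, seq_len=3)
long = to_direction_feature(traces, seq_len=6)
print(short.shape, long.shape)
```

A fixed length lets traces of different sizes be stacked into one batch tensor, which is why a sequence-length parameter appears in both the training and the test command.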

The meanings of all parameters can be found in the exp/train.py and exp/test.py files. WFlib supports modifying parameters to easily implement different attacks. Moreover, you can use WFlib to implement combinations of different attacks or perform ablation analysis.
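
For readers unfamiliar with the four names passed to `--eval_metrics`, here is a minimal NumPy sketch of how Accuracy, Precision, Recall, and F1-score can be computed for single-label predictions. Macro averaging over classes is an assumption made for this example; how WFlib averages these metrics should be checked in exp/train.py and exp/test.py. The function `macro_metrics` is illustrative and not part of the library.

```python
import numpy as np


def macro_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision, recall, and F1 for single-label predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)  # rows: true class, columns: predicted class

    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0)  # how often each class was predicted
    actual = cm.sum(axis=1)     # how often each class truly occurs
    # Classes with no predictions (or no samples) score 0 instead of dividing by zero.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    return {
        "Accuracy": float(tp.sum() / cm.sum()),
        "Precision": float(precision.mean()),
        "Recall": float(recall.mean()),
        "F1-score": float(f1.mean()),
    }


scores = macro_metrics(y_true=[0, 0, 1, 1, 2, 2], y_pred=[0, 1, 1, 1, 2, 0], num_classes=3)
print(scores)
```

Multi-tab datasets carry several labels per trace, so this single-label sketch does not apply to them directly.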

Contact

If you have any questions or suggestions, feel free to contact:

Acknowledgements

We would like to thank all the authors of the referenced papers. Special thanks to Yixiang Zhang for his support.