Laserbeak: Evolving Website Fingerprinting Attacks with Attention and Multi-Channel Feature Representation
An improved website fingerprinting classification model that achieves state-of-the-art performance against defended network traffic
LASERBEAK is a state-of-the-art website fingerprinting (WF) attack model that achieves significant performance improvements against defended Tor traffic. By integrating multi-channel traffic representations and advanced techniques adapted from computer vision models, LASERBEAK pushes the boundaries of WF attack performance. Key innovations include:
- Multi-Channel Feature Representation: Provides richer context for the model by combining multiple traffic feature channels, improving robustness against defenses (a sketch of this idea follows below).
- Advanced Transformer-Based Architecture: Uses multi-headed attention layers to capture both local and global patterns in the traffic data.
In evaluations, LASERBEAK outperforms existing WF attacks, demonstrating up to 36.2% performance improvements in defended scenarios.
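To make the multi-channel idea concrete, the sketch below converts a raw trace (packet timestamps and directions) into a fixed-length stack of feature channels. The specific channels and the `to_multichannel` helper are illustrative assumptions; the exact feature set LASERBEAK uses is defined in the repository's processing code and configs.

```python
import numpy as np

def to_multichannel(timestamps, directions, length=5000):
    """Convert a raw trace (timestamps + packet directions) into a
    fixed-length, multi-channel array. The channel choices here are
    illustrative; the exact feature set used by LASERBEAK is defined
    in the repository's processing code and configs."""
    t = np.asarray(timestamps, dtype=np.float32)
    d = np.asarray(directions, dtype=np.float32)   # +1 outgoing, -1 incoming

    iat = np.diff(t, prepend=t[0])                 # inter-arrival times
    dir_time = t * d                               # direction-signed timestamps
    cumul = np.cumsum(d)                           # cumulative direction sum

    channels = np.stack([d, iat, dir_time, cumul]) # shape: (4, n_packets)

    # Pad or truncate each channel to a fixed length for batching.
    out = np.zeros((channels.shape[0], length), dtype=np.float32)
    n = min(channels.shape[1], length)
    out[:, :n] = channels[:, :n]
    return out
```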
Key contributions of this work include:
- Multi-Channel Traffic Representations: Combining multiple traffic features leads to significant accuracy improvements against defended samples.
- Novel WF Model: Leveraging state-of-the-art transformer-based techniques to enhance WF attack performance (see the encoder sketch after this list).
- Precision Optimizers: Introducing new methods for improving precision in open-world scenarios, critical for realistic attacks.
- Comprehensive Dataset Evaluation: Extensive testing on both closed-world and open-world datasets to validate the model's efficacy.
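For intuition about the attention-based design, here is a minimal PyTorch sketch of the kind of transformer encoder block such a model builds on. The `WFEncoderBlock` name, hyperparameters, and layer ordering are assumptions for illustration; the actual architecture is specified by the repository's model code and `./configs/laserbeak.json`.

```python
import torch
import torch.nn as nn

class WFEncoderBlock(nn.Module):
    """Illustrative transformer-style encoder block for traffic features.
    Hyperparameters and layer ordering are assumptions; see the repo's
    model code and ./configs/laserbeak.json for the real architecture."""
    def __init__(self, dim=128, heads=8, mlp_ratio=4, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout,
                                          batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x):                      # x: (batch, seq_len, dim)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)       # global self-attention
        x = x + attn_out                       # residual connection
        return x + self.mlp(self.norm2(x))
```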
To train a LASERBEAK model on a specified dataset, use the following command:
```bash
python benchmark.py \
    --data_dir /path/to/data \
    --ckpt_dir /path/to/checkpoints \
    --results_dir /path/to/results \
    --config ./configs/laserbeak.json \
    --dataset be-front \
    --epochs 20 \
    --multisamples 10 \
    --exp_name my_experiment
```
This command trains the model on the `be-front` dataset for 20 epochs, using x10 simulated samples (via `--multisamples 10`).
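If the `be-front` dataset follows the FRONT defense of Gong and Wang, defended variants of a trace can be simulated cheaply by injecting dummy packets with Rayleigh-distributed timestamps. The sketch below is a rough approximation of such a simulation under that assumption; the `simulate_front` helper and its parameter defaults are illustrative, not the repository's implementation.

```python
import numpy as np

def simulate_front(timestamps, directions, n_max=2500,
                   w_min=1.0, w_max=14.0, rng=None):
    """Rough sketch of FRONT-style padding: inject dummy packets whose
    timestamps follow a Rayleigh distribution. Parameter defaults and
    this helper are illustrative, not the repository's code."""
    if rng is None:
        rng = np.random.default_rng()
    n_dummy = rng.integers(1, n_max + 1)      # number of dummy packets
    w = rng.uniform(w_min, w_max)             # Rayleigh window scale
    pad_times = rng.rayleigh(scale=w, size=n_dummy)
    pad_dirs = rng.choice([1.0, -1.0], size=n_dummy)

    t = np.concatenate([timestamps, pad_times])
    d = np.concatenate([directions, pad_dirs])
    order = np.argsort(t)                     # re-merge by timestamp
    return t[order], d[order]
```

Running a simulation like this ten times per raw trace would produce the x10 defended samples that `--multisamples 10` refers to.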
To evaluate a pre-trained LASERBEAK model, use:
```bash
python benchmark.py \
    --data_dir /path/to/data \
    --ckpt /path/to/model_checkpoint.pth \
    --eval --dataset be-front
```
This command loads the model checkpoint and evaluates its performance on the `be-front` dataset.
The `benchmark.py` script supports various datasets for training and evaluation, including:
- `be`: Basic dataset
- `be-front`: FRONT-defended dataset
- `amazon`: Amazon traffic dataset
- `webmd`: WebMD traffic dataset
- `gong`: Gong dataset with Surakav defenses

Refer to the `--dataset` option in the command-line arguments for the full list of available datasets.
We provide these datasets as pre-packaged pickle files that can be downloaded from this Google Drive directory.
To load the provided files with the `benchmark.py` script, point the `--data_dir` argument to the parent directory containing the `wf-*` subdirectories. The file and subdirectory names must match those provided in the Google Drive directory for loading to work correctly.
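For example, a data directory might be laid out as follows (the subdirectory names here are hypothetical placeholders; use the exact names from the Google Drive download):

```
/path/to/data/
├── wf-be/
├── wf-be-front/
├── wf-amazon/
├── wf-webmd/
└── wf-gong/
```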
This code is research-grade and may contain bugs or other issues. Users are advised to carefully validate results and use the code at their own risk.