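This repository contains the code for the paper "Website Fingerprinting in the Age of QUIC" (PETS 2021).
The code is organised into workflows, each responsible for one or several related experiments. Every workflow contains the scripts used to collect and process data, perform the machine-learning classification, and generate the plots for the paper.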
The data used to support this paper are provided in two sets:
- quic-wf-core.tgz (831 MB):
  - Domain names used for the scans and the scan results, provided as CSV files with headers;
  - Datasets in HDF5 format with class, protocol, and VPN-location labels, plus size and timestamp arrays; packets below 175 bytes are removed;
- quic-wf-raw.tar (28 GB):
  - The raw collected QUIC and TCP traces and their associated metadata.
  - Each file is a stream of JSON objects with the following keys, whose values may be null:
    - url, final_url: requested and final redirected URLs
    - status: HTTP status code of the fetch
    - protocol: protocol used to request the main page, "quic" or "tcp"
    - packets: base64 encoded PCAP for the request
    - http_trace: Chromium DevTools performance log (reference)
  - Due to size constraints, this archive is only available upon request.
  - New 2022-01-24: Also available in pcapml format upon request!
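Note: A QUIC trace in the dataset refers to a trace of the Wireguard tunnel that contains both the QUIC and the TCP packets associated with requesting a web page over a QUIC connection. This means that at least the initial connection to the web server was made over QUIC.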
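As a rough illustration of the raw file layout described above, the following Python sketch reads a stream of JSON objects and decodes the base64 `packets` field. It is not taken from the repository: the helper names `iter_json_objects` and `decode_packets` are hypothetical, the sample records are made up, and the sketch assumes the objects are simply concatenated or separated by whitespace.

```python
import base64
import json
from typing import Iterator, Optional


def iter_json_objects(text: str) -> Iterator[dict]:
    """Yield each JSON object from a string holding a stream of objects.

    Assumption: objects are concatenated or separated only by whitespace.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def decode_packets(record: dict) -> Optional[bytes]:
    """Return the raw PCAP bytes of a record, or None if the key is null."""
    packets = record.get("packets")
    if packets is None:
        return None
    return base64.b64decode(packets)


# Made-up sample data; the bytes are a placeholder, not a real capture.
fake_pcap = b"\xd4\xc3\xb2\xa1demo"
sample = (
    json.dumps({
        "url": "https://example.com", "final_url": None, "status": 200,
        "protocol": "quic", "http_trace": None,
        "packets": base64.b64encode(fake_pcap).decode("ascii"),
    })
    + "\n"
    + json.dumps({
        "url": "https://example.org", "final_url": None, "status": None,
        "protocol": "tcp", "http_trace": None, "packets": None,
    })
    + "\n"
)

records = list(iter_json_objects(sample))
for record in records:
    pcap = decode_packets(record)
    size = None if pcap is None else len(pcap)
    print(record["protocol"], record["status"], size)
```

The sketch loads the whole string into memory, which is unlikely to be practical for a 28 GB archive. If the files turn out to hold one object per line, reading and parsing them line by line would be the simpler choice.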
- Bash
- Git and Git-LFS 2.17
- Python 3.7
  - Dependencies listed in requirements.txt
  - The wf-tools library
- Optional:
  - Docker 19.03.12
  - Wireguard v1.0.20200513
  - docker-machine v0.16.2
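The requirements above can be checked quickly before setting anything up. The sketch below is not part of the repository: the helper names are hypothetical, and it assumes that the tools are exposed under their usual executable names (`git-lfs`, `docker`, `docker-machine`, and `wg` for Wireguard). It only checks presence on the PATH, not the exact versions listed.

```python
import shutil
import sys

# Executable names are assumptions based on the usual packaging of each tool.
REQUIRED_TOOLS = ["bash", "git", "git-lfs"]
OPTIONAL_TOOLS = ["docker", "wg", "docker-machine"]


def find_tools(names):
    """Map each executable name to its location on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


def python_is_recent_enough(minimum=(3, 7)):
    """Check that the running interpreter is at least the given version."""
    return sys.version_info[:2] >= minimum


print("Python >= 3.7:", python_is_recent_enough())
for label, tools in (("required", REQUIRED_TOOLS), ("optional", OPTIONAL_TOOLS)):
    for name, path in find_tools(tools).items():
        print(f"{label:8} {name:15} {path or 'not found'}")
```

A missing optional tool only matters when running the trace-collection workflow.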
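The code was last run on a compute cluster, with each experiment using 2-4 cores (2.4 GHz each) and 6 GB of main memory per core. Machine-learning training and testing used an additional 0-2 GPUs, depending on the classifier. To reduce the running time, the jobs were run in parallel, and each train-test split took 1-3 hours.
Note: The current requirements.txt file specifies tensorflow-cpu. If you have a GPU available, install tensorflow-gpu instead.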
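To confirm whether a GPU is actually visible to TensorFlow, a check along the following lines may help. This is only a sketch: the function name `count_visible_gpus` is hypothetical, and it assumes a TensorFlow release that provides `tf.config.list_physical_devices` (2.1 or later), which may differ from the version pinned in requirements.txt.

```python
import importlib.util


def count_visible_gpus():
    """Return the number of GPUs TensorFlow can see, or None without TensorFlow."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    # Assumes TensorFlow >= 2.1; older releases expose this function under
    # tf.config.experimental instead.
    return len(tf.config.list_physical_devices("GPU"))


gpu_count = count_visible_gpus()
if gpu_count is None:
    print("TensorFlow is not installed in this environment")
else:
    print("GPUs visible to TensorFlow:", gpu_count)
```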
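For quick access, a virtual machine image with the code and data already downloaded is currently available at this link.
The following instructions describe how to set up and run the workflows from scratch.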
# Clone the repository
git clone https://github.com/jpcsmith/wf-in-the-age-of-quic.git
# Change to the code directory
cd wf-in-the-age-of-quic/
# Download the git LFS files
git lfs pull
# Create and activate a virtual environment
python3 -m venv env
source env/bin/activate
# Ensure that pip is the latest version
python3 -m pip install --upgrade pip
# Install the requirements using pip
python3 -m pip install --no-cache-dir -r requirements.txt
If the installation fails, ensure that the Python development libraries are installed and retry the above. On Ubuntu 18.04, these are the python3.7-dev and python3-venv packages.
wget https://polybox.ethz.ch/index.php/s/u10mAN6NCcDP39U/download -O quic-wf-core.tgz
tar -xzvf quic-wf-core.tgz
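Before or after extracting, it can be useful to look at what the archive contains. The following sketch uses Python's standard tarfile module to list the first few members of a gzip-compressed archive. The helper name `list_archive` is hypothetical and nothing here is taken from the repository.

```python
import tarfile


def list_archive(path, limit=10):
    """Return up to `limit` member names from a gzip-compressed tar archive."""
    names = []
    with tarfile.open(path, "r:gz") as archive:
        # Iterating reads member headers lazily, so a large archive is not
        # scanned to the end once the limit is reached.
        for member in archive:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names


# Example call, assuming the archive was downloaded to the current directory:
# print(list_archive("quic-wf-core.tgz"))
```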
If you plan to run trace collection, i.e. the Fetch QUIC Traces workflow, install Docker (19.03.12) and Wireguard (v1.0.20200513).
Change to the desired workflow's directory and follow the instructions for running the workflow.
Paper Section | Workflows | Directories |
---|---|---|
4. Combined QUIC-TCP Dataset | Identify QUIC Sites, Fetch QUIC Traces | workflows/identify-quic-sites, workflows/fetch-any-quic |
6. From TCP to QUIC | Generalisability Analysis, Single and Mixed Analyses | workflows/generalisability-analysis, workflows/single-and-mixed-analyses |
7. Joint Classification of QUIC and TCP | Single and Mixed Analyses, Distinguish Protocol | workflows/single-and-mixed-analyses, workflows/distinguish-protocol |
8. Remove Control Packets | Removing Control Packets | workflows/removing-control-packets |
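The table below lists the programs and files responsible for each table and figure in the paper. Notebooks are found in the notebooks/ directory and output files in the results/plots directory, both relative to the associated workflow.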
The code and associated data are released under an MIT licence, as found in the LICENCE file.