This repository contains the code for training the trace generator. The main usage is described below. The code is intended for research purposes only, so use it with care.
First, modify conf.ini (only MONITORED_SITE_NUM and MONITORED_INST_NUM matter).
The raw traces must be in the cell sequence format or the packet sequence format, where each file has two columns:
the first column lists the timestamps and the second column lists the direction (±1) or the directional bytes (±bytes).
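As a rough illustration of the expected input, the sketch below reads a two-column trace and reduces the second column to a ±1 direction. The function name `load_trace` and the whitespace delimiter are assumptions made for this example; they are not taken from src/extract.py.

```python
import io

import numpy as np


def load_trace(source):
    """Read a two-column trace: timestamp, then signed direction or signed bytes.

    Hypothetical helper for illustration; assumes whitespace-separated columns.
    """
    data = np.loadtxt(source, ndmin=2)
    timestamps = data[:, 0]
    # Keep only the sign, so +-1 and +-bytes inputs both become +-1.
    directions = np.sign(data[:, 1]).astype(int)
    return timestamps, directions


# A tiny in-memory trace standing in for a real file.
example = io.StringIO("0.00\t1\n0.05\t-1\n0.06\t-512\n0.20\t1\n")
timestamps, directions = load_trace(example)
print(directions.tolist())
```

Passing a file path instead of the in-memory buffer works the same way, since `np.loadtxt` accepts both.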
We extract features in the burst sequence format, that is, a sequence of ±N values, where N is the size of a burst (the number of cells in it).
The sign shows the packet direction.
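To make the burst format concrete, here is a minimal sketch that collapses a ±1 direction sequence into signed burst sizes. The helper `to_bursts` is hypothetical, and zero-padding to a fixed length is an assumption about how a fixed-size feature vector could be produced, not a description of what src/extract.py does.

```python
from itertools import groupby


def to_bursts(directions, length=None):
    """Collapse runs of same-direction cells into signed burst sizes.

    Hypothetical helper; `length` pads with zeros or truncates (an assumption).
    """
    # Each run of identical signs becomes sign * run_length.
    bursts = [sign * len(list(run)) for sign, run in groupby(directions)]
    if length is not None:
        bursts = (bursts + [0] * length)[:length]
    return bursts


print(to_bursts([1, 1, -1, -1, -1, 1]))            # [2, -3, 1]
print(to_bursts([1, 1, -1, -1, -1, 1], length=5))  # [2, -3, 1, 0, 0]
```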
Usage:
python3 src/extract.py --dir [your_dir] --length [your_preferred_length] --format [file_suffix]
You will get two .npz files: one stores the burst sequence features together with the labels; the other stores the time features used for modeling o2o time gaps.
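The array names stored inside the .npz files are not documented here, so a quick way to check them is to list the archive contents. The sketch below uses a temporary stand-in file; the keys `features` and `labels` and the array shapes are made up for the example.

```python
import os
import tempfile

import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz file."""
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


# Stand-in archive; the key names here are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_feature.npz")
    np.savez(
        demo_path,
        features=np.zeros((4, 10), dtype=np.float32),
        labels=np.arange(4),
    )
    summary = describe_npz(demo_path)

for name, (shape, dtype) in summary.items():
    print(name, shape, dtype)
```

Running `describe_npz` on the two files produced by src/extract.py shows which arrays they actually contain.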
Our GAN involves an observer, which is a pre-trained DF model. Using the features generated above, we can train a DF model on the burst sequences with
python3 src/train_df.py [your_feature_dir]
The model will be saved to ./f_model/xxx.ckpt.
Here is an example of training the GAN:
python3 src/mlp_df_wgan_train.py -d ./dump/my_dataset/feature/raw_feature_0-100x0-1000_clip.npz --f_model ./f_model/df_raw_feature_0-100x0-1000.ckpt
--f_model is the pre-trained observer and -d gives the path to the training set.
Please also take a look at other arguments in the code.
We need to model the o2o time gap for the defense based on the dataset.
Recall that a time_feature_xxx.npz file was generated during the extraction phase.
We now use these time features to estimate the o2o distribution with the KDE method:
python3 src/ipt_sampler.py --tdir [your_time_feature_path] -n 1000000
You will get a xxx.ipt file.
The first row shows the computed KDE kernel_std, and the remaining 1,000,000 values are the time gaps sampled from the original dataset (in seconds, on a log scale).
This file is enough to model the hidden distribution of the o2o time gap.
Sampling a time gap from the distribution is equivalent to computing
t + normal(0,1) * kernel_std,
where t is randomly sampled from the 1,000,000 time gaps. Remember to convert the result from the log scale back to seconds.
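The sampling rule above can be sketched as follows. This is an illustration under stated assumptions, not the code used by the defense: the helpers `load_ipt` and `sample_time_gaps` are hypothetical, the .ipt file is assumed to hold one value per row, and the logarithm base is not stated in this README, so base 10 is only a placeholder default that must be checked against src/ipt_sampler.py.

```python
import io

import numpy as np


def load_ipt(source):
    """Split an .ipt file into (kernel_std, log-scale gaps).

    Assumes one numeric value per row, with kernel_std on the first row.
    """
    values = np.loadtxt(source)
    return float(values[0]), values[1:]


def sample_time_gaps(log_gaps, kernel_std, n, rng, log_base=10.0):
    """Draw n time gaps in seconds: t + normal(0,1) * kernel_std, then undo the log.

    log_base=10.0 is an ASSUMPTION; use the base that produced the .ipt file.
    """
    t = rng.choice(log_gaps, size=n, replace=True)       # pick stored log gaps
    noisy = t + rng.standard_normal(n) * kernel_std      # add Gaussian kernel noise
    return np.power(log_base, noisy)                     # back to seconds


# Toy stand-in for a real xxx.ipt file: kernel_std followed by three log gaps.
toy = io.StringIO("0.1\n-3.0\n-2.0\n-1.0\n")
kernel_std, log_gaps = load_ipt(toy)
rng = np.random.default_rng(0)
gaps = sample_time_gaps(log_gaps, kernel_std, 5, rng)
print(gaps)
```

Because the noise is added in the log domain, every sampled gap is strictly positive after conversion.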