Paper: Low-Quality Training Data Only? A Robust Framework for Detecting Encrypted Malicious Network Traffic
cd ./Preprocess
python Feature_Extract.py "input_dir" "sequence_data_path" "ext"
python get_origin_flow_data.py "sequence_data_path" "save_dir" "data_type"
input_dir: The directory of all raw pcap files of one kind of data, e.g. benign, malicious, or test.
sequence_data_path: The sequence data of all flows in pcap files, without zero-padding. (The values are prefix-cumulative values, and will be further processed by get_origin_flow_data.py.)
ext: The extension name of pcap files to process (e.g. pcap, pcapng).
save_dir: The save directory of the processed sequence data of all flows.
data_type: The kind of the processed data, e.g. w, b, and test.
Output: A sequence NumPy file of data_type in save_dir, i.e., {save_dir}/{data_type}.npy. Each sample has 50 dimensions; a 51st dimension (the label expected by main.py) must be added before detection.
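The 51st dimension can be appended with NumPy before the file is handed to main.py. The following is a minimal sketch, assuming every flow in one file shares the same label (0 for benign, 1 for malicious); the helper name add_label_column is hypothetical and not part of the repository.

```python
import numpy as np

def add_label_column(flows, label):
    """Append a constant label as the last column of an (n, 50) array."""
    flows = np.asarray(flows)
    if flows.ndim != 2 or flows.shape[1] != 50:
        raise ValueError(f"expected shape (n, 50), got {flows.shape}")
    # One label per row, stored with the same dtype as the features.
    labels = np.full((flows.shape[0], 1), label, dtype=flows.dtype)
    return np.hstack([flows, labels])

# Toy stand-in for the array loaded from {save_dir}/{data_type}.npy.
demo = np.zeros((3, 50), dtype=np.float32)
labeled = add_label_column(demo, 1)
print(labeled.shape)  # (3, 51)
```

The labeled array could then be written with np.save to the file name that main.py expects in data_dir.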
cd ./main
python main.py
The arguments that can be modified in main.py are:
data_dir: The directory of all sequence data.
feat_dir: The directory of all feature data.
made_dir: The directory of all results calculated by MADE.
model_dir: The directory of all trained models.
result_dir: The directory of the detection/prediction result of the test data.
The required input files:
{data_dir}/{benign.npy}: The benign preprocessed training data.
{data_dir}/{malicious.npy}: The malicious preprocessed training data.
{data_dir}/{test.npy}: The preprocessed testing data.
All data in {data_dir} should have shape (n, 51), where n is the number of samples. Each sample is a 51-dimensional vector: the first 50 dimensions are the time-series data of the traffic, and the last one is the true label of the sample, used for evaluation (0 for benign, 1 for malicious). If RAPIER is used only for prediction, the last dimension can be any value.
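A quick shape check before running main.py can catch a missing label column early. This is a minimal sketch under the layout described above; check_rapier_input and its require_labels flag are hypothetical names used only for illustration.

```python
import numpy as np

def check_rapier_input(data, require_labels=True):
    """Validate an (n, 51) array and split it into features and labels."""
    data = np.asarray(data)
    if data.ndim != 2 or data.shape[1] != 51:
        raise ValueError(f"expected shape (n, 51), got {data.shape}")
    labels = data[:, -1]
    # In prediction-only use the last column may hold any value,
    # so the 0/1 check is skipped when require_labels is False.
    if require_labels and not np.isin(labels, (0, 1)).all():
        raise ValueError("labels must be 0 (benign) or 1 (malicious)")
    return data[:, :50], labels

# Toy stand-in for an array loaded from one of the files in data_dir.
sample = np.zeros((4, 51))
sample[2:, -1] = 1
features, labels = check_rapier_input(sample)
print(features.shape, labels.tolist())  # (4, 50) [0.0, 0.0, 1.0, 1.0]
```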
Output:
{result_dir}/{prediction.npy}: The prediction of all testing data. 1 is malicious and 0 is benign.
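When the test data carries true labels, the predictions can be scored against them. The sketch below assumes that the prediction file holds one 0/1 value per test sample, in the same order as the rows of the test data; that layout is an assumption and is not stated here. The evaluate helper is hypothetical.

```python
import numpy as np

def evaluate(prediction, labels):
    """Compute basic detection metrics from 0/1 predictions and labels."""
    prediction = np.asarray(prediction).astype(int).ravel()
    labels = np.asarray(labels).astype(int).ravel()
    if prediction.shape != labels.shape:
        raise ValueError("prediction and labels must have the same length")
    tp = int(np.sum((prediction == 1) & (labels == 1)))
    fp = int(np.sum((prediction == 1) & (labels == 0)))
    fn = int(np.sum((prediction == 0) & (labels == 1)))
    tn = int(np.sum((prediction == 0) & (labels == 0)))
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / labels.size if labels.size else 0.0,
    }

# Toy arrays standing in for the prediction file and the last column of the test data.
labels = np.array([0, 0, 1, 1, 1])
prediction = np.array([0, 1, 1, 1, 0])
metrics = evaluate(prediction, labels)
print(metrics["tp"], metrics["fp"], metrics["fn"], metrics["tn"])  # 2 1 1 1
```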