Code for the paper: "Supervised contrastive learning over prototype-label embeddings for network intrusion detection"
This repository contains the code and dataset files for the paper: "Supervised contrastive learning over prototype-label embeddings for network intrusion detection", Manuel Lopez-Martin, Antonio Sanchez-Esguevillas, Juan Ignacio Arribas and Belen Carro.
The dataset files are available at: https://drive.google.com/drive/folders/1QSf-0wK-pTKHamA13xW4KpNxcTTJsqt2?usp=sharing. This location contains:
- The original NSL-KDD and UNSW-NB15 (named ADFA in the file names) intrusion detection datasets: KDDTrain+.txt and KDDTest+.txt (NSL-KDD, available from https://www.unb.ca/cic/datasets/nsl.html), and UNSW_NB15_training-set.csv and UNSW_NB15_testing-set.csv (UNSW-NB15, available from https://researchdata.edu.au/unsw-nb15-dataset).
- The datasets after data processing: NSL_KDD_Load.pkl and ADFA_Load.pkl. These are the files that are used in the paper's code.
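The processed files can be opened from Python before running the notebooks. The following is a minimal sketch, assuming the `.pkl` files are standard Python pickles; the helper name `load_processed` is hypothetical and is not part of this repository. The structure of the stored object is not described here, so the sketch only loads it and leaves inspection to the reader. If the pickles contain pandas or NumPy objects, those libraries must be installed for the load to succeed.

```python
import pickle
from pathlib import Path


def load_processed(path):
    """Load one processed dataset file and return the unpickled object.

    Hypothetical helper: the layout of the returned object is not
    documented in this README, so inspect it after loading.
    Only unpickle files obtained from a source you trust, because
    pickle.load can execute arbitrary code.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Processed dataset not found: {path}")
    with path.open("rb") as fh:
        return pickle.load(fh)


# Example usage (assumes the file was downloaded to the working directory):
# data = load_processed("NSL_KDD_Load.pkl")
# print(type(data))
```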
All code files are available in this GitHub repository. The files are:
- NSL_KDD_Load v2.0.ipynb: Code to perform the data processing of the NSL-KDD dataset
- ADFA_Load 1.0.ipynb: Code to perform the data processing of the UNSW-NB15 dataset
- IDS_NSL-KDD v2.0-5labels.ipynb: Code for the most representative models proposed in the paper, applied to the NSL-KDD dataset (5 labels)
- IDS_UNSW-NB15 v2.0-10labels.ipynb: Code for the most representative models proposed in the paper, applied to the UNSW-NB15 dataset (10 labels). Models already presented for the NSL-KDD dataset are repeated here as little as possible.
- Lib.py: Code file with some auxiliary functions for data processing
The different models presented in each code file are:
- NSL-KDD (IDS_NSL-KDD v2.0-5labels.ipynb)
  - LBL, ConCE (with embedding dimensions 2 and 10)
  - LBL, ConLE
  - RLB-CL, E2NMS
  - RLB-CL, E2NAMS
  - RLB-CL, E2WNAMS (1,0.5,0.5)
  - RLB-CL, ENMS
  - RLB-CL, CE+ENMS
  - RLB-CL, NMM
  - RLB-CL, CE+NMM
  - RLB-CL, CEDist+NMM
  - RLB-CL, WNAMM (1,0.5,1,0.5)
  - RLB-CL, NAMM
  - RLB-CL, AMM
  - RLB-CL, ConN
- UNSW-NB15 (IDS_UNSW-NB15 v2.0-10labels.ipynb)
  - LBL, ConCE
  - LBL, ConLE
  - RLB-CL, E2NMS
  - RLB-CL, ENMS
  - RLB-CL, CE+ENMS
  - RLB-CL, NMM
  - RLB-CL, NAMM
  - RLB-CL, MMoLE
The models included in each code file are a representative selection, chosen to avoid redundancy. For each model, the notebook shows the model architecture, the results of the training phase, the performance metrics on the test set and a visualization of the clusters.
The chosen models mainly use an embedding dimension of two (to make the clusters easy to visualize) and address the multiclass case with 5 and 10 labels (because the multiclass case is more complex and representative).
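As an illustration of what test-set metrics for the multiclass case can look like, here is a minimal sketch in plain Python. It is not taken from the notebooks: the choice of accuracy and macro-averaged F1 is an assumption made for the example, and the function names are hypothetical. The notebooks themselves are the reference for the metrics actually reported.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed metric)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only (not results from the paper):
# accuracy([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]) gives 0.8
```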
The code can be executed in the cloud using Google Colaboratory: https://colab.research.google.com/notebooks/intro.ipynb
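When running in Colaboratory, the data files must be reachable from the notebook. One possible approach, sketched here under assumptions, is to mount Google Drive and read the files from there. The function names and the folder `ids_data` are hypothetical; the shared data folder would first need to be copied (or added as a shortcut) to your own Drive, and the path adjusted accordingly. Outside Colaboratory the sketch falls back to a local directory.

```python
def in_colab():
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only available inside Colaboratory)
        return True
    except ImportError:
        return False


def resolve_data_dir(local_dir="data",
                     drive_dir="/content/drive/MyDrive/ids_data"):
    """Pick the directory that holds the dataset files.

    drive_dir is a hypothetical location inside your own Google Drive;
    change it to wherever you placed the downloaded files.
    """
    if in_colab():
        from google.colab import drive
        drive.mount("/content/drive")  # asks for authorization in the notebook
        return drive_dir
    return local_dir


# Example usage:
# data_dir = resolve_data_dir()
# print(data_dir)
```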