Nicola Bena, Marco Anisetti, Gabriele Gianini, Claudio A. Ardagna.
This repository contains:

- the dataset we used as the starting point to generate evasion attacks (file `Dataset/test_set.npz`)
- the model we used as the target of our evaluation (file `Model/lstm.h5`)
- the code we used to generate evasion attacks (file `Code/notebook.ipynb`)
- the results of the evaluation (directory `Output`)
In a nutshell, we re-executed the training process as indicated in the original publication presenting the malware detector (link), and exported the trained model as a `.h5` file. Here, we import that model, choose the first 100 malware data points in the test set, and craft evasion attacks starting from these points while varying `epsilon`. Our results show that the model is highly vulnerable to this attack. However, we note that the evasion attack is limited to perturbing extracted features, and it might be more difficult to carry out in the real world.
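As a minimal sketch of this step (the paths follow the repository layout; the training itself is described in the original publication and is not reproduced here), the exported model can be re-loaded with Keras:

```python
# Re-load the exported LSTM detector from the HDF5 file shipped in Model/.
import tensorflow as tf

model = tf.keras.models.load_model("Model/lstm.h5")
model.summary()  # quick check that the architecture loaded as expected
```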
Experiments were executed on an Apple MacBook Pro featuring a 10-core Apple M1 Pro CPU, 32 GB of RAM, and macOS Sonoma 14.1.2. The instructions to prepare the environment therefore apply to this setting only.
First, create a conda environment using, e.g., miniforge.
conda create -n my-env python=3.11
conda activate my-env
We then install the necessary libraries using `pip`, because (as of writing) there are some incompatibilities between the OS version and the packages we need to install.
pip install \
adversarial-robustness-toolbox \
numpy \
pandas \
scikit-learn \
tensorflow \
tensorflow-metal
The final step is to verify that GPUs are recognized.
import tensorflow as tf
print(tf.__version__)
print(tf.config.list_physical_devices())
Output should be something like:
2.15.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
If `GPU` does not appear in the device list, there is an issue with the installation. Note that the code also works without GPU support.

The library versions are pinned in `requirements.txt`, which can be used to install the libraries (e.g., `pip install -r requirements.txt`), although it is specific to macOS.
The entire code is implemented as a Python notebook (`Code/notebook.ipynb`). The notebook includes detailed explanations of the process we followed. In summary, we proceeded as follows.
- We loaded the entire test set used to evaluate the LSTM model (provided in `Dataset/test_set.npz` as a `numpy` compressed file).
- We chose the first 100 data points of the test set whose label is `1` (i.e., malware).
- We loaded the LSTM model and evaluated its performance on the chosen malware data points, to make sure loading worked properly.
- We carried out an evasion attack using the fast gradient method, varying `epsilon`. For each value of `epsilon`, we generated 100 adversarial data points starting from those chosen at step 2 and retrieved the predicted labels (a hedged sketch of this step is shown after this list).
- We finally exported the generated data points and an additional file summarizing the results.
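The following is a minimal sketch of the attack step using the Adversarial Robustness Toolbox. The `.npz` key names, the two-class softmax output, the loss, and the `epsilon` values are assumptions made for illustration only; the authoritative code is in `Code/notebook.ipynb`.

```python
# Sketch only: assumes the .npz stores arrays under the keys "x" and "y"
# and that the model outputs two-class probabilities (softmax). Adapt the
# key names, nb_classes, and loss if the actual artifacts differ.
import numpy as np
import tensorflow as tf
from art.estimators.classification import TensorFlowV2Classifier
from art.attacks.evasion import FastGradientMethod

model = tf.keras.models.load_model("Model/lstm.h5")
data = np.load("Dataset/test_set.npz")
x_test = data["x"]                 # assumed key name
y_test = data["y"].reshape(-1)     # assumed key name; flatten labels

# First 100 data points labeled as malware (label 1).
x_malware = x_test[y_test == 1][:100]

classifier = TensorFlowV2Classifier(
    model=model,
    nb_classes=2,
    input_shape=x_malware.shape[1:],
    loss_object=tf.keras.losses.CategoricalCrossentropy(),
)

for eps in (0.01, 0.05, 0.1):      # illustrative values, not necessarily ours
    attack = FastGradientMethod(estimator=classifier, eps=eps)
    x_adv = attack.generate(x=x_malware)
    preds = classifier.predict(x_adv)
    evaded = int(np.sum(np.argmax(preds, axis=1) == 0))
    print(f"eps={eps}: {evaded}/100 adversarial data points classified as benign")
```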
The files we generated during the experiment are saved in the directory `Output`. Each sub-directory contains the data points generated for a specific value of `epsilon`; each file in a sub-directory is a data point created during the evasion attack.
Finally, the file `Output/adversarial_results.csv` summarizes the retrieved results. In particular, for each value of `epsilon`, it shows:

- the model accuracy
- the count of data points classified as malware
- the count of data points misclassified as benign
- the ratio between the count of data points classified as malware and the total number of crafted data points (i.e., the count of data points classified as malware divided by `100`).
Note: these measures are slightly redundant, but we decided to keep them.
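For a quick look at the summary, the CSV can be loaded with `pandas`; this sketch only reads and prints the file and does not assume specific column names.

```python
# Print the per-epsilon summary produced by the notebook.
import pandas as pd

results = pd.read_csv("Output/adversarial_results.csv")
print(results.to_string(index=False))
```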