Optimization of code caves in malware binaries to evade Machine Learning detectors

All experiments were run on an Intel Core i5-8250U CPU @ 1.60GHz with 16 GB of RAM, under Ubuntu 20.04.1 LTS (64-bit). All methods were implemented in Python 3.7. We used the open-source tools Radare2 (radare2 4.5.0-git) and pefile to manipulate the PE binaries.

Installation

Clone the project and install the packages included in requirements.txt:

git clone https://github.com/JavierYuste/Optimization-of-code-caves-in-malware-binaries-to-evade-Machine-Learning-detectors
cd Optimization-of-code-caves-in-malware-binaries-to-evade-Machine-Learning-detectors
pip install -r requirements.txt

Then, download the pretrained MalConv model from https://github.com/elastic/ember and place it in the src folder.

Radare2 must be installed on the system for the code to run correctly.
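As a quick sanity check before running anything, the following sketch looks for a Radare2 executable on the PATH. It is not part of the project; the helper name find_radare2 is hypothetical, and it assumes the installation exposes the usual radare2 or r2 command names.

```python
import shutil


def find_radare2(candidates=("radare2", "r2")):
    """Return the path of the first candidate executable found on PATH, or None.

    The candidate names are an assumption about how Radare2 is installed.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    location = find_radare2()
    if location is None:
        print("Radare2 was not found on PATH; install it before running the code.")
    else:
        print(f"Radare2 found at {location}")
```

If the check reports that nothing was found, install Radare2 or add its directory to PATH, then run the check again.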

Data

To ease reproducibility, the SHA256 hashes of the samples used in the experiments are provided.
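To confirm that a locally obtained sample matches one of the provided hashes, something along these lines may help. This is a minimal sketch using only the standard library; the function names and the idea of passing the hashes as an iterable of hexadecimal strings are assumptions, since the format in which the hashes are distributed is not described here.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large binaries are not loaded in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed_sample(path, known_hashes):
    """Return True if the file's SHA256 appears in known_hashes (case-insensitive).

    known_hashes is assumed to be an iterable of hex strings, one per sample.
    """
    normalized = {h.strip().lower() for h in known_hashes}
    return sha256_of_file(path) in normalized
```

For example, after loading the provided hashes into a list, is_listed_sample("sample.exe", hashes) would report whether sample.exe (a placeholder file name) belongs to the experimental set.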

Citing

Details of this work can be found in the full article. Please cite as:

@article{Yuste2021MLEvasion,
  title = {Optimization of code caves in malware binaries to evade Machine Learning detectors},
  journal = {Computers \& Security},
  pages = {102643},
  year = {2022},
  issn = {0167-4048},
  doi = {10.1016/j.cose.2022.102643},
  url = {https://www.sciencedirect.com/science/article/pii/S0167404822000426},
  author = {Javier Yuste and Eduardo G. Pardo and Juan Tapiador},
  keywords = {Malware, Evasion, Machine Learning, Adversarial Example, Genetic Algorithm}
}