You can download an example dataset for the MalwareAnalysis
project from the following link:
This dataset can be used to test and train the model.
For macOS:
- Download Python from python.org/downloads.
- Open the `.pkg` file and follow the instructions.
- Verify in Terminal: `python3 --version`.
For Linux:
- Update packages: `sudo apt-get update`.
- Install Python: `sudo apt-get install python3`.
- Verify in Terminal: `python3 --version`.
Set `VIRUSTOTAL_API_KEY` to your VirusTotal API key.
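One common way to supply such a key is through an environment variable. Whether this project's scripts read `VIRUSTOTAL_API_KEY` from the environment is an assumption here, and the helper name `get_virustotal_api_key` is hypothetical. A minimal sketch of a lookup that fails early with a clear message:

```python
import os


def get_virustotal_api_key(env=None):
    """Return the key stored under VIRUSTOTAL_API_KEY, or raise if it is missing.

    Assumption: the key is supplied as an environment variable.
    `env` defaults to os.environ and can be replaced with a dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("VIRUSTOTAL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("VIRUSTOTAL_API_KEY is not set")
    return key
```

On macOS/Linux the variable can be set for the current shell with `export VIRUSTOTAL_API_KEY=<your key>`; on Windows (cmd) with `set VIRUSTOTAL_API_KEY=<your key>`.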
- Create Folder Structure: From the root directory, create a folder named `MalwareAnalysis`. Inside it, create two subfolders:
  - `Malware` for storing malware opcodes.
  - `Benign` for storing benign opcodes.
- Create and Activate Virtual Environment:
  - Create a virtual environment: `python -m venv venv`
  - Activate the virtual environment:
    - Windows: `.\venv\Scripts\activate`
    - macOS/Linux: `source venv/bin/activate`
- Install Python Packages:
  - Run: `pip install -r requirements.txt`
- Train the Model:
  - Run: `python train.py`
  - The vectorizer used to transform user input is stored in `count_vectorizer.joblib`.
  - The model is stored in `rf_opcodes_freq_ngram_2.joblib`.
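The folder layout and the kind of feature the model filename hints at can be sketched with the standard library alone. The name `rf_opcodes_freq_ngram_2` suggests a random forest trained on opcode 2-gram frequencies, but that is an inference from the filename, not something `train.py` is confirmed to do. The helper names `create_layout` and `bigram_counts` are hypothetical.

```python
from collections import Counter
from pathlib import Path


def create_layout(root):
    """Create MalwareAnalysis/Malware and MalwareAnalysis/Benign under `root`."""
    base = Path(root) / "MalwareAnalysis"
    for name in ("Malware", "Benign"):
        # exist_ok=True makes the call safe to repeat
        (base / name).mkdir(parents=True, exist_ok=True)
    return base


def bigram_counts(opcodes):
    """Count adjacent opcode pairs (2-grams) in a sequence of opcode strings.

    Assumption: this mirrors the "freq_ngram_2" idea only in spirit; the
    project's actual vectorizer settings are not shown in this document.
    """
    return Counter(zip(opcodes, opcodes[1:]))
```

For example, `bigram_counts(["push", "mov", "push", "mov", "call"])` counts the pair `("push", "mov")` twice and the pairs `("mov", "push")` and `("mov", "call")` once each.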
- Set Up Environment (if not already done):
  - Create and activate the virtual environment.
  - Install Python packages: `pip install -r requirements.txt`
- Run the Model:
  - Execute the script with an executable filename as an argument: `python main.py <exe filename>`
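A script invoked this way usually validates its single argument before doing any work. The sketch below shows one way to do that with `argparse`. It is not the project's actual `main.py`, and the function name `parse_args` and the argument name `exe` are hypothetical.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse one positional argument: the path of the executable to analyze."""
    parser = argparse.ArgumentParser(
        description="Classify an executable as malware or benign."
    )
    parser.add_argument("exe", type=Path, help="path to the executable file")
    args = parser.parse_args(argv)
    if not args.exe.is_file():
        # parser.error prints a usage message and exits with status 2
        parser.error(f"file not found: {args.exe}")
    return args
```

Checking that the file exists up front gives a short usage error instead of a traceback from deeper in the analysis code.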