Malware Detection and Machine Learning Repository

Welcome to the Malware Detection and Machine Learning Repository! This repository contains two interconnected projects that are built progressively to enhance our understanding of malware detection and machine learning.

Project Progression

The projects within this repository are designed to build upon one another, offering a seamless progression in the realm of malware detection and machine learning. I recommend following this progression to fully grasp the concepts and techniques involved:

1. Cross_Validation_Model_Selection

In the "Cross_Validation_Model_Selection" project, I delve into model selection and evaluation using cross-validation techniques to classify malware and benignware samples. Here's an overview of the project:

  • Project Overview: This project involves several tasks aimed at selecting the best model for classification and evaluating its performance. It includes dataset splitting, feature selection using cross-validation, model selection using different classifiers, and live classification and performance evaluation.

  • Included Datasets:

    • dataset.csv: The original dataset.
    • reduceddataset.csv: Used for model training.
    • samples.csv: A subset used for testing.
  • Key Tasks:

    1. Splitting the dataset into train and test datasets.
    2. Feature selection using cross-validation.
    3. Model selection using cross-validation.
    4. Live classification and performance evaluation.
  • Repository Structure: The repository includes Python scripts for each task, facilitating easy execution.

To get started, explore the tasks within this project, and run the Python scripts individually for each task.

Hyperparameter_Tuning

In the "Hyperparameter_Tuning" project, I focus on optimizing hyperparameters for a Support Vector Classifier (SVC) using cross-validation. This project builds on the dataset used in the "Cross_Validation_Model_Selection" project. Here's an overview:

  • Project Overview: The project aims to fine-tune the hyperparameters of an SVC to achieve the best possible performance. It involves feature selection, randomized hyperparameter tuning, grid hyperparameter tuning, model training, and model evaluation.

  • Included Datasets: This project uses the same datasets as the "Cross_Validation_Model_Selection" project.

  • Key Tasks:

    1. Feature selection and model evaluation.
    2. Randomized hyperparameter tuning.
    3. Grid hyperparameter tuning.
    4. Train and save the model.
    5. Model evaluation and generation of plots.
  • Repository Structure: Similar to the first project, this project's tasks are organized into separate Python scripts.

To utilize this project, execute the Python scripts corresponding to each task.

Running the Code

To run the code for each project, ensure you have the required Python packages installed (matplotlib, numpy, pandas, and sklearn). You can execute the Python scripts individually for each task within the projects.

Feel free to explore each project's specific README files for more detailed information on each task and how to execute them.