Malware-Research

Malware research with machine learning, conducted under the guidance of Professor Mark Stamp at SJSU. Results are published in the Springer textbook "Malware Analysis Using Artificial Intelligence and Deep Learning" (https://link.springer.com/book/10.1007/978-3-030-62582-5) and the arXiv paper "On Ensemble Learning" (https://arxiv.org/abs/2103.12521).

Goal: Use ensemble learning and a variety of models to classify malware samples into their respective families.

Process:

Collect the file names of all samples to classify and group them by malware family
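The grouping step could look like the following minimal sketch. It assumes a hypothetical directory layout `samples/<family>/<file>`, where the folder name is the family label; `group_by_family` is an illustrative helper, and the original scripts may derive labels differently.

```python
from collections import defaultdict
from pathlib import Path


def group_by_family(root: str) -> dict[str, list[Path]]:
    """Map each family name to the sample files found under it.

    Assumes a hypothetical layout root/<family>/<sample>, where the
    parent directory name is used as the family label.
    """
    families: dict[str, list[Path]] = defaultdict(list)
    # Sorting keeps the output order deterministic across runs.
    for path in sorted(Path(root).glob("*/*")):
        if path.is_file():
            families[path.parent.name].append(path)
    return dict(families)


if __name__ == "__main__":
    for family, files in group_by_family("samples").items():
        print(f"{family}: {len(files)} samples")
```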
Use Radare2 to disassemble each file and write its opcode sequence to a text file
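One way to script the disassembly step is through the third-party `r2pipe` package, which drives the `radare2` binary. The sketch below is an assumption, not the project's exact commands: it uses Radare2's `pi N` command (print N instructions, without addresses or bytes) starting from the current seek, keeps only each mnemonic, and writes one opcode per line. The helper names are hypothetical.

```python
from pathlib import Path


def mnemonics_from_disasm(text: str) -> list[str]:
    """Keep only the mnemonic of each instruction line, e.g. 'mov eax, 1' -> 'mov'.

    Expects one instruction per line, as printed by Radare2's `pi` command.
    """
    ops = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # skip blanks and comment lines
            continue
        ops.append(line.split()[0].lower())
    return ops


def disassemble(sample: Path, max_instructions: int = 1000) -> list[str]:
    """Disassemble `sample` with Radare2 via r2pipe and return its opcode mnemonics.

    Requires the radare2 binary and the `r2pipe` package; the import is
    deferred so the parsing helper above works without them.
    """
    import r2pipe

    r2 = r2pipe.open(str(sample))
    try:
        # `pi N` prints N instructions (mnemonic + operands) from the current seek.
        text = r2.cmd(f"pi {max_instructions}")
    finally:
        r2.quit()
    return mnemonics_from_disasm(text)


def write_opcodes(ops: list[str], out_file: Path) -> None:
    """Write one opcode per line to a text file."""
    out_file.write_text("\n".join(ops) + "\n")
```

A linear sweep like this is the simplest option; disassembling per function after Radare2's analysis pass would produce a different opcode order.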
Create a large .csv file containing all of the opcode data
- in the .csv file, the first 1000 opcodes of each sample are used as features for training; any malware sample that has fewer than 1000 opcodes or is corrupted is removed
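A minimal sketch of the .csv step, assuming the opcode text files from the previous step are grouped by family. Two choices here are assumptions rather than details from this README: each opcode is encoded as an integer index in a vocabulary built on the fly, and a file that cannot be read or decoded is treated as corrupted. `build_csv` and the `op0`…`op999` column names are hypothetical.

```python
import csv
from pathlib import Path

N_OPCODES = 1000  # number of opcode features per sample


def build_csv(opcode_files: dict[str, list[Path]], out_csv: Path, n: int = N_OPCODES) -> dict[str, int]:
    """Write one row per sample: n integer-encoded opcodes followed by the family label.

    Samples with fewer than n opcodes, or files that cannot be read, are skipped.
    Returns the opcode -> integer vocabulary that was built.
    """
    vocab: dict[str, int] = {}
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([f"op{i}" for i in range(n)] + ["family"])
        for family, files in opcode_files.items():
            for path in files:
                try:
                    ops = path.read_text().split()
                except (OSError, UnicodeDecodeError):
                    continue  # treat unreadable files as corrupted samples
                if len(ops) < n:
                    continue  # too short: drop the sample
                # Assign each new opcode the next free integer id.
                row = [vocab.setdefault(op, len(vocab)) for op in ops[:n]]
                writer.writerow(row + [family])
    return vocab
```

Truncating to exactly `n` opcodes gives every row the same width, which the fixed-input classic models and the CNN/LSTM models both need.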
Models:
- classic:
  - random forest
  - adaboost
  - xgboost
  - svm
  - bagged svm
  - hmm
  - bagged hmm
  - boosted hmm
  - knn
  - mlp
  - voting
- deep learning:
  - cnn
  - bagged cnn
  - boosted cnn
  - lstm
  - bagged lstm
  - boosted lstm
- voting:
  - all bagged and boosted cnns
  - all bagged and boosted lstms
  - all bagged cnns and bagged lstms
  - all boosted cnns and boosted lstms
  - all bagged and boosted cnns and lstms
  - all deep learning and classic models combined

Results: https://drive.google.com/drive/u/1/folders/1vliGOjaUDsqGVy_sq191jorfYquIj7JP
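The voting ensembles in the model list combine the predictions of several already-trained models. The sketch below shows plain hard (majority) voting over per-model family predictions. Whether the project used hard or soft voting, or weighted the models, is not stated here, so treat this as illustrative; `hard_vote` is a hypothetical helper.

```python
from collections import Counter
from typing import Sequence


def hard_vote(predictions: Sequence[Sequence[str]]) -> list[str]:
    """Combine per-model predictions by majority (hard) voting.

    predictions[m][i] is model m's predicted family for sample i.
    On a tie, the label that appears first in model order wins, because
    Counter.most_common orders equal counts by first insertion.
    """
    if not predictions:
        raise ValueError("need at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined
```

For example, with three models predicting `["a", "b"]`, `["a", "c"]` and `["b", "c"]`, the vote yields `["a", "c"]`. Using an odd number of voters avoids most ties in two-way splits.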

allenye66/Malware-and-Ensemble-Learning-Research