Hello!

This repo tracks a few code snippets used to teach code-llama2 how to create malware. I explained the process in the article "When AI Becomes the Hacker: The Emerging Threat of Weaponized Language Models".

Because of the dangers of what the fine-tuned version can do, I'm not including the full dataset or the exact details of how I built it. This repo contains these key files:

  • create-dataset.py. This is a simplified version of how I built a subset of the training dataset, which consists of malware and virus code with explanations.
  • fine_tune_codellama.ipynb. This is the Colab notebook I used to run the experiment on an A100 machine.
  • ddos_dataset.txt. It includes a very small sample of the dataset I used to fine-tune the model.