Workshop Materials Guide

Please help us improve by taking this survey after the workshop!

This workshop material guide is in three sections:

  1. Overview of the Workshop format

  2. Pre-requisites and Prep Instructions for participating in hands-on exercises of the workshop.

    IMPORTANT: This is entirely optional

    You can still attend the workshop and benefit greatly with the option to complete those exercises on your own. However, many participants prefer to "follow along" during the workshop.

  3. Resources and References we use in the workshop, organized by section

Overview:

This workshop is intended to help the cybersecurity professional go from zero to hero in Machine Learning, Deep Learning, Artificial Intelligence and Large Language Models (LLMs), with the goal of being able to apply those computing models to solve cybersecurity challenges.

This workshop is perfect for absolute beginners to AI/ML or those more advanced professionals looking for hands-on instruction on how to choose use cases and models and begin training and implementation. If you want to upskill your cybersecurity career with practical, actionable skills in machine learning and AI, this is the workshop for you. This workshop does not require you to have advanced coding or mathematics skills (see the Pre-requisites and Prep Instructions for more detail).

Whether you're a curious beginner or a seasoned professional, this hands-on, accessible workshop will equip you with the knowledge and skills to leverage AI in your cybersecurity arsenal. No prior AI experience? No worries! We'll take you from zero to hero, ensuring everyone walks away with valuable, applicable insights. We will cover:

  • "I don't know what AI, Deep Learning and Machine Learning are and at this point I'm too afraid to ask" - an introduction to the differences between these computing types and traditional expert systems
  • Machine Learning Fundamentals for Cybersecurity Professionals (including a working ML model written in Python you can take away from this course)
  • Large Language Models (LLMs) in Cybersecurity - learn how to train and fine-tune LLMs for specific cybersecurity tasks and identify optimal use cases for LLMs, understanding the limitations and potential pitfalls
  • Navigating the AI Minefield: Pitfalls and Best Practices - choosing the right model or type of AI for the right problem can be challenging, so you will learn the strengths and weaknesses of different AI and ML approaches and their cybersecurity applications. You might be surprised at the things AI is actually really bad at!
  • How to take what you learned here and explore more - recommendations on where to go next to grow your knowledge

Agenda

Part 1: AI, ML, Deep Learning and GenAI - understanding computing models and what makes AI "artificial intelligence", and when to use different computing models based on your use case

Part 2: AI/ML Applications for Cybersecurity - cybersecurity use cases, how to find models and data sets to start experimenting and applying models to your use cases.

Part 3: GenAI Applications for Cybersecurity - GenAI and LLM use cases, the good, the bad and the surprising; RAG and AI Agents

Goals

  • Understand the difference between machine learning, deep learning, artificial intelligence and generative artificial intelligence
  • Understand what elements to consider about your use cases to determine the right approach
  • Review some examples of good and bad use cases
  • Know where to find pre-trained ML and AI models and data sets, and how to use them in your org
  • Learn how to use GenAI / LLMs
  • Learn about the capabilities of RAG and AI Agents

In This Workshop You Will NOT:

  • Learn how to build a model from scratch (see SANS SEC595 for this type of instruction)

  • Do any calculus or upper-level math

Prerequisites and Workshop Prep Instructions

Prerequisites for this workshop are in two sections: the knowledge that will help you make the most of the workshop, and the technical preparation for participating in the hands-on section of the workshop.

For the technical preparation, you have the choice to either download Anaconda and Jupyter Lab, or follow along using the Cloud Hosted Options for the notebooks and models.

We do cover HuggingFace in this workshop. If you wish to follow along in that step, you will need either to have completed the Download preparation steps or to have a Google Cloud or Amazon (AWS) cloud account to deploy the model to. These accounts are involved setups, and it is highly recommended to do this in advance.

The instructors recommend the Download method, as it will allow you to build familiarity with tools you will use throughout your Machine Learning and AI journey and allow you to use data without putting that data in a cloud service. However, we provide instructions for both here. The Download method also allows you to easily take advantage of many of the GitHub projects and Jupyter Notebooks referenced as part of this workshop.

Even if you do the Cloud Hosted option, you will still have pre-class preparation such as creating accounts on those platforms.

-------- Knowledge --------

This workshop does not require you to have advanced coding or mathematics skills. There is no requirement to attend other than a basic understanding of cybersecurity concepts.

It is helpful if you have a basic understanding of Python, descriptive statistics (mean, median, mode and standard deviation) and the concept of regression (a form of inferential statistics), but it is not required for you to benefit from the workshop. These skills WILL optimize your understanding of the hands-on portion of the workshop.

My recommendations for a quick refresh on these topics if you need them:

Codecademy's Free Python3 Course - I am a fan of Codecademy generally, and I find the subscription worth it. You will find many AI / ML concepts on this platform, from BeautifulSoup (important for web scraping to acquire data) to NumPy and Matplotlib courses.

AI Python for Beginners - DeepLearning.AI - this instructor is great and teaches a number of machine learning concepts, the site itself is a wonderful resource for the beginner!

CodeCombat - This is my absolute favorite way to learn coding, and they recently added some AI Learning levels as well! It is an RPG-style game that is an incredibly fun and practical way to learn Python, C++ or basic AI skills.

YouTube - Descriptive Statistics - mean, median, mode and standard deviations. This is helpful to understand because these concepts are often how we explore data and try to understand "normal" in order to better identify data anomalies, which is very important to cybersecurity use cases.
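As a small illustration of how these statistics help define "normal", here is a minimal sketch using only Python's standard library. The failed-login counts are made-up sample data, and the z-score threshold of 2.0 is an assumption chosen for the example, not a recommended setting.

```python
import statistics

# Hypothetical sample: failed logins per hour for one account.
failed_logins = [3, 4, 2, 5, 3, 4, 3, 2, 3, 48]

mean = statistics.mean(failed_logins)
median = statistics.median(failed_logins)
mode = statistics.mode(failed_logins)
stdev = statistics.stdev(failed_logins)  # sample standard deviation


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


anomalies = find_anomalies(failed_logins)
print(f"mean={mean:.1f} median={median} mode={mode} stdev={stdev:.1f}")
print("anomalies:", anomalies)
```

Notice that the single large value pulls the mean well above the median. That gap is itself a hint that the data contains an outlier worth investigating.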

YouTube - Linear and Logistic Regression - these are the most commonly implemented techniques in machine learning for cybersecurity applications, because so many use cases come down to predicting a value (linear regression) or classifying an event (logistic regression).
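To make the predict-versus-classify distinction concrete, here is a minimal from-scratch sketch in plain Python. The data values, the learning rate and the epoch count are all assumptions chosen for illustration. In practice you would more likely reach for a library such as scikit-learn; this sketch is only meant to show the idea.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, lr=0.1, epochs=2000):
    """Logistic regression for one feature, trained by gradient descent."""
    n = len(xs)
    mean_x = sum(xs) / n
    centered = [x - mean_x for x in xs]  # centering keeps training stable
    w, b = 0.0, 0.0
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(centered, labels)]
        w -= lr * sum(e * x for e, x in zip(errors, centered)) / n
        b -= lr * sum(errors) / n
    return w, b - w * mean_x  # expressed for the original, uncentered x


def predict_probability(x, w, b):
    return sigmoid(w * x + b)


# Predict: hypothetical alerts per day as a function of monitored hosts.
hosts = [10, 20, 30, 40, 50]
alerts = [14, 22, 35, 41, 55]
slope, intercept = fit_linear(hosts, alerts)
print(f"predicted alerts for 60 hosts: {slope * 60 + intercept:.1f}")

# Classify: hypothetical failed logins per minute, labeled 1 if an attack.
failed = [1, 2, 3, 4, 6, 7, 8, 9]
is_attack = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(failed, is_attack)
print(f"P(attack | 8 failed logins) = {predict_probability(8, w, b):.3f}")
```

Linear regression returns a number on a continuous scale, while logistic regression returns a probability that you turn into a yes/no decision by choosing a cutoff such as 0.5.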

Bonus - Introduction to Data Science from Anaconda

-------- Technical Prep --------

Preferably, participants will have the following prerequisites to make the most out of the class - but this is not required. This just allows people in class to follow along with practical, hands-on work:

  • Admin rights to install software

  • Device with at least 16 GB RAM

  • At least 200GB free space

  • NVIDIA RTX series (for optimal performance) - at least 4 GB VRAM

  • Python 3.7 or higher installed

  • ChatGPT Plus subscription ($20) and/or Claude.ai Pro or Team Plan (we will cover both of these in class)

  • Ability to reach other AI sites such as Perplexity, gamma.app, HuggingFace, Keras.io and Google Colab

    To prepare yourself for the hands-on portion of the class, you must choose between the Download Option and the Cloud Hosted Option instructions below. The instructors generally recommend the Download Option, because it will best allow you to gain the skills you will need to use these models with non-public data sets. However, the Cloud Hosted Option is a much faster setup.
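Some of the items in the checklist above can be checked from Python itself. The following is a rough sketch, not an official readiness test: the helper names are hypothetical, RAM and VRAM are not checked because the standard library has no portable call for them, and finding the `nvidia-smi` tool on the PATH is used only as a hint that an NVIDIA driver is installed.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)   # from the checklist: Python 3.7 or higher
MIN_FREE_GB = 200     # from the checklist: at least 200GB free space


def check_python(min_version=MIN_PYTHON):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= min_version


def free_disk_gb(path="."):
    """Free space, in GB, on the drive that holds the given path."""
    return shutil.disk_usage(path).free / 1024 ** 3


def has_nvidia_smi():
    """True if the nvidia-smi command is on PATH (a hint, not proof, of a GPU)."""
    return shutil.which("nvidia-smi") is not None


def preflight(path="."):
    free_gb = free_disk_gb(path)
    return {
        "python_ok": check_python(),
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= MIN_FREE_GB,
        "nvidia_smi_found": has_nvidia_smi(),
    }


report = preflight()
for name, value in report.items():
    print(f"{name}: {value}")
```

A `False` for `disk_ok` or `nvidia_smi_found` does not prevent you from attending; as noted above, these prerequisites are preferred, not required.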

Download Option

  1. First, download this folder from this GitHub repository and save it to your desktop.

  2. Sign up for a Kaggle account at https://www.kaggle.com/, download the nslkdd data set for the Network Intrusion (Anomaly Detection) exercise from NSL-KDD, and make sure it is saved in the same workshop folder. (This becomes important when you open Jupyter Lab.)

  3. Download and install Anaconda https://www.anaconda.com/download

If you choose to download Anaconda, follow these steps for install setup on a Windows machine:

  • Click Next to agree to install

  • Accept the license agreement, then click Next

  • On the "Select Installation Type" screen, select "Just for me", then click Next

  • On the "Choose Install Location" screen, the default is fine; click Next

  • Make sure the option is NOT SELECTED that says "Add Anaconda to my PATH Environment Variable"

  • Register Anaconda3 as your default Python and then install

  • Uncheck the box for Anaconda Edition Tutorial and Getting Started with Anaconda (unless you want to) then finish

  • Go to your computer's "Start/Search" menu and look for 'anaconda prompt'

  • Click the prompt to start the program. The following instructions ALL take place within the Anaconda Prompt window.

    1. At the prompt type:

      conda create --name=workshop python=3.10

      (note that name is preceded by two dashes, typed together with no space between them)

    2. Hit enter, then type:

      conda activate workshop

    3. Once it finishes, we will install Jupyter Lab. Start this by typing:

      pip install jupyterlab

      (note that the package name "jupyterlab" is one word, but the command you later use to start the program, "jupyter lab", is two words!)

    4. Once that finishes, use the CD (Change Directory) command within the Anaconda Prompt to navigate to the workshop folder you downloaded onto your desktop. This will make sure you have easy access to the files when you start the lab.

    5. Once you are in the folder, at the prompt type:

      jupyter lab

    6. That should open in your browser, and you should see the Jupyter notebooks in your menu. Click on the Dogs vs Cats Keras CNN image classifier. Put your mouse into the first cell of code, which imports libraries. Press the triangle "Play" button at the top. Once this cell completes and the * inside the brackets turns into a number, you have verified that you are successfully prepared.
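If the first cell fails, a missing package is a common cause. The sketch below is one way to check from a notebook cell or the Anaconda Prompt which interpreter is running and which packages it can find. The package list is an assumption for illustration (only jupyterlab is installed by the steps above), so adjust it to whatever the notebook's import cell actually names.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list: replace with the imports in the notebook's first cell.
wanted = ["jupyterlab", "numpy", "tensorflow"]

print("interpreter:", sys.executable)  # should point inside the workshop environment
print("missing:", missing_packages(wanted))
```

Anything reported as missing can usually be installed from the same Anaconda Prompt with pip install, in the same way as jupyterlab above.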

Cloud Hosted Options

First, download this folder from this GitHub repository and save it to your desktop.

There are several options for cloud hosting. We recommend you set up ALL of them to make sure you can use all of them with the workshop materials provided below.

However, there are only two we will go through together in the workshop - Kaggle and HuggingFace. Anaconda Cloud and Google Colab are also options you could use to host the same notebooks and data, and they will allow you to follow along.

To follow along in the workshop, go to kaggle.com and sign up for an account, or sign in with a Google account or similar identity provider. Once you have completed that, go to this link network_intrusion_detection, hit the "Copy & Edit" button in the upper right of the screen and make sure the environment opens. If you are able to click the Play button that appears once your cursor is in the code box under "Data Cleaning", then you are successfully prepared to use this resource. Note that this is a large data set you are importing, so it will "spin" for a minute before you see an output in a table below it.

If you wish to follow along with the HuggingFace instruction, you will need a Google Cloud or Amazon (AWS) cloud account to deploy the model to in that step. Go to Huggingface.co and sign up for an account. You will know you have met the prerequisites for this if you can go to ehsanaghaei/SecureBERT · Hugging Face, click "Deploy" and select your platform.

Other options which will also work with Jupyter Notebook files (primarily used in this workshop): For Anaconda Cloud, go to https://nb.anaconda.cloud/ and create an account. Then go to the green circle on the left-hand side, “Anaconda Toolbox”, and Create a New Project. This will allow you to select the files from the workshop folder on your desktop. Leave the environment location as default.

Quick link Anaconda Cloud

Google Colab - Use a Google account and make sure you can access https://colab.research.google.com/. You can test that this is working properly by visiting Image classification from scratch and clicking "View in Colab" just under the model description details. If you are able to click the Play button that appears once your cursor is in the code box under "Setup", then you are successfully prepared to use this resource.

Resources and References

Network Intrusion Model Used in the Workshop:

Intrusion Detection System with ML&DL - Example we use in class - Cloud Hosted Option

Intrusion Detection System NSL-KDD is another - Cloud Hosted Option

intrusion-detection-system-with-ml-dl.ipynb - Example we use in class - Download Option - Available in Workshop folder sourced from

Other learning resources:

PacktPublishing/Hands-On-Artificial-Intelligence-for-Cybersecurity: Hands-On Artificial Intelligence for Cybersecurity, published by Packt

Network Traffic Anomaly Detection with Machine Learning

Papers With Code

Claude Model Context Protocol

Sort of API, sort of Agentish Getting Started Using Model Context

Example Prompt for Cybersecurity:

We are building a security tool today. It will support an accidental data discovery program, by identifying instances of accidentally exposed data in likely places.

The system will be modular, and each "likely place" will be a plugin. To start, we will want plugins for elasticsearch, mongodb, and ftp.

The inputs will come from outside the system (hostname/ip address).

Each plugin will consume one input at a time, and return the output (exposed or not, and if exposed, metadata about the exposure). Metadata will mean different things in different contexts. For example, in the elasticsearch instance, exposure would mean that there are indexes which can be read without authentication. The metadata would be the name of those indexes and the number of documents present therein. For FTP metadata might mean a recursive directory listing.

Outputs of modules should be in a standardized format. Create a directory for the project, show me the architecture plan.
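For comparison with whatever the model proposes, here is one possible shape for the plugin contract the prompt describes. This is a hedged sketch, not the output of any particular LLM: every name in it (Finding, ExposurePlugin, StubElasticsearchPlugin, scan) is hypothetical, and the plugin is a stub that reads from an in-memory dictionary instead of contacting a real host.

```python
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Finding:
    """Standardized output: the same fields for every plugin."""
    plugin: str
    target: str
    exposed: bool
    metadata: Dict[str, Any] = field(default_factory=dict)


class ExposurePlugin(ABC):
    """Each 'likely place' implements this one-input, one-output contract."""
    name = "base"

    @abstractmethod
    def check(self, target: str) -> Finding:
        """Inspect one hostname or IP address and report what was found."""


class StubElasticsearchPlugin(ExposurePlugin):
    """Stand-in plugin: looks up canned data, performs no network access."""
    name = "elasticsearch"

    def __init__(self, fake_indexes: Dict[str, Dict[str, int]]):
        # Maps target -> {index name: document count}; illustrative only.
        self._fake_indexes = fake_indexes

    def check(self, target: str) -> Finding:
        indexes = self._fake_indexes.get(target, {})
        metadata = {"indexes": indexes} if indexes else {}
        return Finding(self.name, target, bool(indexes), metadata)


def scan(target: str, plugins: List[ExposurePlugin]) -> List[Dict[str, Any]]:
    """Run every plugin against one target and return plain dictionaries."""
    return [asdict(plugin.check(target)) for plugin in plugins]


# 192.0.2.x is a documentation-only address range, used here as sample input.
es_plugin = StubElasticsearchPlugin({"192.0.2.10": {"customer-logs": 120}})
results = scan("192.0.2.10", [es_plugin])
print(results)
```

Keeping the metadata field as a free-form dictionary reflects the prompt's point that metadata means different things in different contexts, while the outer fields stay identical across plugins.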

Cybersecurity Datasets:

gfek/Real-CyberSecurity-Datasets: Public datasets to help you address various cyber security problems.

BNN-UPC/NetworkModelingDatasets: This repository contains datasets for network modeling simulated with OMNet++

ericyoc/synthetic_network_traffic_simulation_poc: A simulation of network traffic using synthetic network traffic for 802.11, 3G GSM, 4G LTE, and 5G NR

Endpoint telemetry datasets

ScarredMonk/SysmonSimulator: Sysmon event simulation utility which can be used to simulate the attacks to generate the Sysmon Event logs for testing the EDR detections and correlation rules by Blue teams.

tsale/EDR-Telemetry: This project aims to compare and evaluate the telemetry of various EDR products.

Beginner Friendly AI/ML Cybersecurity Models:

Captchas

Captcha Solver – CNN - Cloud Hosted Option and its accompanying blog Solving CAPTCHAs with Convolutional Neural Networks | by Matheus Ramos Parracho | Medium

CNN CAPTCHA Solver - 97.8% Accuracy - Cloud Hosted Option

Solving CAPTCHAs with Convolutional Neural Networks

Network Threat Detection / Anomaly Detection / Intrusion Analysis

How to do Anomaly Detection using Machine Learning in Python?

Intrusion Detection System with ML&DL - machine learning and deep learning concepts (used in workshop)

Network Traffic Anomaly Detection - deep learning model

yasakrami/Threat-Detection-in-Cyber-Security-Using-AI - using PCAP files

Spam vs Ham (and learning about unbalanced data sets)

HAM vs SPAM Email Classifier (CountVect & TF-IDF)

More advanced:

Anomaly Detection in Network Traffic with K-means clustering.ipynb at master · lucabenedetto/Algorithmic-Machine-Learning - in-depth, covers different ways to find outliers and anomalies using supervised and unsupervised machine learning

Language models:

llama-recipes/recipes/quickstart at main · meta-llama/llama-recipes

Cybersecurity Domain-Specific Language Model

SynamicTechnologies/CYBERT · Hugging Face

ehsanaghaei/SecureBERT: SecureBERT is a domain-specific language model to represent cybersecurity textual data. and ehsanaghaei/SecureBERT · Hugging Face

markusbayer/CySecBERT · Hugging Face

Generative AI and LLMs for the Cybersecurity Professional:

Gamma for presentations and websites - never do another PowerPoint from scratch! You can also import your company's template

Prompt Engineering | Lil'Log - Getting started understanding prompt engineering

https://microsoft.github.io/prompt-engineering/ - Prompt Engineering for Code

https://github.com/promptslab/Awesome-Prompt-Engineering?tab=readme-ov-file#tools--code - Prompt Engineering

https://library.easyprompt.xyz/?via=topaitools - Prompt Library

https://cloud.google.com/blog/topics/threat-intelligence/ai-nist-nice-prompt-library-gemini NIST NICE Prompt Library

https://github.com/Billy1900/Awesome-AI-for-cybersecurity - Large Collection of AI for Cybersecurity

https://github.com/DummyKitty/Cyber-Security-chatGPT-prompt - Cybersecurity prompt library

https://github.com/fr0gger/Awesome-GPT-Agents - Collection of GPT Agents

https://chatgpt.com/g/g-2DQzU5UZl-code-copilot - Code CoPilot GPT

https://chatgpt.com/g/g-jBdvgesNC-diagrams-flowcharts-mindmaps - Flowcharts, diagrams and mindmap generator

https://github.com/tenable/awesome-llm-cybersecurity-tools - LLM Cybersecurity Tools

https://github.com/JusticeRage/Gepetto - for use with IDA Pro for malware reverse engineering assistance

https://github.com/s0md3v/SubGPT - subdomain enumeration

https://chatgpt.com/g/g-IZ6k3S4Zs-mitregpt - MITRE ATT&CK mapping

https://github.com/Mooler0410/LLMsPracticalGuide - Practical Guide to using LLMs

Anaconda and Data Science:

An End-to-end Data Science Project with Anaconda Assistant | Anaconda

Basic ML and Deep Learning Concepts:

Friendly Machine Learning: Linear Regression and Multiple Line Regression

5 Types of Neural Networks: An Essential Guide for Analysts

Neural Networks: Solving Complex Science Problems

Convolutional Neural Network | Deep Learning | Developers Breach

Cats and Captchas

Google's Artificial Brain Learns to Find Cat Videos | WIRED and their official paper [1112.6209] Building high-level features using large scale unsupervised learning

How to Classify Photos of Dogs and Cats (with 97% accuracy) - MachineLearningMastery.com

Dogs vs Cats Keras CNN image classifier.ipynb - Download Option - Available in Workshop folder sourced from GitHub: mohamedamine99/Keras-CNN-cats-vs-dogs-image-classification

Cat & Dog Classification using Convolutional Neural Network in Python - GeeksforGeeks for use with Download Option

Image classification from scratch - Cloud Hosted Option

Cats or Dogs - using CNN with Transfer Learning - Cloud Hosted Option

Building a Cat Detector using Convolutional Neural Networks — TensorFlow for Hackers (Part III) | by Venelin Valkov | Medium

Cats vs Dogs - Part 1 - 92.8% Accuracy - Binary Image Classification with Keras and Deep Learning

Captcha Solver – CNN - Cloud Hosted Option and its accompanying blog Solving CAPTCHAs with Convolutional Neural Networks | by Matheus Ramos Parracho | Medium

CNN CAPTCHA Solver - 97.8% Accuracy - Cloud Hosted Option

Solving CAPTCHAs with Convolutional Neural Networks

Word Embeddings

Word Embedding Demo: Tutorial

Embeddings 101: The foundation of large language models

Data Science Process

Data Science Process: A Beginner’s Guide in Plain English

Introduction to Data Science from Anaconda

Data Science Process: 7 Steps With Comprehensive Case Study

RAG

Building a RAG Application in 10 min with Claude 3 and Hugging Face | Medium