Supervised Machine Learning Training

Jupyter notebooks created for a supervised machine learning training exercise, using an e-mail data set with some phishing examples.

Both examples use scikit-learn.

Files

Example 01 - Feature Classification

01 - ML - Feature Classification.ipynb
Objective: To automatically flag phishing messages using selected features (individual measurable properties) of an e-mail corpus.
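
As a rough illustration of this kind of feature-based classification, the sketch below trains a scikit-learn decision tree on a tiny, made-up feature table. The feature columns, the toy values, and the choice of DecisionTreeClassifier are assumptions for demonstration only; they are not taken from the notebook or the project's data set.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical per-message features (not the notebook's actual columns):
# [number of links, contains an IP-based URL (0/1), number of exclamation marks]
X = np.array([
    [0, 0, 0],
    [1, 0, 1],
    [2, 0, 0],
    [1, 0, 0],
    [5, 1, 3],
    [7, 1, 4],
    [4, 1, 2],
    [6, 0, 5],
])

# Toy labels for illustration only: 0 = legitimate, 1 = phishing
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Fit a small decision tree on the toy feature table
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Classify a new, unseen message described by the same three features
new_message = np.array([[6, 1, 3]])
prediction = clf.predict(new_message)
print("phishing" if prediction[0] == 1 else "legitimate")
```

With a real corpus, the feature table would be extracted from the e-mails and part of the data would be held out to evaluate the model.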

Example 02 - Text Classification (Natural Language Processing)

02 - ML - Text Classification.ipynb
Objective: To automatically flag phishing messages using their content.
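
A minimal sketch of content-based classification with scikit-learn might look like the following. The example messages, the labels, and the TF-IDF plus Naive Bayes pipeline are assumptions chosen for illustration; the notebook may use different data and a different model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up messages for illustration only (not from the project's data set)
messages = [
    "Your account is locked, verify your password now",
    "Urgent: confirm your bank details to avoid suspension",
    "Click this link to claim your prize",
    "Meeting moved to 3 pm, see updated agenda",
    "Here are the notes from yesterday's lecture",
    "Lunch on Friday? Let me know what works",
]
# Toy labels: 1 = phishing, 0 = legitimate
labels = [1, 1, 1, 0, 0, 0]

# Turn the raw text into TF-IDF vectors, then fit a Naive Bayes classifier
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(messages, labels)

# Classify a new message from its text alone
predictions = model.predict(["Please verify your password using this link"])
probabilities = model.predict_proba(["Please verify your password using this link"])
print(predictions[0], probabilities[0])
```

Wrapping the vectorizer and the classifier in one pipeline means new text goes through the same preprocessing as the training messages.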

Run

Both notebooks can be run using the Docker image jupyter/scipy-notebook.

Install Docker and pull the image (Debian/Ubuntu; run as root or with sudo):

apt update && apt install -y docker.io
systemctl enable docker --now
docker pull jupyter/scipy-notebook

Clone the project and run the Docker image:

git clone https://github.com/isabellecda/supervised-ml-training.git
cd supervised-ml-training
docker run -p 8888:8888 -v "$(pwd)":/home/jovyan/work jupyter/scipy-notebook

Then open the URL (including the access token) that the container prints to the terminal. The notebooks are available under the work folder.

Disclaimer

The notebooks' structure was based on Jose Portilla's 'Natural Language with Python' course.
The data set was created using a custom fork of Diego Ocampo's Machine Learning Phishing project.
These notebooks are purely didactic and their content is free to use. If you use it, please add a link to this repository as a reference.