Phishing Webpage Detection

This project uses machine learning to detect phishing webpages. It is a work in progress at an early stage.

Environment

Development

  1. Python 3.6
  2. Visual Studio Code
  3. macOS Catalina

Milestones

Functions for Feature Extraction

URL and Domain based

  1. Does the domain contain non-ASCII characters?
  2. Does the URL use a URL shortening service?
  3. Does the URL have deeply nested subdomains?
  4. Does the URL have a low Alexa rank?
  5. Is the domain not indexed by Google?
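
The first three checks need only the URL string, so they can be sketched with the standard library. This is a minimal illustration, not the project's actual code: the function names, the short `SHORTENER_DOMAINS` list, and the `max_depth` threshold are all assumptions. The Alexa rank and Google index checks depend on external services and are left out.

```python
from urllib.parse import urlparse

# Hypothetical, deliberately short list; a real one would be much longer.
SHORTENER_DOMAINS = {"bit.ly", "goo.gl", "tinyurl.com", "t.co", "ow.ly"}


def get_hostname(url):
    """Return the lower-cased hostname of a URL, or an empty string."""
    return (urlparse(url).hostname or "").lower()


def has_non_ascii_domain(url):
    """True if the hostname has non-ASCII characters or punycode labels."""
    hostname = get_hostname(url)
    if any(ord(char) > 127 for char in hostname):
        return True
    # Internationalized domains are often already encoded as "xn--..." labels.
    return any(label.startswith("xn--") for label in hostname.split("."))


def uses_url_shortener(url):
    """True if the hostname is one of the listed shortening services."""
    hostname = get_hostname(url)
    if hostname.startswith("www."):
        hostname = hostname[len("www."):]
    return hostname in SHORTENER_DOMAINS


def subdomain_depth(url):
    """Count labels left of the last two. Naive: ignores suffixes like co.uk."""
    labels = [label for label in get_hostname(url).split(".") if label]
    return max(len(labels) - 2, 0)


def has_deep_subdomain(url, max_depth=2):
    """True if the URL has more than max_depth subdomain levels (assumed cutoff)."""
    return subdomain_depth(url) > max_depth
```

A more careful version would consult the Public Suffix List, so that a host such as `example.co.uk` is not counted as having a subdomain.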

Code based

  1. Does the URL redirect to another domain?
  2. Does the URL use many external resources?
  3. Does the URL open new windows?
  4. Does the URL block right clicks?
  5. Does the URL use an inception bar? (Ref)
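
Three of these checks can be approximated by scanning the page's HTML statically. The sketch below assumes the HTML has already been downloaded; `PageScanner` and `scan_page` are hypothetical names, and the string matching on `window.open` and `contextmenu` is a rough heuristic rather than the project's method. Redirect and inception bar detection need a live browser session and are not covered.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageScanner(HTMLParser):
    """Collect resource URLs, right-click handlers, and inline script text."""

    # Tag name -> attribute that holds the resource URL.
    RESOURCE_ATTRS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.resources = []
        self.script_text = []
        self.blocks_right_click = False
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "oncontextmenu" in attrs:
            self.blocks_right_click = True
        wanted = self.RESOURCE_ATTRS.get(tag)
        if wanted and attrs.get(wanted):
            self.resources.append(attrs[wanted])
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.script_text.append(data)


def scan_page(html, page_url):
    """Return heuristic code-based features for one downloaded page."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    page_host = urlparse(page_url).hostname
    # A resource is external if it resolves to a different hostname.
    external = [
        resource
        for resource in scanner.resources
        if urlparse(urljoin(page_url, resource)).hostname not in (None, page_host)
    ]
    scripts = "\n".join(scanner.script_text)
    return {
        "external_resources": len(external),
        "opens_new_window": bool(re.search(r"\bwindow\.open\s*\(", scripts)),
        "blocks_right_click": scanner.blocks_right_click or "contextmenu" in scripts,
    }
```

Static scanning misses behaviour injected by scripts at runtime, which is one reason to render the page in a browser inside a virtual machine instead.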

Content based (Future)

Generate Small Data Set

Fetch URLs from PhishTank

A script fetches phishing and non-phishing URLs. To run it, go to the project root directory and execute:

python3 fetch_data.py

By default, it fetches 100 phishing URLs and 100 non-phishing URLs. These counts can be changed in the saveUrls() function.

Extract Features and Generate Dataset

It is safer to run this step in a virtual machine because it opens the phishing webpages. Another script extracts the features and generates the CSV file. To run it, go to the project root directory and execute:

python3 generate_dataset.py
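
The shape of such a dataset can be sketched as one CSV row per URL, holding the URL, one column per feature, and the label. This is an illustration only: `write_dataset`, the column names, and the two stand-in feature functions are assumptions and may not match what generate_dataset.py actually writes.

```python
import csv
import io
from urllib.parse import urlparse


def non_ascii_domain(url):
    """Stand-in feature: 1 if the hostname has non-ASCII characters, else 0."""
    hostname = urlparse(url).hostname or ""
    return int(any(ord(char) > 127 for char in hostname))


def subdomain_depth(url):
    """Stand-in feature: number of labels left of the last two (naive)."""
    labels = [p for p in (urlparse(url).hostname or "").split(".") if p]
    return max(len(labels) - 2, 0)


# Hypothetical column name -> feature function mapping.
FEATURES = {"non_ascii_domain": non_ascii_domain, "subdomain_depth": subdomain_depth}


def write_dataset(labelled_urls, out_file):
    """Write a header plus one row per (url, is_phishing) pair."""
    fieldnames = ["url", *FEATURES, "is_phishing"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for url, is_phishing in labelled_urls:
        row = {"url": url, "is_phishing": int(is_phishing)}
        row.update({name: func(url) for name, func in FEATURES.items()})
        writer.writerow(row)


# Example: write two labelled URLs to an in-memory buffer.
buffer = io.StringIO()
write_dataset(
    [("https://a.b.example.com/", True), ("https://example.com/", False)],
    buffer,
)
```

When writing to a real file, open it with `newline=""` so the `csv` module controls the line endings.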

Simple Machine Learning (In Progress)

At the current stage, the following algorithms are used for machine learning.

  • Logistic Regression
  • Decision Tree
  • Random Forest

python3 machine_learn.py
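
As a rough sketch of how the three algorithms could be compared, the example below trains each one with scikit-learn on the same split and reports held-out accuracy. The use of scikit-learn, the function names, and the hyperparameters are assumptions, not a description of machine_learn.py. The data is synthetic and generated only so the example runs; it says nothing about real phishing pages.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_rows(n=200, seed=0):
    """Fabricate binary feature rows purely for demonstration."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Each of 5 binary features agrees with the label 80% of the time.
        row = [label if rng.random() < 0.8 else 1 - label for _ in range(5)]
        features.append(row)
        labels.append(label)
    return features, labels


def compare_models(features, labels, seed=0):
    """Train the three classifiers and return held-out accuracy per model."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    return {
        name: model.fit(x_train, y_train).score(x_test, y_test)
        for name, model in models.items()
    }


scores = compare_models(*make_synthetic_rows())
```

With only a few hundred URLs, a single train/test split gives a noisy estimate, so cross-validation would be a reasonable next step.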

Unit Tests

Unit tests cover specific modules and functions. To run them, go to the project root directory and execute:

python3 -m unittest
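
For illustration, a test module for one feature function might look like the sketch below. Both `subdomain_depth` and the test class are hypothetical stand-ins, not the project's code. Saved under a name matching `test*.py`, such a file would be picked up by the discovery run above.

```python
import unittest
from urllib.parse import urlparse


def subdomain_depth(url):
    """Stand-in feature function; not the project's actual code."""
    labels = [p for p in (urlparse(url).hostname or "").split(".") if p]
    return max(len(labels) - 2, 0)


class SubdomainDepthTest(unittest.TestCase):
    def test_plain_domain_has_no_subdomain(self):
        self.assertEqual(subdomain_depth("https://example.com/"), 0)

    def test_nested_subdomains_are_counted(self):
        self.assertEqual(subdomain_depth("https://a.b.example.com/x"), 2)

    def test_missing_hostname_gives_zero(self):
        self.assertEqual(subdomain_depth("not a url"), 0)
```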