This master's thesis develops an architecture for automated heuristic phishing detection. Its goal is to evaluate phishing URLs using an automated solution.
The repository contains an automated detector for phishing URLs. It is implemented as a Java web server that uses ChromeDriver and the Selenium testing framework; the malicious URLs are stored in CSV datasets. The heuristic detection tests phishing URLs while browsing: I implemented a web browser extension that, driven by Selenium, tests the malicious URLs one by one. The extension contains the predict() functions, which use twenty methods to check whether a given URL shows characteristics typical of phishing URLs. Each function has a weight, and the weights are combined into a final weight that decides whether the URL is a phishing URL or not.
I also created a non-automated version for manual testing. It uses the same Java web server implementation described above, but without the large datasets: you can safely test specific URLs in the ChromeDriver environment by defining them statically in the code.
Test functions for detecting phishing URLs, with their possible return values. The weight of each function is defined in the array weight=[...] used by the JavaScript functions predict(data,weight) or predict2(data,weight):
IsIPv4() (-1|+1)
isHttps() (-1|+1)
isTildeInURL() (-1|+1)
isHashTaginURL() (-1|+1)
isLongURL(<54,54-75,75>) (-1|0|+1)
isTinyURL() (-1|+1)
isRedirectingURL() (-1|+1)
isHypenURL() (-1|+1)
isMultiDomain() (-1|+1)
isFaviconDomainUnidentical() (-1|+1)
isIllegalHttpsURL() (-1|+1)
isImgFromDifferentDomain() (-1|0|+1)
isAnchorFromDifferentDomain() (-1|0|+1)
isScLnkFromDifferentDomain() (-1|0|+1)
isFormActionInvalid() (-1|0|+1)
isMailToAvailable() (-1|+1)
isIframePresent() (-1|+1)
getIdenticalDomainCount() (returns IdenticalDomainCount number)
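The weighting idea can be illustrated with a minimal sketch. It is not the extension's predict() or predict2(): the helper names (longUrlScore, ipv4HostScore, weightedScore, classify), the example weights, the threshold of 0, and the convention that +1 marks a phishing characteristic are all assumptions made for illustration. The URL-length boundaries follow the isLongURL(<54,54-75,75>) entry above.

```javascript
// Hypothetical sketch: combines per-heuristic results into one weighted score.
// Names, weights and threshold are illustrative, not the extension's values.

// Assumed mapping for the URL length check: <54 -> -1, 54..75 -> 0, >75 -> +1.
function longUrlScore(url) {
  if (url.length < 54) return -1;
  if (url.length <= 75) return 0;
  return 1;
}

// Assumed convention: +1 when the host is a dotted IPv4 address, -1 otherwise.
function ipv4HostScore(url) {
  const host = new URL(url).hostname;
  return /^(\d{1,3}\.){3}\d{1,3}$/.test(host) ? 1 : -1;
}

// Weighted sum of the heuristic results; data[i] is paired with weight[i].
function weightedScore(data, weight) {
  if (data.length !== weight.length) {
    throw new Error("data and weight must have the same length");
  }
  return data.reduce((sum, value, i) => sum + value * weight[i], 0);
}

// Assumed decision rule: a score above the threshold is treated as phishing.
function classify(data, weight, threshold = 0) {
  return weightedScore(data, weight) > threshold ? "phishing" : "legitimate";
}

const url = "http://192.168.0.1/login";
const data = [ipv4HostScore(url), longUrlScore(url)]; // [1, -1]
const weight = [3, 1];                                // example weights only
console.log(weightedScore(data, weight), classify(data, weight)); // 2 phishing
```

With integer example weights the IP-address heuristic outweighs the short URL length, so the sample URL is classified as phishing under these assumptions.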
Weight-based phishing URL detection implemented in web browser extensions:
This repository contains all the source code for the master's thesis.
Directory logic/ contains the source code for the 1st version of the extension for the Google Chrome web browser. It includes the .crx file and the private key in .pem format, as well as all the logic for testing phishing URL addresses (20 functions). The target/ directory contains the documentation for the Java Spring Boot project; the documentation is pre-generated, or you can generate it manually using Javadoc. All functionality of logic/ is in the file logic.js, and you can collect the web browser log using the DevTools Console.
Directory Package/ contains the source code for the 2nd version of the extension for the Google Chrome web browser. It includes the .crx file and the private key in .pem format, as well as all the logic for testing phishing URL addresses (20 functions). The target/ directory contains the documentation for the Java Spring Boot project; the documentation is pre-generated, or you can generate it manually using Javadoc. All functionality of Package/ is in the file extensionListener.js; logic.js is not implemented. You can collect the logs of the web browser extension on the extensions page, in the details of the extension -> Inspect views -> background page.
Directory project/ contains the web server in Java Spring Boot that enables testing phishing URLs. You can define a single URL address to test, or define several URL addresses as variables and pass them to the Selenium WebDriver.get() method to test a specific number of phishing URL addresses defined statically in the code.
Directory Automated/ contains the web server in Java Spring Boot, which also enables testing, but here the testing is driven by a CSV file in which the phishing URL addresses are stored, with a 10 s delay between consecutive URL addresses. The main class, RemoteDriverConfig.java, contains the logic for testing a dataset of phishing URLs, for working with the Selenium testing framework, and for using Chrome DevTools. The phishing URLs are read in a loop into a List; each URL (element by element from the List) is then passed to the WebDriver.get() function so that it can be examined with our web browser phishing detection extension.
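The control flow described above can be sketched as follows. The project implements this loop in Java in RemoteDriverConfig.java; the sketch below only mirrors the idea in JavaScript, the language of the extension. The names parseUrlList and visitAll, the visit callback (a stand-in for WebDriver.get()), and the assumed CSV layout (one URL per row, in the first column, no header or quoted fields) are hypothetical.

```javascript
// Illustrative only: mirrors the "read CSV, visit URL by URL with a delay" flow.
// parseUrlList, visitAll and `visit` are hypothetical names, not project code.

// Assumes one URL per row in the first column, without header or quoted fields.
function parseUrlList(csvText) {
  return csvText
    .split(/\r?\n/)
    .map((line) => line.split(",")[0].trim())
    .filter((url) => url.length > 0);
}

// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visits the URLs one at a time and pauses after each one.
async function visitAll(urls, visit, delayMs = 10000) {
  for (const url of urls) {
    await visit(url);     // stand-in for WebDriver.get(url)
    await sleep(delayMs); // pause before moving on to the next URL
  }
}

const urls = parseUrlList("http://a.example/login,phish\nhttp://b.example/\n");
// The delay is set to 0 here so that the example finishes immediately.
visitAll(urls, async (url) => console.log("visiting", url), 0);
```

Processing the list sequentially with a fixed pause gives the browser extension time to evaluate each page before the next one is loaded.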
IntelliJ IDEA Ultimate or any other IDE that supports Spring Boot
Java JDK 17
Maven v3.8.5
Node.js 16.14.2
ChromeDriver (you need to define the absolute path to ChromeDriver.exe in the public class RemoteDriverConfig)
Documentation for each web browser extension is in the jsdoc/ folder of the corresponding extension directory mentioned above.
Open a terminal, open Visual Studio Code or any other text editor, and update the comments in the source code.
cd jsdoc/out/scripts/; code .
Open a second terminal and go to the same directory:
cd jsdoc/out/scripts/
After making changes, regenerate the HTML pages for the file that you have updated,
for example:
jsdoc doc.json extensionListener.js
or
jsdoc doc.json logic2.js
jsdoc/out/scripts/out/
The generated HTML pages with the documentation are in this folder.
Documentation is in:
project/target/site/apidocs/
and Automated/target/site/apidocs/
cd Automated/
or
cd project/
Then run the Maven command to generate the Javadoc:
mvn javadoc:javadoc
The documentation is then generated in the directory project/target/site/apidocs/ (or Automated/target/site/apidocs/ when run from Automated/).