Implementations of phishing detection and identification baselines
- EMD: Fu, A. Y., Wenyin, L., & Deng, X. (2006). Detecting phishing web pages with visual similarity assessment based on earth mover's distance (EMD). IEEE Transactions on Dependable and Secure Computing, 3(4), 301-311. This paper uses the Earth Mover's Distance to measure the similarity between two webpage screenshots.
- PhishZoo: Afroz, S., & Greenstadt, R. (2011, September). Phishzoo: Detecting phishing websites by looking at them. In 2011 IEEE Fifth International Conference on Semantic Computing (pp. 368-375). IEEE. This work applies the SIFT algorithm to quantify the similarity between two webpage screenshots.
- VisualPhishNet: Abdelnabi, S., Krombholz, K., & Fritz, M. (2020, October). VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (pp. 1681-1698). This work trains a deep Siamese model to compare two webpage screenshots.
- StackModel: Li, Y., Yang, Z., Chen, X., Yuan, H., & Liu, W. (2019). A stacking model using URL and HTML features for phishing webpage detection. Future Generation Computer Systems, 94, 27-39. This work combines URL and HTML features in a stacking model.
- URLNet: Le, H., Pham, Q., Sahoo, D., & Hoi, S. C. (2018). URLNet: Learning a URL representation with deep learning for malicious URL detection. arXiv preprint arXiv:1802.03162. This work learns a URL representation with deep learning to detect malicious URLs.
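To give a feel for the distance behind the EMD baseline, here is a minimal sketch of the Earth Mover's Distance between two one-dimensional histograms (for example, grayscale intensity histograms of two screenshots). It is a simplification and not the code in `emd.py`: the paper works on richer image signatures, while this sketch assumes equal-length histograms and a ground distance of 1 between adjacent bins. The function names `normalize` and `emd_1d` are hypothetical.

```python
from itertools import accumulate


def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [h / total for h in hist]


def emd_1d(hist_a, hist_b):
    """Earth Mover's Distance between two 1-D histograms.

    Assumption: both histograms use the same bins and moving mass
    between adjacent bins costs 1. Under that assumption the EMD is
    the sum of absolute differences between the cumulative sums.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    cdf_a = accumulate(normalize(hist_a))
    cdf_b = accumulate(normalize(hist_b))
    return sum(abs(x - y) for x, y in zip(cdf_a, cdf_b))


if __name__ == "__main__":
    # All mass moves from the first bin to the last bin: two steps.
    print(emd_1d([4, 0, 0], [0, 0, 4]))  # 2.0
    # Identical histograms need no mass to be moved.
    print(emd_1d([1, 2, 3], [1, 2, 3]))  # 0.0
```

A smaller distance means more similar histograms; how the distance is turned into a phishing decision (for example, by thresholding) depends on the baseline and is not shown here.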
Requirements:
- python == 3.6
- opencv-python == 3.4.2.17
- opencv-contrib-python == 3.4.2.17
- tensorflow == 1.13.1
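The pins above can be compared against an environment before running the baselines. The following is a minimal sketch, not part of this repository: `parse_pins`, `python_matches` and `find_mismatches` are hypothetical helpers, and the installed versions are passed in as a plain dictionary so that the sketch does not depend on any particular packaging API.

```python
import sys

# Package pins copied from the requirements listed above.
PINS = """\
opencv-python == 3.4.2.17
opencv-contrib-python == 3.4.2.17
tensorflow == 1.13.1
"""


def parse_pins(text):
    """Parse 'name == version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def python_matches(version_info, required=(3, 6)):
    """Check the interpreter's major.minor version against the pin."""
    return tuple(version_info[:2]) == tuple(required)


def find_mismatches(pins, installed):
    """Return {name: (wanted, found)} for every pin that is not satisfied.

    `installed` maps package names to installed version strings;
    a missing package is reported with found=None.
    """
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    print("python ok:", python_matches(sys.version_info))
    # Example with a made-up environment in which tensorflow is too new.
    fake_env = {"opencv-python": "3.4.2.17", "tensorflow": "2.0.0"}
    print(find_mismatches(parse_pins(PINS), fake_env))
```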
The data folder should be organized in this format
For EMD
The -m option is the testing mode, i.e. the ground-truth label of the folder.
cd EMD/
python emd.py -f [path_to_data_folder] \
-m [benign|phish] \
-t [path_to_targetlist_folder]
For PhishZoo
The -m option is the testing mode, i.e. the ground-truth label of the folder.
cd PhishZoo/
python phishzoo.py -f [path_to_data_folder] \
-m [benign|phish] \
-t [path_to_targetlist_folder]
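The EMD and PhishZoo commands above take the same `-f`, `-m` and `-t` options, so both can be driven from one small wrapper. The sketch below only assembles the argument list; `build_baseline_command` is a hypothetical helper that is not part of this repository, and the paths in the usage example are placeholders.

```python
import subprocess

VALID_MODES = ("benign", "phish")


def build_baseline_command(script, data_folder, mode, targetlist_folder):
    """Assemble the argument list for emd.py or phishzoo.py.

    `mode` is the ground-truth label of the data folder and must be
    'benign' or 'phish', matching the -m option shown above.
    """
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of %s, got %r" % (VALID_MODES, mode))
    return [
        "python", script,
        "-f", data_folder,
        "-m", mode,
        "-t", targetlist_folder,
    ]


def run_baseline(workdir, script, data_folder, mode, targetlist_folder):
    """Run one baseline from inside its directory (e.g. 'EMD' or 'PhishZoo')."""
    cmd = build_baseline_command(script, data_folder, mode, targetlist_folder)
    return subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    # Placeholder paths; replace them with real folders before running.
    print(build_baseline_command(
        "emd.py", "/path/to/data", "phish", "/path/to/targetlist"))
```

Passing the arguments as a list avoids shell quoting issues when a folder path contains spaces.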
For VisualPhishnet (forked from https://github.com/S-Abdelnabi/VisualPhishNet.git)
Download the pretrained model, the target list embedding, the target list labels, and the target list filename list.
cd VisualPhishnet/
python visualphish_manual.py -f [path_to_data_folder] \
-r [txt_path_to_save_result]
For StackModel
Download the pretrained model.
cd StackModel/
python test.py -f [path_to_data_folder] \
-o [directory_to_save_output]
For URLNet (forked from https://github.com/Antimalweb/URLNet)
Download the pretrained model.
python test.py \
--model.emb_mode 5 \
--data.data_dir [path_to_data_folder] \
--log.checkpoint_dir output_5/checkpoints/model-2430 \
--log.output_dir [txt_path_to_save_result] \
--data.word_dict_dir output_5/words_dict.p \
--data.char_dict_dir output_5/chars_dict.p \
--data.subword_dict_dir output_5/subwords_dict.p