Phishing baseline

Implementations of phishing detection and identification baselines

  • EMD: Fu, A. Y., Wenyin, L., & Deng, X. (2006). Detecting phishing web pages with visual similarity assessment based on earth mover's distance (EMD). IEEE Transactions on Dependable and Secure Computing, 3(4), 301-311. This paper uses the Earth Mover's Distance to measure the similarity between two webpage screenshots.

  • PhishZoo: Afroz, S., & Greenstadt, R. (2011, September). PhishZoo: Detecting phishing websites by looking at them. In 2011 IEEE Fifth International Conference on Semantic Computing (pp. 368-375). IEEE. This work applies the SIFT algorithm to quantify the similarity between two webpage screenshots.

  • VisualPhishnet: Abdelnabi, S., Krombholz, K., & Fritz, M. (2020, October). VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (pp. 1681-1698). This work trains a deep-learning Siamese model to compare two webpage screenshots.

  • StackModel: Li, Y., Yang, Z., Chen, X., Yuan, H., & Liu, W. (2019). A stacking model using URL and HTML features for phishing webpage detection. Future Generation Computer Systems, 94, 27-39. This work trains a stacking model on URL and HTML features to detect phishing webpages.

  • URLNet: Le, H., Pham, Q., Sahoo, D., & Hoi, S. C. (2018). URLNet: Learning a URL representation with deep learning for malicious URL detection. arXiv preprint arXiv:1802.03162. This work learns a URL representation with deep learning to detect malicious URLs.
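
To make the visual-similarity idea behind the EMD baseline concrete, here is a minimal sketch of an Earth Mover's Distance between two screenshots. It is a deliberate simplification and not the method of Fu et al. or the code in this repository: it assumes grayscale images given as NumPy arrays and compares 1-D intensity histograms, whereas the paper builds richer signatures. The function name `histogram_emd` and the `bins` parameter are hypothetical.

```python
import numpy as np


def histogram_emd(image_a, image_b, bins=16):
    """Earth Mover's Distance between the intensity histograms of two images.

    Simplifying assumption: both images are non-empty arrays of grayscale
    values in [0, 255]. The distance is expressed in units of histogram bins.
    """
    hist_a, _ = np.histogram(image_a, bins=bins, range=(0, 256))
    hist_b, _ = np.histogram(image_b, bins=bins, range=(0, 256))

    # Normalize so that both histograms carry the same total mass (1.0).
    p = hist_a / hist_a.sum()
    q = hist_b / hist_b.sum()

    # For 1-D histograms with unit distance between adjacent bins, the EMD
    # equals the L1 distance between the two cumulative distributions.
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())
```

A distance of 0 means the two histograms are identical; larger values mean more intensity mass has to be moved further to turn one histogram into the other. A detector built on this would compare a suspicious screenshot against each target-list screenshot and flag it when the smallest distance falls below a threshold chosen on validation data.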

Requirements

python == 3.6
opencv-python == 3.4.2.17
opencv-contrib-python == 3.4.2.17
tensorflow == 1.13.1
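
Because the versions above are pinned exactly, it can help to confirm the environment before running any baseline. The following is a small sketch, not part of the repository: the names `PINNED`, `find_mismatches`, `installed_versions`, and `report` are hypothetical, and it assumes `pkg_resources` (shipped with setuptools) is available, as it normally is on Python 3.6.

```python
import sys

# Pinned package versions copied from the requirements list above.
PINNED = {
    "opencv-python": "3.4.2.17",
    "opencv-contrib-python": "3.4.2.17",
    "tensorflow": "1.13.1",
}


def find_mismatches(pinned, installed):
    """Return {package: (wanted, found)} for each package whose version differs.

    `found` is None when the package is not installed at all.
    """
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pinned.items()
        if installed.get(name) != wanted
    }


def installed_versions(names):
    """Look up the installed version of each named distribution."""
    import pkg_resources  # imported lazily; provided by setuptools

    found = {}
    for name in names:
        try:
            found[name] = pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            pass  # leave missing packages out of the result
    return found


def report():
    """Print every deviation from the pinned environment."""
    if sys.version_info[:2] != (3, 6):
        print("python: wanted 3.6, found %d.%d" % sys.version_info[:2])
    mismatches = find_mismatches(PINNED, installed_versions(PINNED))
    for name, (wanted, found) in sorted(mismatches.items()):
        print("%s: wanted %s, found %s" % (name, wanted, found))
```

Calling `report()` prints nothing when the environment matches the pinned versions.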

Instructions

The data folder should be organized in this format

To run EMD

cd EMD/
# -m is the testing mode, i.e. the ground-truth label for the folder
python emd.py -f [path_to_data_folder] \
              -m [benign|phish] \
              -t [path_to_targetlist_folder]

To run PhishZoo

cd PhishZoo/
# -m is the testing mode, i.e. the ground-truth label for the folder
python phishzoo.py -f [path_to_data_folder] \
                   -m [benign|phish] \
                   -t [path_to_targetlist_folder]
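
The EMD and PhishZoo commands take the same three flags (`-f`, `-m`, `-t`), so evaluating both a benign and a phishing folder can be scripted. The sketch below is one possible wrapper, not part of the repository: the helper names `build_command` and `run_baseline` are hypothetical, and it assumes only the flags shown in the commands above.

```python
import os
import subprocess
import sys

# The two ground-truth labels accepted by the -m flag.
VALID_MODES = ("benign", "phish")


def build_command(script, data_folder, mode, targetlist_folder):
    """Build the argument list for emd.py or phishzoo.py."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of %s, got %r" % (VALID_MODES, mode))
    return [
        sys.executable, script,
        "-f", data_folder,
        "-m", mode,
        "-t", targetlist_folder,
    ]


def run_baseline(workdir, script, data_folder, mode, targetlist_folder):
    """Run one baseline from inside its own directory (e.g. EMD/ or PhishZoo/)."""
    # Resolve paths first, because the script runs with workdir as its cwd.
    command = build_command(
        script,
        os.path.abspath(data_folder),
        mode,
        os.path.abspath(targetlist_folder),
    )
    return subprocess.run(command, cwd=workdir, check=True)
```

For example, `run_baseline("EMD", "emd.py", "data/phish", "phish", "targetlist")` would mirror the EMD command above; the folder names in this call are placeholders.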

To run VisualPhishnet

Download the pretrained model, the target list embedding, the target list labels, and the target list filename list.

cd VisualPhishnet/
python visualphish_manual.py -f [path_to_data_folder] \
                             -r [txt_path_to_save_result]

For StackModel

Download the pretrained model.

cd StackModel
python test.py -f [path_to_data_folder] \
               -o [directory_to_save_output]

For URLNet

Download the pretrained model.

python test.py \
  --model.emb_mode 5 \
  --data.data_dir [path_to_data_folder] \
  --log.checkpoint_dir output_5/checkpoints/model-2430 \
  --log.output_dir [txt_path_to_save_result] \
  --data.word_dict_dir output_5/words_dict.p \
  --data.char_dict_dir output_5/chars_dict.p \
  --data.subword_dict_dir output_5/subwords_dict.p