Awesome ML for Threat Detection

A curated list of resources to deep dive into the intersection of applied machine learning and threat detection.

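Table of Contents

  • Threat detection papers
  • Threat characterization papers
  • Machine learning systems and operationalization papers
  • PatternEx papers
  • Other machine learning for cybersecurity repos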

Threat detection papers

  • Malicious URL Detection using Machine Learning: A Survey. Doyen Sahoo, Chenghao Liu and Steven C.H. Hoi. arXiv, 2017. [PDF]
  • SoK: Applying Machine Learning in Security - A Survey. Heju Jiang, Jasvir Nagra, Parvez Ahammad. arXiv, 2016. [PDF]
  • Predicting Domain Generation Algorithms with Long Short-Term Memory Networks. Jonathan Woodbridge, Hyrum S. Anderson, Anjum Ahuja and Daniel Grant. arXiv, 2016. [PDF]
  • Network connectivity graph for malicious traffic dissection. Enrico Bocchi, Luigi Grimaudo, Marco Mellia, Elena Baralis, Sabyasachi Saha, Stanislav Miskovic, Gaspar Modelo-Howard, Sung-Ju Lee. 24th International Conference on Computer Communication and Networks (ICCCN), 2015. [PDF]
  • Detecting malicious domains via graph inference. Pratyusa K. Manadhata, Sandeep Yadav, Prasad Rao, William Horne. ACM Conference on Computer and Communications Security, 2014. [PDF]
  • Nazca: Detecting Malware Distribution in Large-Scale Networks. Luca Invernizzi, Stanislav Miskovic, Ruben Torres, Sabyasachi Saha, Sung-ju Lee, Marco Mellia, Christopher Kruegel and Giovanni Vigna. NDSS Symposium, 2014. [PDF]
  • Machine learning for identifying botnet network traffic. Matija Stevanovic and Jens Myrup Pedersen. Aalborg University (Technical report), 2013. [PDF]
  • Survey on network‐based botnet detection methods. Sebastián García, Alejandro Zunino and Marcelo Campo. Security and Communication Networks, 2013. [PDF]
  • Detecting insider threats in a real corporate database of computer usage activity. Ted E. Senator et al. 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2013. [PDF]
  • Botnet detection based on traffic behavior analysis and flow intervals. David Zhao, Issa Traore, Bassam Sayed, Wei Lu, Sherif Saad, Ali Ghorbani, Dan Garant. Computers & Security, 2013. [PDF]

Threat characterization papers

  • A Taxonomy of Network Threats and the Effect of Current Datasets on Intrusion Detection Systems. Hanan Hindy, David Brosset, Ethan Bayne, Amar Seeam, Christos Tachtatzis, Robert Atkinson, Xavier Bellekens. IEEE Access, 2020. [PDF]
  • A lustrum of malware network communication: Evolution and insights. Chaz Lever, Platon Kotzias, Davide Balzarotti, Juan Caballero and Manos Antonakakis. IEEE Symposium on Security and Privacy, 2017. [PDF]
  • A comprehensive measurement study of domain generating malware. Daniel Plohmann, Khaled Yakdan, Michael Klatt, Johannes Bader, Elmar Gerhards-Padilla. 25th USENIX Security Symposium, 2016. [PDF]
  • A Survey on Botnet Architectures, Detection and Defences. Muhammad Mahmoud, Manjinder Nir and Ashraf Matrawy. International Journal of Network Security, 2015. [PDF]
  • Practical Comprehensive Bounds on Surreptitious Communication over DNS. Vern Paxson, Mihai Christodorescu, Mobin Javed, Josyula Rao, Reiner Sailer, Douglas Lee Schales, Marc Ph. Stoecklin, Kurt Thomas, Wietse Venema and Nicholas Weaver. 22nd USENIX Security Symposium, 2013. [PDF]
  • Analysis of security data from a large computing organization. A. Sharma, Z. Kalbarczyk, J. Barlow and R. Iyer. IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN), 2011. [PDF]

Machine learning systems and operationalization papers

  • A survey of methods for explaining black box models. Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, Dino Pedreschi. ACM Computing Surveys, 2018. [PDF]
  • Hidden Technical Debt in Machine Learning Systems. D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison. Advances in Neural Information Processing Systems (NIPS), 2015. [PDF]
  • Local Outlier Detection with Interpretation. Xuan Hong Dang, Barbora Micenková, Ira Assent and Raymond T. Ng. European Conference on Machine Learning and Knowledge Discovery in Databases, 2013. [PDF]
  • Interpreting and unifying outlier scores. Hans-Peter Kriegel, Peer Kröger, Erich Schubert and Arthur Zimek. SIAM International Conference on Data Mining, 2011. [PDF]
  • Outside the Closed World: On Using Machine Learning for Network Intrusion Detection. Robin Sommer and Vern Paxson. IEEE Symposium on Security and Privacy, 2010. [PDF]
  • Converting output scores from outlier detection algorithms into probability estimates. Jing Gao and Pang-Ning Tan. International Conference on Data Mining (ICDM), 2006. [PDF]

PatternEx papers

  • The Holy Grail of “Systems for Machine Learning”: Teaming humans and machine learning for detecting cyber threats. Ignacio Arnaldo and Kalyan Veeramachaneni. ACM SIGKDD Explorations Newsletter 21, 2019. [PDF]
  • Shooting the moving target: machine learning in cybersecurity. Ankit Arun and Ignacio Arnaldo. USENIX Conference on Operational Machine Learning (OpML), 2019. [PDF]
  • eX2: a framework for interactive anomaly detection. Ignacio Arnaldo, Kalyan Veeramachaneni, Mei Lam. Intelligent User Interfaces Workshops, 2019. [PDF]
  • Acquire, adapt, and anticipate: continuous learning to block malicious domains. Ignacio Arnaldo, Ankit Arun, Sumeeth Kyathanahalli, Kalyan Veeramachaneni. IEEE International Conference on Big Data, 2018. [IEEE Link]
  • Learning representations for log data in cybersecurity. Ignacio Arnaldo, Alfredo Cuesta-Infante, Ankit Arun, Mei Lam, Costas Bassias and Kalyan Veeramachaneni. International Conference on Cyber Security Cryptography and Machine Learning, 2017. [PDF]
  • AI2: Training a Big Data Machine to Defend. Kalyan Veeramachaneni, Ignacio Arnaldo, Vamsi Korrapati, Constantinos Bassias and Ke Li. 2nd IEEE International Conference on Big Data Security on Cloud, 2016. [PDF]

Other machine learning for cybersecurity repos

Note

The initial intent was to create a repo pointing only to our own papers (PatternEx papers), but we thought it made sense to also include papers that shaped our understanding of this space. Enjoy!