/anomaly-detection-resources

Anomaly detection related books, papers, videos and toolboxes

Primary LanguagePythonGNU Affero General Public License v3.0AGPL-3.0

Anomaly Detection Learning Resources


Outlier Detection , also known as Anomaly Detection is a fascinating and useful technique to identify outlying data objects. It has been proven critical in many fields, such as credit card fraud analytics and mechanical unit defect detection.

In this repository, you will find many:

  1. Books & Academic Papers
  2. Learning Materials, e.g, online courses and videos
  3. Outlier Datasets
  4. Outlier Detection Libraries & Demo Codes
  5. Paper Downloader: a Python 3 script to download the open access papers listed in this repository (under development).

I would continue adding more items to the repository. Please feel free to suggest some key materials by opening an issue or dropping me an email @(yuezhao@cs.toronto.edu). Enjoy reading!

1. Books & Tutorials

1.1. Books

Outlier Analysis by Charu Aggarwal: Classical text book covering most of the outlier analysis techniques. A must-read for people in outlier detection. [Preview.pdf]

Outlier Ensembles: An Introduction by Charu Aggarwal and Saket Sathe: Great intro book for ensemble learning in outlier anaysis.

Data Mining: Concepts and Techniques (3rd) by Jiawei Han Micheline Kamber Jian Pei: Chapter 12 discusses outlier detection with many important points. [Google Search]

1.2. Tutorials

Kriegel, H.P., Kröger, P. and Zimek, A., 2010. Outlier detection techniques. Tutorial at ACM SIGKDD, 10. [Download PDF]

Chawla, S. and Chandola, V., 2011, Anomaly Detection: A Tutorial. Tutorial at ICDM 2011. [Download PDF]

Lazarevic, A., Banerjee, A., Chandola, V., Kumar, V. and Srivastava, J., 2008, September. Data mining for anomaly detection. In Tutorial at ECML PKDD 2008. [See Video]

2. Courses/Seminars/Videos

Coursera Introduction to Anomaly Detection (by IBM): https://www.coursera.org/learn/ai/lecture/ASPv0/introduction-to-anomaly-detection

Coursera Machine Learning by Andrew Ng also partly covers the topic:

Udemy Outlier Detection Algorithms in Data Mining and Data Science: https://www.udemy.com/outlier-detection-techniques/

Stanford Data Mining for Cyber Security also covers part of anomaly detection techniques. http://web.stanford.edu/class/cs259d/

3. Toolbox & Datasets

3.1. Python

Scikit-learn Novelty and Outlier Detection. It supports some popular algorithms like LOF, Isolation Forest and One-class SVM

Python Outlier Detection (PyOD): It supports a series of outlier detection algorithms and combination frameworks. It is now released on PyPI and can be installed with "pip install pyod".

3.2. Matlab

Anomaly Detection Toolbox - Beta: A collection of popular outlier detection algorithms in Matlab.

3.3. Java

ELKI: Environment for Developing KDD-Applications Supported by Index-Structures: ELKI is an open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection.

RapidMiner Anomaly Detection Extension: The Anomaly Detection Extension for RapidMiner comprises the most well know unsupervised anomaly detection algorithms, assigning individual anomaly scores to data rows of example sets. It allows you to find data, which is significantly different from the normal, without the need for the data being labeled.

3.4. Time series outlier detection

3.5. Datasets

ELKI Outlier Datasets: https://elki-project.github.io/datasets/outlier

Outlier Detection DataSets (ODDS): http://odds.cs.stonybrook.edu/#table1

Unsupervised Anomaly Detection Dataverse: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OPQMVF

Anomaly Detection Meta-Analysis Benchmarks: https://ir.library.oregonstate.edu/concern/datasets/47429f155

4. Papers

4.1. Overview & Survey Papers

Chandola, V., Banerjee, A. and Kumar, V., 2009. Anomaly detection: A survey. ACM computing surveys , 41(3), p.15. [Download PDF]

Hodge, V. and Austin, J., 2004. A survey of outlier detection methodologies. Artificial intelligence review, 22(2), pp.85-126. [Download PDF]

Campos, G.O., Zimek, A., Sander, J., Campello, R.J., Micenková, B., Schubert, E., Assent, I. and Houle, M.E., 2016. On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Mining and Knowledge Discovery, 30(4), pp.891-927. [HTML] [SLIDES]

Singh, K., & Upadhyaya, S. (2012). Outlier detection: applications and techniques. International Journal of Computer Science Issues (IJCSI), 9(1), 307. [Download PDF]

Goldstein, M. and Uchida, S., 2016. A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PloS one, 11(4), p.e0152173. [Download PDF]

4.2. Key Algorithms

k Nearst Neighbors (kNN) Outlier Detector.

  • Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. ACM Sigmod Record, 29(2), pp. 427-438). [Download PDF]
  • Angiulli, F. and Pizzuti, C., 2002, August. Fast outlier detection in high dimensional spaces. In European Conference on Principles of Data Mining and Knowledge Discovery pp. 15-27. [HTML]

Local Outlier Factor (LOF). Breunig, M.M., Kriegel, H.P., Ng, R.T. and Sander, J., 2000, May. LOF: identifying density-based local outliers. ACM Sigmod Record, 29(2), pp. 93-104. [Download PDF]

Isolation Forest. Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In ICDM '08, pp. 413-422. IEEE. [Download PDF]

One-class Support Vector Machine. Ma, J. and Perkins, S., 2003, July. Time-series novelty detection using one-class support vector machines. In IJCNN' 03, pp. 1741-1745. IEEE. [Download PDF]

4.3. Graph & Network Outlier Detection

Akoglu, L., Tong, H. and Koutra, D., 2015. Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery, 29(3), pp.626-688. [Download PDF]

4.4. Time Series Outlier Detection

Gupta, M., Gao, J., Aggarwal, C.C. and Han, J., 2014. Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and Data Engineering, 26(9), pp.2250-2267. [Download PDF]

4.5. Spatial Outliers

4.6. High-dimensional & Subspace Outliers

Zimek, A., Schubert, E. and Kriegel, H.P., 2012. A survey on unsupervised outlier detection in high‐dimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(5), pp.363-387. [Downloadable Link]

4.7. Outlier Ensembles

Aggarwal, C.C., 2013. Outlier ensembles: position paper. ACM SIGKDD Explorations Newsletter, 14(2), pp.49-58. [Download PDF]

Zimek, A., Campello, R.J. and Sander, J., 2014. Ensembles for unsupervised outlier detection: challenges and research questions a position paper. ACM Sigkdd Explorations Newsletter, 15(1), pp.11-22. [Download PDF]

Campos, G.O., Zimek, A. and Meira, W., 2018, June. An Unsupervised Boosting Strategy for Outlier Detection Ensembles. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 564-576). Springer, Cham. [HTML]

4.8. Outlier Detection in Evolving Data

Salehi, Mahsa & Rashidi, Lida. (2018). A Survey on Anomaly detection in Evolving Data: [with Application to Forest Fire Risk Prediction]. ACM SIGKDD Explorations Newsletter. 20. 13-23. [Download PDF]

Emaad Manzoor, Hemank Lamba, Leman Akoglu. Outlier Detection in Feature-Evolving Data Streams. In 24th ACM SIGKDD International Conference on Knowledge Discovery and Data mining (KDD). 2018. [Download PDF] [Github]

4.9. Representation Learning in Outlier Detection

Pang, G., Cao, L., Chen, L. and Liu, H., 2018. Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection. In 24th ACM SIGKDD International Conference on Knowledge Discovery and Data mining (KDD). 2018. [Download PDF]

Micenková, B., McWilliams, B. and Assent, I., 2015. Learning representations for outlier detection on a budget. arXiv preprint arXiv:1507.08104. [Download PDF]

Zhao, Y., Hryniewicki, M.K. and PricewaterhouseCoopers, A., 2018. XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning. International Joint Conference on Neural Networks. [Download PDF]

4.10. Interpretability

Liu, N., Shin, D. and Hu, X., 2017. Contextual outlier interpretation. arXiv preprint arXiv:1711.10589. [Download PDF]

Tang, G., Pei, J., Bailey, J. and Dong, G., 2015. Mining multidimensional contextual outliers from categorical relational data. Intelligent Data Analysis, 19(5), pp.1171-1192. [Download PDF]

Nikhil Gupta, Dhivya Eswaran, Neil Shah, Leman Akoglu, Christos Faloutsos. Beyond Outlier Detection: LookOut for Pictorial Explanation. ECML PKDD 2018. [Download PDF]

4.11. Social Media Anomaly Detection

Yu, R., Qiu, H., Wen, Z., Lin, C. and Liu, Y., 2016. A survey on social media anomaly detection. ACM SIGKDD Explorations Newsletter, 18(1), pp.1-14. [Download PDF]

5. Key Conferences/Workshops/Journals

5.1. Conferences & Workshops

ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) Note: SIGKDD usually has an Outlier Detection Workshop (ODD), see ODD 2018.

ACM International Conference on Management of Data (SIGMOD)

The Web Conference (WWW)

IEEE International Conference on Data Mining (ICDM)

SIAM International Conference on Data Mining (SDM)

IEEE International Conference on Data Engineering (ICDE)

ACM InternationalConference on Information and Knowledge Management (CIKM)

ACM International Conference on Web Search and Data Mining(WSDM)

The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)

The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)

5.2. Journals

ACM Transactions on Knowledge Discovery from Data (TKDD)

IEEE Transactions on Knowledge and Data Engineering (TKDE)

ACM SIGKDD Explorations Newsletter

Data Mining and Knowledge Discovery

Knowledge and Information Systems (KAIS)