/cern-openlab

Anomaly Detection - Research Project as part of the CERN Openlab Program, 2017

Primary language: Jupyter Notebook. License: MIT.

CERN Openlab Prototype: Outlier Detection

Running the Code

  • Clone the repository using git clone and run the Jupyter Notebooks locally

  • The functions and methods are documented, but you can refer to my talk for more background on the work: https://indico.cern.ch/event/635481/contributions/2685048/

  • The data slices originally provided have been redacted for security reasons; however, obscured data can be provided if you would like to test the code. Send your requirements by email to swapneel.mehta@djsce.edu.in

Structure

  • The Jupyter Notebooks stored in the src directory are named according to the models each one utilises to analyse the data
  • Some previously identified anomalies, observed when using OCSVM and IsolationForest, are stored in the txt file inside the anomalies directory; these can be used as a reference when attempting to replicate the results
  • The folders old and v1 are kept in the repository for historical reference and do not affect the notebooks in any way
  • Python scripts (knn.py and others) within the src directory are also kept for historical reference
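When replicating results against the reference anomalies, a small comparison helper can be handy. The sketch below is illustrative only: the format of the reference txt file is not described here, so it assumes one record identifier per line. The identifiers (conn-0017 and so on) and the function names parse_reference and compare are hypothetical.

```python
def parse_reference(text):
    """Parse reference anomalies, ASSUMING one identifier per line.

    Blank lines and lines starting with '#' are ignored. The real txt file
    in the anomalies directory may use a different layout.
    """
    ids = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line)
    return ids


def compare(detected, reference):
    """Summarise how a set of detected anomalies overlaps the reference set."""
    matched = detected & reference
    return {
        "matched": sorted(matched),
        "missed": sorted(reference - detected),
        "extra": sorted(detected - reference),
        # Fraction of reference anomalies that were re-detected.
        "recall": len(matched) / len(reference) if reference else 0.0,
    }


# Hypothetical reference content and detections, for illustration only.
reference_text = """
# reference anomalies, one record ID per line (assumed format)
conn-0017
conn-0042
conn-0108
"""
detected = {"conn-0042", "conn-0108", "conn-0300"}

report = compare(detected, parse_reference(reference_text))
print(report)
```

In practice the reference text would be read from the file in the anomalies directory, and the detected set would come from whichever notebook is being re-run.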

The most recent version of the code will be released once the final formalities for publication are complete (Jan-Feb 2018). If you need it for reference before then, please contact me by email at swapneel.mehta@djsce.edu.in

Notes

  • This repository provides some proof-of-concept work released as a precursor to the final models developed as part of the Openlab Summer Student Program at CERN
  • We use scikit-learn to efficiently analyse slices of the data pulled from the "data lake", which is stored both on HDFS and in Elasticsearch
  • This is a proof-of-concept deployment for testing our ideas for anomaly detection in network connection logs
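
For readers unfamiliar with the two models named above, here is a minimal sketch of outlier detection with scikit-learn's IsolationForest and OneClassSVM. It is not the code from the notebooks: the data is synthetic, the two feature columns (bytes sent, duration) are hypothetical stand-ins for connection-log features, and the parameter values (contamination, nu, gamma) are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)

# Synthetic stand-in for a data slice: 300 "normal" connections plus three
# injected outliers. Columns are hypothetical: bytes sent, duration (s).
normal = rng.normal(loc=[500.0, 2.0], scale=[50.0, 0.5], size=(300, 2))
injected = np.array([[5000.0, 30.0], [4000.0, 0.01], [10.0, 60.0]])
X = StandardScaler().fit_transform(np.vstack([normal, injected]))

# Both estimators label inliers as +1 and outliers as -1.
iforest = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.02)

iforest_labels = iforest.fit_predict(X)
ocsvm_labels = ocsvm.fit_predict(X)

# Row indices flagged by each model, and those flagged by both.
iforest_idx = set(int(i) for i in np.flatnonzero(iforest_labels == -1))
ocsvm_idx = set(int(i) for i in np.flatnonzero(ocsvm_labels == -1))
both = sorted(iforest_idx & ocsvm_idx)

print("IsolationForest flagged:", sorted(iforest_idx))
print("OneClassSVM flagged:", sorted(ocsvm_idx))
print("Flagged by both:", both)
```

Intersecting the two sets is one simple way to shortlist candidates for manual review; whether the final models combine detectors this way is not stated here.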