Anomaly detection using Federated Learning with FLEX.

Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

flex-anomalies

flex-anomalies is a Python library dedicated to anomaly detection in machine learning. It offers a wide range of algorithms and techniques, including models based on distance, density, trees, and neural networks such as convolutional and recurrent architectures. The library also provides aggregators, anomaly score processing techniques, and pre-processing techniques for data.

Anomaly detection examines data to find observations that deviate from the expected pattern, with the goal of cleaning data sets and flagging anomalies for further analysis.

Details

Anomaly Detection with FLEXible Federated Learning: This repository contains implementations of anomaly detection algorithms built on the FLEXible federated learning library. FLEXible is a Python library for running federated learning in an efficient and scalable manner. The implementations draw on a study of state-of-the-art research on federated learning for network intrusion detection.

This repository also includes:

  • An organized folder structure that makes it easy to navigate and understand the project.
  • Explanatory notebooks showing practical examples and detailed explanations for the use of the library.

Folder structure

  • flexanomalies/pool: Contains the aggregators and primitives for each model, following the FLEXible structure.
  • flexanomalies/utils: Contains the source code of the anomaly detection algorithms, the anomaly score processing techniques, the evaluation metrics, a function to federate a centralized dataset using FLEXible, and data loading.
  • flexanomalies/datasets: Contains some pre-processing techniques for data.
  • notebooks: Contains explanatory notebooks showing how to use the anomaly detection algorithms on data.
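The folder list above mentions a function to federate a centralized dataset and per-model aggregators. The sketch below illustrates both ideas in plain Python, without using FLEXible or flexanomalies. The names `federate` and `weighted_average` are hypothetical, and the IID round-robin split and sample-weighted averaging are assumptions chosen for illustration, not the library's actual implementation.

```python
import random
from typing import Dict, List, Sequence

Row = List[float]


def federate(rows: Sequence[Row], n_clients: int, seed: int = 0) -> Dict[int, List[Row]]:
    """Shuffle the rows and deal them round-robin to n_clients (an IID split)."""
    if n_clients < 1:
        raise ValueError("n_clients must be at least 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    shards: Dict[int, List[Row]] = {client: [] for client in range(n_clients)}
    for position, idx in enumerate(indices):
        shards[position % n_clients].append(rows[idx])
    return shards


def weighted_average(
    client_params: Sequence[Sequence[float]], client_sizes: Sequence[int]
) -> List[float]:
    """Average parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    if total == 0:
        raise ValueError("at least one client must hold data")
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]


# Toy centralized dataset: 10 rows, 2 features.
data = [[float(i), float(i) * 2.0] for i in range(10)]
shards = federate(data, n_clients=3)

# Stand-in for local training: each client computes its per-feature mean.
local_params = [[sum(col) / len(shard) for col in zip(*shard)] for shard in shards.values()]
sizes = [len(shard) for shard in shards.values()]

# Server-side aggregation of the client results.
global_params = weighted_average(local_params, sizes)
print(sizes, global_params)
```

Because each client is weighted by its sample count, the aggregated per-feature mean matches the mean of the centralized data, which makes this a convenient sanity check for an aggregator.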

Explanatory Notebooks

  • AnomalyDetection_Autoencoder_FLEX.ipynb: A step-by-step example of how to use the AutoEncoder model for anomaly detection with federated learning on static data.
  • AnomalyDetection_AutoEncoder_FLEX_ts.ipynb: A step-by-step example of how to use the AutoEncoder model for anomaly detection with federated learning on time series. It covers the sliding-window structure, data federation, federated training, and model evaluation at the server and client level.
  • AnomalyDetection_PCA_FLEX.ipynb: Demonstrates the application of PCA_Anomaly for anomaly detection with federated learning on a static dataset.
  • AnomalyDetection_Cluster_FLEX.ipynb: A step-by-step example of how to use the ClusterAnomaly model for anomaly detection with federated learning on static data, including evaluation of the model on test sets.
  • AnomalyDetection_IsolationForest_FLEX.ipynb: An example of how to use the IsolationForest model with federated learning on a static example dataset, from data federation and training to model evaluation on a test set.
  • AnomalyDetection_CNNN_LSTM_FLEX_ts.ipynb: Shows the use of the DeepCNN_LSTM model with federated learning for anomaly detection in time series. It covers the sliding-window structure, data federation, federated training, and model evaluation at the server and client level.
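The time-series notebooks rely on a sliding-window structure. As a rough illustration of that idea, the sketch below cuts a series into fixed-length, possibly overlapping windows. The function name `sliding_windows` and its `window` and `stride` parameters are hypothetical and are not taken from flexanomalies; the notebooks show the library's own approach.

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[float], window: int, stride: int = 1) -> List[List[float]]:
    """Return every full window of length `window`, advancing by `stride` each time."""
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    last_start = len(series) - window  # negative when the series is too short
    return [list(series[start:start + window]) for start in range(0, last_start + 1, stride)]


readings = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
windows = sliding_windows(readings, window=3, stride=2)
print(windows)  # [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
```

Each window can then be treated as one sample by a model such as an autoencoder. A trailing partial window is dropped here, which is one design choice among several.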

Features

For more information on the implemented algorithms see the table that follows:

| Model | Description | Citation |
| --- | --- | --- |
| IsolationForest | Algorithm for anomaly detection that isolates anomalies using binary trees. | Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In *International Conference on Data Mining*, pp. 413-422. IEEE. |
| PCA_Anomaly | Principal component analysis (PCA) for outlier detection. Outlier scores are obtained as the sum of the weighted Euclidean distances from each sample to the hyperplane constructed by the selected eigenvectors. | Shyu, M.L., Chen, S.C., Sarinnapakorn, K. and Chang, L., 2003. A novel anomaly detection scheme based on principal component classifier. *MIAMI UNIV CORAL GABLES FL DEPT OF ELECTRICAL AND COMPUTER ENGINEERING*. |
| ClusterAnomaly | Clustering-based model. Outlier scores are computed solely from each sample's distance to the closest large cluster center; k-means is used as the clustering algorithm. | Chawla, S., & Gionis, A. (2013, May). k-means--: A unified approach to clustering and outlier detection. In Proceedings of the 2013 SIAM international conference on data mining (pp. 189-197). |
| DeepCNN_LSTM | Neural network model for time series and static data that combines convolutional and recurrent architectures. | Aguilera-Martos, I., García-Vico, Á. M., Luengo, J., Damas, S., Melero, F. J., Valle-Alonso, J. J., & Herrera, F. (2022). TSFEDL: A Python Library for Time Series Spatio-Temporal Feature Extraction and Prediction using Deep Learning (with Appendices on Detailed Network Architectures and Experimental Cases of Study). arXiv preprint arXiv:2206.03179. |
| AutoEncoder | Fully connected autoencoder for time series and static data. The network learns useful data representations in an unsupervised way and detects anomalies by computing the reconstruction error. | Aggarwal, C.C., 2015. Outlier analysis. In Data mining (pp. 237-263), Ch. 3. Springer, Cham. |
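To make the distance-based scoring idea behind the clustering model more concrete, the sketch below scores each point by its Euclidean distance to the nearest cluster center and then flags the highest-scoring fraction as anomalies. It is a simplified stand-in, not the ClusterAnomaly implementation: the centers are given by hand instead of being fitted with k-means, and the names `nearest_center_scores`, `flag_top_fraction`, and `contamination` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def nearest_center_scores(points: Sequence[Point], centers: Sequence[Point]) -> List[float]:
    """Anomaly score = Euclidean distance to the closest center (larger is more anomalous)."""
    return [min(math.dist(point, center) for center in centers) for point in points]


def flag_top_fraction(scores: Sequence[float], contamination: float) -> List[int]:
    """Label the highest-scoring fraction of samples 1 (anomaly) and the rest 0 (normal)."""
    if not 0.0 < contamination < 1.0:
        raise ValueError("contamination must be strictly between 0 and 1")
    n_flagged = max(1, round(contamination * len(scores)))
    threshold = sorted(scores, reverse=True)[n_flagged - 1]
    return [1 if score >= threshold else 0 for score in scores]


# Two tight groups plus one point far from both assumed centers.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (10.0, -3.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]  # assumed fixed; a real model would fit these

scores = nearest_center_scores(points, centers)
labels = flag_top_fraction(scores, contamination=0.2)
print(labels)  # [0, 0, 0, 0, 1]
```

Turning raw scores into labels with a fixed expected anomaly fraction is only one possible score processing step; ties at the threshold are all flagged in this sketch.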

Installation

FLEX-Anomalies is available on PyPI and can be installed with pip:

pip: pip install flexanomalies

Install the necessary dependencies:

pip install -r requirements.txt
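After installing, a quick way to confirm that the package is visible to the current interpreter is to query its distribution metadata with the standard library. This sketch assumes the distribution name matches the `pip install` name shown above, `flexanomalies`; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("flexanomalies")
if version is None:
    print("flexanomalies is not installed in this environment")
else:
    print(f"flexanomalies {version} is installed")
```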

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this repository in your research work, please cite the FLEXible paper: