/nafems-anomalous-time-series-augmentation

Notes and references for NAFEMS talk on augmenting anomalous time series datasets

MIT License

NAFEMS Conference

Artificial Intelligence and Machine Learning for Manufacturing

NAFEMS Conference Page


Cover Slide

Multilayered Large Language Models Strategies for Generating Time Series Simulation Data

Augmenting Anomalous Time Series Data: from VAE, GANs, Transformers, Diffusion Models and Beyond

I. Abstract

This talk explores generative deep learning approaches for augmenting anomalous time series datasets, such as the Case Western Bearing Dataset, in preparation for fine-tuning LLMs (foundation models) for anomaly detection.

II. Introduction and Motivation

A. Problem Statement

B. Investment and Market Growth: Growth of Synthetic Data

III. Bearing Fault Detection

1. Bearing Vibration Analysis

2. Frequency Transforms

  • A Survey on Deep Learning based Time Series Analysis with Frequency Transformation (15 Sep 2023) Arxiv Link
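
As a rough illustration of why frequency transforms matter for vibration signals, the sketch below finds the dominant spectral peak of a synthetic signal with NumPy's FFT. The 12 kHz sample rate, the 160 Hz tone, the noise level, and the helper name `dominant_frequency` are all assumptions chosen for illustration; nothing here is taken from the survey above or from a real bearing recording.

```python
import numpy as np


def dominant_frequency(signal, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-DC peak in the amplitude spectrum."""
    signal = np.asarray(signal, dtype=float)
    # Remove the mean and apply a Hann window to reduce spectral leakage.
    windowed = (signal - signal.mean()) * np.hanning(signal.size)
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate_hz)
    # Skip bin 0 (DC) when searching for the peak.
    return freqs[np.argmax(spectrum[1:]) + 1]


# Hypothetical test signal: one second at 12 kHz, a 160 Hz tone plus Gaussian noise.
sample_rate_hz = 12_000
t = np.arange(sample_rate_hz) / sample_rate_hz
rng = np.random.default_rng(0)
vibration = np.sin(2 * np.pi * 160.0 * t) + 0.3 * rng.standard_normal(t.size)

peak_hz = dominant_frequency(vibration, sample_rate_hz)
print(f"dominant frequency: {peak_hz:.1f} Hz")
```

On real bearing data the interesting peaks are usually fault-related frequencies and their harmonics, often examined after envelope analysis, so this should be read only as a starting point.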

IV. Time Series Topics

A. Time Series and Signal Processing

  • (3 Feb 2023) Github Link A comprehensive survey on the time series papers from 2018-2022 (we will update it in time ASAP!) on the top conferences (NeurIPS, ICML, ICLR, SIGKDD, SIGIR, AAAI, IJCAI, WWW, CIKM, ICDM, WSDM, etc.)
  • Research Paper (14 Sep 2023) Github Link A professional list of Papers, Tutorials, and Surveys on AI for Time Series in top AI conferences and journals.
  • Papers, Libraries, Benchmarks (Feb 2023) Github Link A comprehensive survey on the time series domains
  • time-series-analysis · GitHub Topics
  • time-series · GitHub Topics
  • TS AI papers, tutorials and surveys (14 Sep 2023) Github Link A professionally curated list of papers (with available code), tutorials, and surveys on recent AI for Time Series Analysis (AI4TS), including Time Series, Spatio-Temporal Data, Event Data, Sequence Data, Temporal Point Processes, etc., at the Top AI Conferences and Journals, which is updated ASAP (the earliest time) once the accepted papers are announced in the corresponding top AI conferences/journals. Hope this list would be helpful for researchers and engineers who are interested in AI for Time Series Analysis.

B. Deep Learning Time Series Forecasting

  • TS Forecasting and DL (12 Sep 2023) Github Link Resources about time series forecasting and deep learning, as well as other resources like competitions, datasets, courses, blogs, code, etc.
  • (Aug 2023) Github Link OmniXAI: A Library for eXplainable AI
  • (24 Mar 2023) Towards Data Science Link XAI for Forecasting: Basis Expansion

C. Time Series Anomaly Detection

  • TS Anomaly Detection Resources: (6 Jun 2023) Github Link Anomaly detection related books, papers, videos, and toolboxes
  • (21 Sep 2022) Github Link List of tools & datasets for anomaly detection on time-series data.
  • (3 Jul 2023) Github Link Time-Series Anomaly Detection Comprehensive Benchmark: This repository maintains a comprehensive list of classic and state-of-the-art methods and datasets for Anomaly Detection in Time-Series. This is part of ongoing research at the Time Series Analytics Lab, Monash University.
  • Time Series Anomaly Detection · GitHub Topics
  • (26 Dec 2021) Github Link with Colabs
  • Papers:
    • (4 Sep 2023) Github Link DiffAD Imputation-based Time-Series Anomaly Detection with Conditional Weight-Incremental Diffusion Models
    • (4 Sep 2023) Github Link MSTICPY MS Threat Intelligence Python Tools
    • (8 May 2023) Github Link luminol: Anomaly Detection and Correlation library
    • (9 May 2022) Github Link MTS Deep Learning Anomaly Detection A repository for code accompanying the manuscript 'An Evaluation of Anomaly Detection and Diagnosis in Multivariate Time Series' (published at TNNLS)
    • (31 May 2020) Github Link ML and LSTM Outlier Detection: A project made for one of the subjects at Warsaw University of Technology. Its aim is to detect anomalies in time series data.

D. Time Series Synthesis

1. Statistical ML

2. Deep Learning

  • Keras Tutorials: Keras Link
  • Data Augmentation techniques in time series domain: a survey and taxonomy (24 Mar 2023) Springer Link
  • TransFusion: Generating Long, High Fidelity Time Series using Diffusion Models with Transformers (24 Jul 2023) Arxiv Link
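
Before reaching for deep generative models, it can help to see what the simplest time series augmentations look like. The sketch below applies three widely used basic transforms (window slicing, jittering, and magnitude scaling) to a placeholder signal. The function names, parameter values, and the sine-wave stand-in for an anomalous recording are assumptions for illustration, not code from the survey or from any library listed here.

```python
import numpy as np


def random_window(x, length, rng):
    """Window slicing: take a random contiguous slice of the given length."""
    start = rng.integers(0, x.size - length + 1)
    return x[start:start + length]


def jitter(x, sigma, rng):
    """Jittering: add zero-mean Gaussian noise to every sample."""
    return x + rng.normal(0.0, sigma, size=x.shape)


def scale(x, sigma, rng):
    """Magnitude scaling: multiply the whole window by one random factor near 1."""
    return x * rng.normal(1.0, sigma)


rng = np.random.default_rng(42)
# Placeholder for a scarce anomalous recording (hypothetical data).
anomalous = np.sin(np.linspace(0.0, 20.0 * np.pi, 1000))

# Build a small augmented batch from the single source series.
batch = np.stack([
    scale(jitter(random_window(anomalous, 256, rng), 0.05, rng), 0.1, rng)
    for _ in range(8)
])
print(batch.shape)
```

Transforms like these are cheap baselines; whether they preserve the fault signature of a given anomaly class has to be checked per dataset.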

V. Vibrational Datasets and Code

A. Public Data

1. Case Western Bearing Data

  • Case Western Bearing (5 Dec 2021): Github Link This repository contains data and code to recreate classification results for fault detection in ball bearings. The data comes from the Case Western Reserve Bearing Data Center
  • Github (17 Feb 2021): Github Link Case Western Reserve University Bearing Fault Dataset
  • Github Metadata for Python (14 Apr 2020): Github Link Collect Case Western Reserve University Bearing Data in python 3
  • Github (14 May 2022): Github Link Data-of-Case-Western-Reserve-University/Normal Baseline Data

2. Multivariate Time Series

  • M4 Dataset and Competition: Github Link
  • MTS Github Link
  • Anomaly Datasets (10 Sep 2023) Github Link ADRepository: Real-world anomaly detection datasets, including tabular data (categorical and numerical data), time series data, graph data, image data, and video data.
  • AI4I 2020 UCI PdM Dataset Link AI4I 2020 Predictive Maintenance Dataset - UCI Machine Learning Repository

3. Bearing Vibrational Code

  • Rolling element bearing fault diagnosis using convolutional neural network and vibration image (sciencedirectassets.com) (2019) Hoang & Kang
  • [PDF] Fault Detection in Ball Bearings | Semantic Scholar (19 Sep 2022) Pickard & Moll
  • Case Western Bearing Colab EDA (5 Dec 2021): Github Link This repository contains data and code to recreate classification results for fault detection in ball bearings. The data comes from the Case Western Reserve Bearing Data Center
  • (27 Nov 2019) Github Link CNN applied to bearing signals for analysis
  • SB-PdM: a tool for predictive maintenance of rolling bearings based on limited labeled data (Feb 2023) Github Link
  • Vibration-Based Fault Diagnosis with Low Delay (26 Jan 2023) Github Link
  • Rolling element bearing fault diagnosis using convolutional neural network and vibration image (11 Jul 2021) Github Link Nine colab notebooks for machine learning and deep learning for predictive analysis in industry 4.0.
  • Github (29 Sep 2022) Github Link Predictive maintenance with machine learning (a final project for a Computer Science AP degree). Supervised and unsupervised models for three tasks (anomaly detection, remaining useful life, and failure prediction) on two datasets (CW Bearing and NASA Battery).
  • Github *.R (3 Jan 2023) Github Link Failure Mode Classification from the NASA/IMS Bearing Dataset
  • Github (9 Feb 2022) Github Link Detection and multi-class classification of Bearing faults using Image classification from Case Western Reserve University data of bearing vibrations recorded at different frequencies. Developed an algorithm to convert vibrational data into Symmetrized Dot Pattern images based on a Research paper. Created an Image dataset of 50 different parameters and 4 different fault classes, to select optimum parameters for efficient classification. Trained and tested 50 different datasets on different Image-net models to obtain maximum accuracy. Obtained an accuracy of 98% for Binary classification of Inner and Outer race faults on Efficient Net B7 model on just 5 epochs.
  • Github (15 Dec 2022) Github Link A comprehensive, user-centric Python API for working with enDAQ data and devices Manual: Docs Link Video: endaq Link
  • Github (15 Nov 2020) Github Link Manual: Installation Link
  • Github (30 Sep 2021) TadGAN (w/LSTM) Github Link Code for the paper "TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks" Paper Arxiv Link
  • (25 Jan 2023) Survey of Methods Github Link Anomaly detection in time series for space applications
  • (25 Jan 2022) Github Link Time series analysis using a grammar based compression algorithm. Uses the discretization used for time series in PySAX and the grammar based compression of Sequitur as basis for the compression of the time series. The algorithm then uses the compression to calculate a score of the compressibility of each point in the time-series. If the compressibility of a sequence of points is low for a certain sequence then an anomaly is detected.
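
The last entry above relies on discretizing a time series into symbols before compressing it. The sketch below shows only that discretization step, in the style of SAX (Symbolic Aggregate approXimation): z-normalize, average over segments, then map each segment mean to a letter. It is a minimal NumPy sketch under stated assumptions, not the PySAX API and not the linked repository's code; the function name `sax_word`, the fixed four-letter alphabet, and the ramp test signal are hypothetical, and the grammar-based compression and scoring steps are not shown.

```python
import numpy as np

# Quartiles of the standard normal: they split it into 4 equiprobable regions.
BREAKPOINTS_4 = np.array([-0.6745, 0.0, 0.6745])


def sax_word(series, n_segments, alphabet="abcd"):
    """Discretize a series into a short word over a 4-letter alphabet."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    # Z-normalize; a constant series maps to all zeros.
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    # Piecewise aggregate approximation: mean of each near-equal chunk.
    paa = np.array([chunk.mean() for chunk in np.array_split(z, n_segments)])
    # Each segment mean becomes the index of the region it falls into.
    symbols = np.searchsorted(BREAKPOINTS_4, paa)
    return "".join(alphabet[i] for i in symbols)


# Hypothetical input: a rising ramp should give a non-decreasing word.
ramp = np.linspace(-1.0, 1.0, 80)
word = sax_word(ramp, n_segments=8)
print(word)
```

A grammar-based compressor would then operate on words like this one; subsequences that compress poorly are the candidates for anomalies.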

VI. Model Architecture Types (some mixed)

A. RNN: LSTM/GRU

  • Visualization: LSTMViz (19 Nov 2021) Github Link
  • Paper: LSTM and GRU Neural Networks as Models of Dynamical Processes Used in Predictive Control: A Comparison of Models Developed for Two Chemical Reactors (17 Aug 2021) MDPI Link
  • Model: AE-RNN Github Link

B. VAE

C. GAN

1. Open Source: DoppelGANger

  • Paper (17 Jan 2021) Arxiv Link Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
  • Repo (12 Aug 2023) Github Link [IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

2. Commercial: Gretel.ai DGAN

3. Commercial: yData

4. WGAN Tutorial:

5. CGAN

6. Newer GAN Models

  • TimeGAN
    • (14 Oct 2022) Github Link Codebase for Time-series Generative Adversarial Networks (TimeGAN) - NeurIPS 2019
  • TimeSynth
    • (30 Nov 2020) Github Link TimeSynth: A Multipurpose Library for Synthetic Time Series Generation in Python. TimeSynth is an open source library for generating synthetic time series for model testing. The library can generate regular and irregular time series. The architecture allows the user to match different signals with different architectures, allowing a vast array of signals to be generated.
  • LSTNet
    • (27 Apr 2022) Github Link A Tensorflow / Keras implementation of "Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks" paper
    • (21 Dec 2019) Github Link
  • RTSGAN
  • TSGAN for Cloud Workload
    • (16 Dec 2022) Github Link PyTorch implementation of A GAN-based method for time-dependent cloud workload generation.
  • TSGAN for Biology
    • (13 Dec 2019) Github Link Generation of Time Series data using generative adversarial networks (GANs) for biological purposes.

D. Transformers/Attention Heads

  • Paper: Transformers in Time Series: A Survey (11 May 2023) Arxiv Link

E. Energy

F. Diffusion

V. Multi-Model Frameworks / Benchmarks

  • Synthcity
    • Github (12 Sep 2023, 208 stars) Github Link A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.
  • TSGM (VAEs, GANS, Metrics)
    • Github (18 Sep 2023, 31 stars) Github Link Generative modeling of synthetic time series data and time series augmentations
    • Documentation (pypi) TSGM Docs Time Series Simulator (TSGM) Official Documentation
    • Colab: Colab Link Getting started with TSGM.ipynb
  • SB-PdM (Similarity-Based Predictive Maintenance) Feat Ext/Sim Metrics
    • Repo: Github Link SB-PdM-a-tool-for-predictive-maintenance-of-rolling-bearings-based-on-limited-labeled-data/SB_PdM_Tool.ipynb
    • Colab: Github Link SB-PdM-a-tool-for-predictive-maintenance-of-rolling-bearings-based-on-limited-labeled-data/SB_PdM_Tool.ipynb
  • Flow-Forecast
    • (12 Sep 2023) Github Link Flow Forecast (FF) is an open-source deep learning for time series forecasting framework. It provides all the latest state of the art models (transformers, attention models, GRUs, ODEs) and cutting edge concepts with easy to understand interpretability metrics, cloud provider integration, and model serving capabilities. Flow Forecast was the first time series framework to feature support for transformer based models and remains the only true end-to-end deep learning for time series framework.
    • Manual: Wiki Link
    • Tutorials: Github Link

VI. SOTA Research

  • C-GATS
    • Paper: Amazon Link c-gats-conditional-generation-of-anomalous-time-series.pdf
    • OpenReview.org (5 May 2023) OpenReview Link C-GATS: Conditional Generation of Anomalous Time Series
  • IH-TCGAN (1 May 2023)
    • Paper: MDPI Link (Entropy) IH-TCGAN: Time-Series Conditional Generative Adversarial Network with Improved Hausdorff Distance for Synthesizing Intention Recognition Data
  • ImDiffusion
  • TransFusion

VII. Evaluation Metrics

1. Leaderboards

2. Testing Frameworks and Benchmarks

  • (24 Feb 2022) Github Link numenta/NAB: The Numenta Anomaly Benchmark. Numenta Anomaly Benchmark (NAB) v1.1 is a novel benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It is composed of over 50 labeled real-world and artificial timeseries data files plus a novel scoring mechanism designed for real-time applications.
  • (6 Aug 2023) Github Link DeepIntoStreams/Evaluation-of-Time-Series-Generative-Models Summarize the evaluation metrics used in unconditional generative models for synthetic data generation, list the advantages and disadvantages of each evaluation metric based on experiments on different datasets and models. We implement some popular models for time series generation including: Time-GAN, Recurrent Conditional GAN (RCGAN), Time-VAE.
  • TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series Arxiv Link
  • AdaTime: A Benchmarking Suite for Domain Adaptation on Time Series Data (7 Jun 2023) Github Link [TKDD 2023] AdaTime: A Benchmarking Suite for Domain Adaptation on Time Series Data

VIII. Future Directions

Graph Neural Networks

  • (25 Dec 2021) Arxiv Link [2010.05234] A Practical Tutorial on Graph Neural Networks
  • Arxiv Link
  • Hands-On Graph Neural Networks Using Python: Practical techniques and architectures for building powerful graph and deep learning apps with PyTorch by Maxime Labonne (ISBN 9781804617526) Amazon.com

Geometric Deep Learning

  • Towards Geometric Deep Learning (thegradient.pub)
  • [2306.11768] A Systematic Survey in Geometric Deep Learning for Structure-based Drug Design Arxiv Link
  • (13 Jul 2023) Frontiers | Geometric deep learning as a potential tool for antimicrobial peptide prediction Frontiers Link

Autonomous Agents

  • Multimodal AnomalyGPT (3 Sep 2023) Github Link CASIA-IVA-Lab/AnomalyGPT: The first LVLM based IAD method!
  • More Thorough Reasoning: GoT: Github Link spcl/graph-of-thoughts: Official Implementation of "Graph of Thoughts: Solving Elaborate Problems with Large Language Models"
  • (22 Aug 2023) A Survey on Large Language Model based Autonomous Agents

IX. Useful Texts

  • Generative Deep Learning, 2nd Ed. by David Foster (O’Reilly, June 2023)
  • Probabilistic Machine Learning: An Introduction by Kevin Murphy