/KLSE_LTSM_Autoencoder

This project uses historical data of Kuala Lumpur Stock Exchange to identify trade anomalies using LTSM

Primary LanguageJupyter Notebook

An LSTM (Long Short-Term Memory) Autoencoder is a type of neural network architecture used for unsupervised learning tasks, particularly in the context of sequence data like time series. An LSTM Autoencoder is a powerful tool for learning representations of sequential data and detecting anomalies in time series data. By leveraging the capabilities of LSTM networks to capture long-term dependencies and the reconstruction framework of autoencoders, it can effectively model complex temporal patterns and identify deviations from normal behavior.

  1. Autoencoder:

    • An autoencoder is a neural network designed to learn efficient representations of input data, typically by encoding the input into a lower-dimensional space and then reconstructing the original input from this encoding. It consists of an encoder and a decoder.
    • The encoder compresses the input data into a latent space representation, capturing the most important features of the input.
    • The decoder then reconstructs the original input data from the latent space representation, aiming to minimize the reconstruction error.
  2. LSTM (Long Short-Term Memory):

    • LSTM is a type of recurrent neural network (RNN) architecture that is well-suited for processing and predicting sequences of data, thanks to its ability to capture long-term dependencies and remember information over long time intervals.
    • Unlike traditional RNNs, which suffer from the vanishing gradient problem and have difficulty learning long-range dependencies, LSTM networks incorporate a memory cell and sophisticated gating mechanisms (input gate, forget gate, and output gate) to regulate the flow of information.
    • The memory cell enables LSTM networks to selectively remember or forget information from previous time steps, making them effective for modeling sequential data with long-range dependencies.
  3. LSTM Autoencoder:

    • An LSTM Autoencoder combines the principles of both LSTM networks and autoencoders to learn efficient representations of sequential data.
    • In the context of anomaly detection in time series data, an LSTM Autoencoder is trained to reconstruct normal patterns in the data. During training, it learns to encode the input time series into a latent space representation using an LSTM encoder. The decoder then attempts to reconstruct the original time series data from this latent representation. The model is trained to minimize the reconstruction error, which measures the discrepancy between the input and the reconstructed output.
  4. Anomaly Detection:

    • Once the LSTM Autoencoder is trained on normal data, it can be used for anomaly detection. During inference, the model is fed unseen data, and the reconstruction error for each data point is computed.
    • Anomalies are detected based on significant deviations between the original input and the reconstructed output. If the reconstruction error exceeds a predefined threshold, it indicates that the model is unable to accurately reconstruct the input, suggesting the presence of an anomaly in the data.
    • Anomalies are often rare and unexpected events or patterns that deviate from the norm. By identifying these anomalies, the LSTM Autoencoder can help detect unusual behavior or occurrences in the time series data, which may require further investigation or intervention.

Anomaly Detection in Time Series Data using LSTM Autoencoder

This repository contains code for detecting anomalies in time series data using an LSTM Autoencoder model. The code is written in Python and utilizes various libraries such as TensorFlow, pandas, NumPy, Matplotlib, seaborn, and scikit-learn.

Goal

The goal of this project is to develop an LSTM Autoencoder model to identify anomalies in time series data. Anomalies refer to data points or patterns that deviate significantly from the norm, indicating potential abnormalities or unusual events in the data.

Dependencies

The code relies on the following Python libraries:

  • numpy
  • tensorflow
  • pandas
  • seaborn
  • matplotlib
  • scikit-learn

These dependencies can be installed using the provided requirements.txt file.

Usage

  1. Data Preprocessing: The provided code assumes that the time series data is stored in a CSV file named KLSE.csv. You can replace this with your own dataset. The data is then preprocessed, including cleaning and standardization.

  2. LSTM Autoencoder Model: An LSTM Autoencoder model is designed and trained using TensorFlow's Keras API. This model is trained to reconstruct the input time series data and detect anomalies based on the reconstruction error.

  3. Anomaly Detection: After training the model, anomalies are detected by comparing the reconstruction error of the test data with a predefined threshold. Data points with reconstruction error exceeding the threshold are considered anomalies.

  4. Visualization: Finally, the detected anomalies are visualized alongside the original time series data using Matplotlib.

Running the Code

To run the code:

  1. Install the dependencies listed in the requirements.txt file.
  2. Replace the KLSE.csv file with your own time series data.
  3. Execute the provided Python script.

Example

The provided code demonstrates the entire process of anomaly detection in time series data using an LSTM Autoencoder model. It includes data preprocessing, model training, anomaly detection, and visualization.

License

This project is licensed

under the MIT License. Feel free to use, modify, and distribute the code for your own purposes. If you find it helpful, consider giving credit to the original authors.