Unsupervised Time Series Imputation

❓ Context

Missing data is a common challenge when working with time series. Imputation addresses it by filling in the missing values rather than dropping them. The key difficulty is deciding which values to fill in.

🎯 Objective

In this project, we assess how effective deep learning-based models are for time series imputation, compared with statistical methods that do not require prior training.
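One common way to compare imputation methods without labelled data is to hide some values that are actually observed, impute them, and measure the error only on the hidden positions. The sketch below illustrates that idea; it is an assumption about a possible evaluation protocol, not the code used in this repository, and the names mask_observed, mean_absolute_error and mean_fill are hypothetical.

```python
import random


def mask_observed(values, missing_rate, seed=0):
    """Hide a random subset of values; return the masked series and the hidden indices."""
    rng = random.Random(seed)
    n_hidden = round(len(values) * missing_rate)
    hidden = sorted(rng.sample(range(len(values)), n_hidden))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    return masked, hidden


def mean_absolute_error(truth, imputed, hidden):
    """Score the imputation only on the positions that were hidden."""
    return sum(abs(truth[i] - imputed[i]) for i in hidden) / len(hidden)


def mean_fill(values):
    """Trivial stand-in imputer: replace every missing value with the observed mean."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


truth = [float(i) for i in range(10)]
masked, hidden = mask_observed(truth, missing_rate=0.3)
score = mean_absolute_error(truth, mean_fill(masked), hidden)
print(hidden, round(score, 3))
```

Any imputer with the same input and output shape can be swapped in for mean_fill, which makes it possible to score statistical and learned methods on the same hidden positions.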

🚀 How to use the project

First, clone the repository and cd into it:

git clone https://github.com/lidamsoukaina/Unsupervised_Time_Series_Imputation.git
cd Unsupervised_Time_Series_Imputation

Then, create a virtual environment and activate it:

python3 -m venv venv
source venv/bin/activate

Install the requirements using the following command:

pip install -r requirements.txt

[Optional] If you are using this repository in development mode, run the following command to set up the git hook scripts:

pre-commit install

Create the data and trained_models folders, along with one subfolder per model:

mkdir trained_models
mkdir data trained_models/AE trained_models/convAE trained_models/LSTM_AE trained_models/TS

Add the CSV file 'household_power_consumption.csv' to the data folder. It can be downloaded from https://drive.google.com/drive/folders/10OYuhaT3nEaJmoGJLNMzOiSVPCtMJJtW?usp=sharing

Remark: If you want to use your own dataset, add it to the data folder as 3 CSV files (train, val and test) and edit the config.yaml file. Editing config.yaml is necessary to specify the paths to the CSV files and the names of the columns to drop (if there are none, use columns_to_drop: []).
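If your own data comes as a single CSV, one way to produce the three files is a chronological split, so that validation and test rows come after the training rows in time. The sketch below is only an illustration under assumptions: the 70/15/15 ratios, the function names, and the output file names train.csv, val.csv and test.csv are hypothetical and not taken from this repository. Whatever names you choose are the ones to declare in config.yaml.

```python
import csv
from pathlib import Path


def chronological_split(rows, train_frac=0.7, val_frac=0.15):
    """Split rows in their original order; whatever remains after train and val is test."""
    n = len(rows)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


def split_csv(source, out_dir):
    """Write train.csv, val.csv and test.csv (hypothetical names) into out_dir."""
    with open(source, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # keep the header so it can be repeated in each file
        rows = list(reader)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, part in zip(("train", "val", "test"), chronological_split(rows)):
        with open(out_dir / f"{name}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)


# Example call, assuming a file data/my_dataset.csv exists (hypothetical path):
# split_csv("data/my_dataset.csv", "data")
```

The rows are not shuffled on purpose: shuffling a time series before splitting would leak future observations into the training set.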

Now you can run the main.ipynb notebook.

Remarks:

If you are using your own dataset, run the test.ipynb notebook.
You can change the hyperparameters of each model in the config files contained in the training folder.

🎉 Implemented models

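As baselines, we implemented several statistical models: Linear Interpolation, MICE (Multiple Imputation by Chained Equations), NOCB (Next Observation Carried Backward), LOCF (Last Observation Carried Forward), Spline Interpolation, Median and Mode.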
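To make the simplest of these baselines concrete, here is a minimal, dependency-free sketch of LOCF, NOCB, linear interpolation and median filling on a short series where None marks a missing value. It is an illustration only, not the implementation used in this repository, and the function names are hypothetical. In practice, pandas offers equivalents through Series.ffill, Series.bfill, Series.interpolate and Series.fillna.

```python
from statistics import median


def locf(values):
    """Last Observation Carried Forward: reuse the most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None until a first observation is seen
    return filled


def nocb(values):
    """Next Observation Carried Backward: LOCF applied to the reversed series."""
    return locf(values[::-1])[::-1]


def linear_interpolation(values):
    """Fill each interior gap along the straight line between its observed neighbours."""
    filled = list(values)
    observed = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(observed, observed[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled  # leading and trailing gaps are left untouched


def median_fill(values):
    """Replace every missing value with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


series = [None, 1.0, None, None, 4.0, None]
print(locf(series))                  # [None, 1.0, 1.0, 1.0, 4.0, 4.0]
print(nocb(series))                  # [1.0, 1.0, 4.0, 4.0, 4.0, None]
print(linear_interpolation(series))  # [None, 1.0, 2.0, 3.0, 4.0, None]
print(median_fill(series))           # [2.5, 1.0, 2.5, 2.5, 4.0, 2.5]
```

The example also shows a practical limitation: LOCF cannot fill a leading gap, NOCB cannot fill a trailing one, and linear interpolation fills neither, so these methods are often combined.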

To assess the effectiveness of deep learning models for unsupervised time series imputation, we implemented four different architectures in an attempt to cover the main types of neural networks:

Autoencoder
Convolutional Autoencoder
LSTM Autoencoder
Transformer Encoder

The architecture of the models is described in the models folder.

✏️ Authors

LETAIEF Maram
LIDAM Soukaina