SAITS: A Python repository from Niharikajo

The official code repository for the paper SAITS: Self-Attention-based Imputation for Time Series.

⦿ Motivation: SAITS was developed primarily to overcome the drawbacks of RNN-based imputation models (slow speed, memory constraints, and compounding error) and to achieve state-of-the-art (SOTA) imputation accuracy on partially observed time series.

⦿ Performance: SAITS outperforms BRITS by 12% ∼ 38% in MAE (mean absolute error) and trains 2.0 ∼ 2.6 times faster. Furthermore, SAITS outperforms Transformer (trained with our joint-optimization approach) by 2% ∼ 13% in MAE with a more efficient model structure: to obtain comparable performance, SAITS needs only 15% ∼ 30% of Transformer's parameters. Compared to NRTSI, another SOTA self-attention imputation model, SAITS achieves a 7% ∼ 39% smaller mean squared error (MSE), more than 20% smaller in nine of sixteen cases, while needing far fewer parameters and less imputation time in practice. Please refer to our full paper for more details about SAITS' performance.
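The MAE and MSE figures above are imputation errors. Below is a minimal sketch of how such errors are commonly computed. It assumes, as is usual for imputation benchmarks, that errors are measured only at held-out positions marked by a binary eval_mask. The function names are hypothetical, and this is an illustration rather than code from this repository:

```python
import numpy as np


def masked_mae(imputed: np.ndarray, target: np.ndarray, eval_mask: np.ndarray) -> float:
    """Mean absolute error computed only where eval_mask is 1 (held-out positions)."""
    m = eval_mask.astype(bool)
    return float(np.abs(imputed[m] - target[m]).mean())


def masked_mse(imputed: np.ndarray, target: np.ndarray, eval_mask: np.ndarray) -> float:
    """Mean squared error computed only where eval_mask is 1 (held-out positions)."""
    m = eval_mask.astype(bool)
    return float(np.square(imputed[m] - target[m]).mean())


# Toy example: target holds the ground truth (NaN where it was never observed),
# imputed holds model output, eval_mask marks the values hidden for evaluation.
target = np.array([[1.0, 2.0, np.nan], [4.0, 5.0, 6.0]])
imputed = np.array([[1.0, 2.5, 0.0], [3.0, 5.0, 6.0]])
eval_mask = np.array([[0, 1, 0], [1, 0, 0]])
print(masked_mae(imputed, target, eval_mask))  # 0.75
print(masked_mse(imputed, target, eval_mask))  # 0.625
```

Because only masked positions are indexed, NaNs elsewhere in target (values that were never observed) do not affect the result.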

❖ Repository Structure

The implementation of SAITS is in dir modeling. We give the configurations of our models in dir configs and provide the dataset links and preprocessing scripts in dir dataset_generating_scripts. Dir NNI_tuning contains the hyperparameter search configurations.

❖ Implemented Models

The implemented models in dir modeling are listed below:

MRNN (in modeling/mrnn.py)
BRITS (in modeling/brits.py)
Transformer (in modeling/SA_models.py#L28)
SAITS (in modeling/SA_models.py#L93)

For the other baseline models used in the paper, please refer to the open-source GitHub repositories given in their original papers (the links are also available in our paper).

❖ Development Environment

All dependencies of our development environment are listed in the file conda_env_dependencies.yml. You can quickly create a usable Python environment with the Anaconda command conda env create -f conda_env_dependencies.yml. ❗️Note that this file targets the Linux platform, but you can still use it as a reference for the dependency libraries.

❖ Datasets

For dataset downloading and generation, please check out the scripts in dir dataset_generating_scripts.

❖ Quick Run

Generate the dataset you need first. To do so, please check out the generating scripts in dir dataset_generating_scripts.

After data generation, train and test your model, for example:

# for training
CUDA_VISIBLE_DEVICES=2 nohup python run_models.py \
    --config_path configs/PhysioNet2012_SAITS_best.ini \
    > NIPS_results/PhysioNet2012_SAITS_best.out &

# for testing
CUDA_VISIBLE_DEVICES=3 python run_models.py \
    --config_path configs/PhysioNet2012_SAITS_best.ini \
    --test_mode

❗️Note that the dataset paths and saving directories may differ on your machine; please check them in the configuration files.
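Before running either command, it can help to confirm that the paths in the chosen .ini file exist on your machine. Below is a minimal sketch using Python's standard configparser module. The function name find_missing_paths is hypothetical. The section and key names in the configs are not documented here, so treating any value that contains "/" as a path is a heuristic assumption. Run it from the same working directory as the commands above so that relative paths resolve the same way:

```python
import configparser
import os
import sys


def find_missing_paths(config_path: str) -> list:
    """Return (section, key, value) tuples for path-like values that do not exist."""
    # interpolation=None so values containing '%' are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    with open(config_path, encoding="utf-8") as f:
        parser.read_file(f)
    missing = []
    for section in parser.sections():
        for key, value in parser.items(section):
            value = value.strip()
            # Heuristic (assumption): any value containing "/" is a file system path.
            if "/" in value and not os.path.exists(value):
                missing.append((section, key, value))
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    for section, key, value in find_missing_paths(sys.argv[1]):
        print(f"[{section}] {key} = {value}  <- not found")
```

For example, save it under a hypothetical name such as check_config_paths.py and run python check_config_paths.py configs/PhysioNet2012_SAITS_best.ini. A saving directory reported as missing is not necessarily an error if the training script creates it. However, the shell redirect in the training command (> NIPS_results/...) fails if NIPS_results does not already exist.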

❖ Reference

If you use this model or the code in this repository, please cite our paper 🤗

@article{Du2022SAITS,
      title={{SAITS: Self-Attention-based Imputation for Time Series}}, 
      author={Wenjie Du and David Côté and Yan Liu},
      year={2022},
      eprint={2202.08516},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Wenjie Du, David Côté, and Yan Liu. "SAITS: Self-Attention-based Imputation for Time Series." ArXiv abs/2202.08516

❖ Acknowledgments

Thanks to Mitacs and NSERC (Natural Sciences and Engineering Research Council of Canada) for funding support. Thanks to Ciena for providing computing resources. Thanks to all reviewers for helping improve the quality of this paper. And thank you all for your attention to this work!


✨Stars, forks, issues, and PRs are all welcome! If you have any other questions, please drop me an email at any time.

Steps to run the code in Google Colab:

1. Download the code and upload it to your Google Drive.
2. Install any missing packages.
3. Mount Google Drive in the notebook, then change the directory to the SAITS code folder in Drive, e.g. %cd /content/drive/MyDrive/SAITS-master
4. Data preprocessing:
   i) Change the directory to dataset_generating_scripts.
   ii) Run !bash data_downloading.sh to download the dataset.
   iii) Run !bash dataset_generating.sh to generate the dataset (missingness is induced and the dataset is converted to .h5 format). The generated file is used for training.
   iv) Change back to the SAITS code folder (e.g. %cd ..) so that run_models.py and configs/ are found.
5. In the configs folder, select a model configuration file and update its paths and directories. For example, in AirQuality_SAITS_best.ini, change the file paths and directories on lines 3, 5, 6, 12, and 15.
6. Train: !CUDA_VISIBLE_DEVICES=0 python run_models.py --config_path configs/AirQuality_SAITS_best.ini
7. Results are saved in the NIPS_results folder.
8. Before testing the model, update the file paths in AirQuality_SAITS_best.ini for testing (lines 10 and 80).
9. Test: !CUDA_VISIBLE_DEVICES=0 python run_models.py --config_path configs/AirQuality_SAITS_best.ini --test_mode
10. Test results are stored in NIPS_results/AirQuality_SAITS_best/step_1015
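The dataset generation step notes that missingness is induced before the data are saved in .h5 format. Below is a minimal NumPy sketch of the general idea: randomly hide a fraction of the observed values and record where they were hidden. It assumes a hypothetical function name, add_artificial_mask, and a simple uniform-random scheme. The repository's actual scripts may differ, for example in how positions are chosen or how the data are split:

```python
import numpy as np


def add_artificial_mask(X: np.ndarray, rate: float, seed: int = 0):
    """Randomly hide a fraction `rate` of the observed (non-NaN) entries of X.

    Returns:
        X_hat: copy of X with the selected observed entries set to NaN (model input).
        indicating_mask: 1.0 where an entry was artificially hidden, else 0.0;
            these are the positions an imputation error would be measured on.
    """
    rng = np.random.default_rng(seed)
    X_hat = X.astype(float, copy=True)
    observed_idx = np.flatnonzero(~np.isnan(X_hat))  # flat indices of observed entries
    n_hide = int(round(rate * observed_idx.size))
    hidden_idx = rng.choice(observed_idx, size=n_hide, replace=False)
    np.put(X_hat, hidden_idx, np.nan)  # hide the sampled entries
    indicating_mask = np.zeros(X_hat.shape, dtype=np.float32)
    np.put(indicating_mask, hidden_idx, 1.0)
    return X_hat, indicating_mask


# Toy example: 4 observed values, hide half of them.
X = np.array([[1.0, np.nan, 3.0], [4.0, 5.0, np.nan]])
X_hat, indicating_mask = add_artificial_mask(X, rate=0.5, seed=42)
print(X_hat)
print(indicating_mask)
```

The original X keeps the ground truth for the hidden entries. Sampling only from observed entries ensures that every masked position has a known value to compare against.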
