Made by Xinyu Chen • 🌐 https://twitter.com/chenxy346
Python code for tensor factorization, tensor completion, and tensor regression techniques, with the following real-world applications:
- geotensor | Image inpainting
- transdim | Spatiotemporal traffic data imputation and prediction
- Recommender systems
- mats | Multivariate time series imputation and forecasting
In a hurry? Check out the contents below.
We conduct extensive experiments on some real-world data sets:
- Middle-scale data sets:
  - PeMS (P) registers traffic speed time series from 228 sensors over 44 days with 288 time points per day (i.e., 5-min frequency). The tensor size is 228 x 288 x 44.
  - Guangzhou (G) contains traffic speed time series from 214 road segments in Guangzhou, China over 61 days with 144 time points per day (i.e., 10-min frequency). The tensor size is 214 x 144 x 61.
  - Electricity (E) records hourly electricity consumption transactions of 370 clients from 2011 to 2014. We use a subset covering the last five weeks and 321 clients in our experiments. The tensor size is 321 x 24 x 35.
- Large-scale PeMS traffic speed data set registers traffic speed time series from 11160 sensors over 4/8/12 weeks (for PeMS-4W/PeMS-8W/PeMS-12W) with 288 time points per day (i.e., 5-min frequency) in California, USA. You can download this data set and place it in the folder `../datasets`.
  - Data size:
    - PeMS-4W: 11160 x 288 x 28 (about 90 million observations).
    - PeMS-8W: 11160 x 288 x 56 (about 180 million observations).
  - Data path example: `../datasets/California-data-set/pems-4w.csv`
  - Open the data in Python with `pandas`:

```python
import pandas as pd

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header=None)
```
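For a sense of how the raw matrix maps onto the stated tensor size, here is a minimal folding sketch. It assumes the CSV holds one row per sensor with the 28 days concatenated day by day along the columns (288 five-minute points per day); verify this layout against the actual file before relying on it:

```python
import numpy as np
import pandas as pd

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header=None)
mat = data.values  # assumed shape: (11160, 28 * 288)

# Fold into the stated (sensor, time-of-day, day) tensor of size 11160 x 288 x 28,
# assuming the columns run day by day with 288 consecutive points per day.
tensor = mat.reshape(11160, 28, 288).transpose(0, 2, 1)
print(tensor.shape)  # (11160, 288, 28)
```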
mats is a project in the tensor learning repository that aims to develop machine learning models for multivariate time series forecasting. In this project, we propose the following low-rank tensor learning models:
- Low-Rank Autoregressive Tensor Completion (LATC) (3-min introduction) for multivariate time series imputation and forecasting on middle-scale data sets (e.g., PeMS, Guangzhou, and Electricity) (Chen et al., 2020); see the first sketch after this list:
- with nuclear norm (NN) minimization [Python code for imputation]
- with truncated nuclear norm (TNN) minimization [Python code for imputation] [Python code for prediction]
- with Schatten p-norm (SN) minimization [Python code for imputation]
- with truncated Schatten p-norm (TSN) minimization [Python code for imputation]
- Low-Tubal-Rank Autoregressive Tensor Completion (LATC-Tubal) for spatiotemporal traffic data imputation on large-scale data sets (e.g., PeMS-4W and PeMS-8W) (Chen et al., 2020); see the second sketch after this list:
- without autoregressive norm [Python code]
- with autoregressive norm [Python code]
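To make the truncated nuclear norm entries above concrete, here is a minimal sketch of singular value thresholding under a truncated nuclear norm, the proximal step at the core of LATC-style completion. The function name `svt_tnn` and its parameterization (leave the largest `theta` singular values untouched and soft-threshold the rest by `tau`) are illustrative choices, not necessarily the exact implementation in this repository:

```python
import numpy as np

def svt_tnn(mat, tau, theta):
    """Soft-threshold all but the largest `theta` singular values by `tau`."""
    u, s, vh = np.linalg.svd(mat, full_matrices=False)
    s[theta:] = np.maximum(s[theta:] - tau, 0)  # leading values pass through untouched
    return (u * s) @ vh
```

LATC-Tubal swaps the matrix SVD for a tensor SVD applied slice by slice in a transform domain. Below is a minimal sketch of tubal singular value thresholding with the FFT along the third mode (the TNN-DCT baseline listed later uses a DCT instead); again an illustrative sketch under the same caveat:

```python
import numpy as np

def tubal_svt(tensor, tau):
    """Slice-wise SVT in the Fourier domain along mode 3 (t-SVD thresholding)."""
    t_hat = np.fft.fft(tensor, axis=2)
    for k in range(t_hat.shape[2]):
        u, s, vh = np.linalg.svd(t_hat[:, :, k], full_matrices=False)
        t_hat[:, :, k] = (u * np.maximum(s - tau, 0)) @ vh
    return np.real(np.fft.ifft(t_hat, axis=2))
```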
We write the Python code in Jupyter notebooks and place them in the folder `../mats`. If you want to test our code, run the notebooks in that folder. Each notebook is independent of the others, so you can run any of them directly.
The baseline models include:
- On middle-scale data sets:
  - coming soon...
- On large-scale data sets:
  - Bayesian Probabilistic Matrix Factorization (BPMF, Salakhutdinov and Mnih, 2008) [Python code]
  - Bayesian Gaussian CP decomposition (BGCP, Chen et al., 2019) [Python code]
  - High-accuracy Low-Rank Tensor Completion (HaLRTC, Liu et al., 2013) [Python code]
  - Low-Rank Tensor Completion with Truncated Nuclear Norm minimization (LRTC-TNN, Chen et al., 2020) [Python code]
  - Tensor Nuclear Norm minimization with Discrete Cosine Transform (TNN-DCT, Lu et al., 2019) [Python code]
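All of the nuclear-norm models above revolve around the same loop: refill the missing entries, then shrink the singular values. Here is a toy Soft-Impute-style illustration (Mazumder et al., 2010) on a matrix, meant as a didactic sketch rather than any of the listed implementations:

```python
import numpy as np

def svt(mat, tau):
    """Soft-threshold all singular values by `tau`."""
    u, s, vh = np.linalg.svd(mat, full_matrices=False)
    return (u * np.maximum(s - tau, 0)) @ vh

def soft_impute(y, mask, tau=1.0, n_iter=100):
    """Alternate between refilling missing entries and singular value shrinkage."""
    x = y * mask
    for _ in range(n_iter):
        x = svt(y * mask + x * (1 - mask), tau)
    return x

# Toy usage: recover a rank-2 matrix with roughly 30% missing entries.
rng = np.random.default_rng(0)
y = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
mask = (rng.random(y.shape) > 0.3).astype(float)
x_hat = soft_impute(y, mask, tau=0.5)
```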
We write the Python code in Jupyter notebooks and place them in the folder `../baselines`. If you want to test our code, run the notebooks in that folder. Notebooks that reproduce algorithms on large-scale data sets are marked with the prefix `Large-Scale-xx`.
We reproduce some tensor learning experiments from the literature.
| Year | Title | Venue | Authors' Code | Our Code | Status |
|---|---|---|---|---|---|
| 2015 | Accelerated Online Low-Rank Tensor Learning for Multivariate Spatio-Temporal Streams | ICML 2015 | Matlab code | Python code | Under development |
| 2016 | Scalable and Sound Low-Rank Tensor Learning | AISTATS 2016 | - | xx | Under development |
We summarize some preliminaries for a better understanding of tensor learning. They are given in the form of tutorials as follows.
- Foundations of Python NumPy Programming
  - Generating random numbers in Matlab and NumPy [Jupyter notebook] [blog post]
- Foundations of Tensor Computations
  - Kronecker product
- Singular Value Decomposition (SVD)
  - Randomized singular value decomposition [Jupyter notebook] [blog post] (see the sketch after this list)
  - Tensor singular value decomposition
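As a taste of the randomized SVD tutorial, here is a minimal sketch of the standard Halko-Martinsson-Tropp scheme: sketch the range of the matrix with a Gaussian test matrix, orthonormalize, then compute a small exact SVD. The function name and the oversampling default are illustrative:

```python
import numpy as np

def rsvd(mat, rank, oversample=10):
    """Randomized SVD: Gaussian range sketch -> QR -> small exact SVD."""
    omega = np.random.randn(mat.shape[1], rank + oversample)  # Gaussian test matrix
    q, _ = np.linalg.qr(mat @ omega)   # orthonormal basis for the (approximate) range
    b = q.T @ mat                      # small projected matrix
    u_small, s, vh = np.linalg.svd(b, full_matrices=False)
    return (q @ u_small)[:, :rank], s[:rank], vh[:rank]
```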
If you find this code useful, please star (★) this repository.
We believe these materials will be a valuable and useful resource for readers in further study or advanced research.
- Vladimir Britanak, Patrick C. Yip, K. R. Rao (2006). Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations. Academic Press. [About the book]
- Ruye Wang (2010). Introduction to Orthogonal Transforms with Applications in Data Processing and Analysis. Cambridge University Press. [PDF]
- J. Nathan Kutz, Steven L. Brunton, Bingni Brunton, Joshua L. Proctor (2016). Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM. [About the book]
- Yimin Wei, Weiyang Ding (2016). Theory and Computation of Tensors: Multi-Dimensional Arrays. Academic Press.
- Steven L. Brunton, J. Nathan Kutz (2019). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press. [PDF] [data & code]
If you want to run the code, please:
- download (or clone) this repository,
- open the `.ipynb` files in Jupyter Notebook,
- and run the code.
This repository accompanies the following paper; please cite it if it helps your research.
- Xinyu Chen, Lijun Sun (2020). Low-rank autoregressive tensor completion for multivariate time series forecasting. arXiv: 2006.10436. [preprint] [data & Python code]
This research is supported by the Institute for Data Valorization (IVADO).
This work is released under the MIT license.