Made by Xinyu Chen • 🌐 https://xinychen.github.io
Python code for tensor factorization, tensor completion, and tensor regression techniques, with the following real-world applications:
- geotensor | Image inpainting
- transdim | Spatiotemporal traffic data imputation and prediction
- Recommender systems
- mats | Multivariate time series imputation and forecasting
In a hurry? Check out the contents below.
We conduct extensive experiments on some real-world data sets:
- Middle-scale data sets:
  - PeMS (P) registers traffic speed time series from 228 sensors over 44 days with 288 time points per day (i.e., 5-min frequency). The tensor size is 228 x 288 x 44.
  - Guangzhou (G) contains traffic speed time series from 214 road segments in Guangzhou, China over 61 days with 144 time points per day (i.e., 10-min frequency). The tensor size is 214 x 144 x 61.
  - Electricity (E) records hourly electricity consumption transactions of 370 clients from 2011 to 2014. We use a subset of the last five weeks of 321 clients in our experiments. The tensor size is 321 x 24 x 35.
- Large-scale PeMS traffic speed data set registers traffic speed time series from 11160 sensors over 4/8/12 weeks (for PeMS-4W/PeMS-8W/PeMS-12W) with 288 time points per day (i.e., 5-min frequency) in California, USA. You can download this data set and place it in the folder `../datasets`.
  - Data size:
    - PeMS-4W: 11160 x 288 x 28 (about 90 million observations).
    - PeMS-8W: 11160 x 288 x 56 (about 180 million observations).
  - Data path example: `../datasets/California-data-set/pems-4w.csv`
  - Open the data in Python with `pandas`:
```python
import pandas as pd

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header=None)
```
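For convenience, here is a minimal sketch of turning the loaded matrix into a third-order tensor. It assumes each row of the CSV holds one sensor's speed series with columns ordered day by day; the actual file layout may differ, so adjust the reshape accordingly:

```python
import numpy as np
import pandas as pd

# Assumption: 11160 rows (sensors), 288 * 28 columns ordered day by day.
data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header=None)

# (sensor, day, time of day) -> (sensor, time of day, day)
tensor = data.values.reshape(11160, 28, 288).transpose(0, 2, 1)
print(tensor.shape)  # (11160, 288, 28), matching the PeMS-4W size above
```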
mats is a project in the tensor learning repository, and it aims to develop machine learning models for multivariate time series forecasting. In this project, we propose the following low-rank tensor learning models:
- Low-Rank Autoregressive Tensor Completion (LATC) (3-min introduction) for multivariate time series (middle-scale data sets like PeMS, Guangzhou, and Electricity) imputation and forecasting (Chen et al., 2020); the singular value thresholding step at the core of these norm-minimization solvers is sketched after this list:
  - with nuclear norm (NN) minimization [Python code for imputation]
  - with truncated nuclear norm (TNN) minimization [Python code for imputation] [Python code for prediction]
  - with Schatten p-norm (SN) minimization [Python code for imputation]
  - with truncated Schatten p-norm (TSN) minimization [Python code for imputation]
- Low-Tubal-Rank Autoregressive Tensor Completion (LATC-Tubal) for large-scale spatiotemporal traffic data (large-scale data sets like PeMS-4W and PeMS-8W) imputation (Chen et al., 2020):
  - without autoregressive norm [Python code]
  - with autoregressive norm [Python code]
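As referenced above, here is a minimal illustrative sketch of singular value thresholding (SVT), the proximal operator of the nuclear norm, together with its truncated variant that leaves the largest r singular values unshrunk. This is a generic sketch of the technique, not the exact update used in our notebooks:

```python
import numpy as np

def svt(mat, tau):
    """Proximal operator of the nuclear norm: soft-threshold
    all singular values of `mat` by `tau`."""
    U, s, Vt = np.linalg.svd(mat, full_matrices=False)
    return (U * np.maximum(s - tau, 0)) @ Vt

def truncated_svt(mat, tau, r):
    """Truncated variant: keep the r largest singular values
    intact and soft-threshold only the remaining ones."""
    U, s, Vt = np.linalg.svd(mat, full_matrices=False)
    s[r:] = np.maximum(s[r:] - tau, 0)
    return (U * s) @ Vt
```

In a tensor completion loop, such an operator is typically applied to each unfolding of the partially observed tensor at every iteration.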
We write the Python code in Jupyter notebooks and place them in the folder `../mats`. To test our code, run the notebooks in `../mats`. Each notebook is independent of the others, so you can run any notebook directly.
The baseline models include:
- On middle-scale data sets:
  - coming soon...
- On large-scale data sets:
  - Bayesian Probabilistic Matrix Factorization (BPMF, Salakhutdinov and Mnih, 2008) [Python code]
  - Bayesian Gaussian CP decomposition (BGCP, Chen et al., 2019) [Python code]
  - High-accuracy Low-Rank Tensor Completion (HaLRTC, Liu et al., 2013) [Python code]
  - Low-Rank Tensor Completion with Truncated Nuclear Norm minimization (LRTC-TNN, Chen et al., 2020) [Python code]
  - Tensor Nuclear Norm minimization with Discrete Cosine Transform (TNN-DCT, Lu et al., 2019) [Python code]; a sketch of the DCT-based thresholding idea follows this list
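As noted in the TNN-DCT item above, here is a minimal sketch of DCT-based tensor singular value thresholding. This is our illustrative reading of the idea (transform along the third mode, threshold each frontal slice, transform back), not the exact implementation of Lu et al. (2019):

```python
import numpy as np
from scipy.fft import dct, idct

def tsvt_dct(tensor, tau):
    """Tensor SVT under the DCT: apply the DCT along mode 3,
    soft-threshold the singular values of every frontal slice,
    then invert the transform."""
    X = dct(tensor, axis=2, norm='ortho')   # mode-3 DCT
    for k in range(X.shape[2]):
        U, s, Vt = np.linalg.svd(X[:, :, k], full_matrices=False)
        X[:, :, k] = (U * np.maximum(s - tau, 0)) @ Vt
    return idct(X, axis=2, norm='ortho')    # inverse mode-3 DCT
```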
We write the Python code in Jupyter notebooks and place them in the folder `../baselines`. To test our code, run the notebooks in `../baselines`. Notebooks that reproduce algorithms on large-scale data sets are marked by the prefix `Large-Scale-xx`.
We reproduce some tensor learning experiments from the literature.
| Year | Title | Venue | Authors' Code | Our Code | Status |
|---|---|---|---|---|---|
| 2015 | Accelerated Online Low-Rank Tensor Learning for Multivariate Spatio-Temporal Streams | ICML 2015 | Matlab code | Python code | Under development |
| 2016 | Scalable and Sound Low-Rank Tensor Learning | AISTATS 2016 | - | xx | Under development |
We summarize some preliminaries for a better understanding of tensor learning. They are given as tutorials below.
- Foundations of Python Numpy Programming
  - Generating random numbers in Matlab and Numpy [Jupyter notebook] [blog post]
- Foundations of Tensor Computations
  - Kronecker product (a one-line NumPy demo appears in the sketch after this list)
- Singular Value Decomposition (SVD)
  - Randomized singular value decomposition [Jupyter notebook] [blog post] (see the sketch after this list)
  - Tensor singular value decomposition
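To accompany the Kronecker product and randomized SVD tutorials above, here is a minimal NumPy sketch following the standard randomized SVD recipe of Halko et al. (2011); the function and parameter names are our own:

```python
import numpy as np

# Kronecker product: an (m, n) and a (p, q) matrix give an (m*p, n*q) matrix.
A = np.array([[1, 2], [3, 4]])
print(np.kron(A, np.eye(2)))

def rsvd(A, rank, oversample=10, n_iter=2, seed=0):
    """Randomized SVD: sketch the range of A with a Gaussian test
    matrix, orthonormalize it, and run an exact SVD on the small
    projected matrix."""
    rng = np.random.default_rng(seed)
    Y = A @ rng.standard_normal((A.shape[1], rank + oversample))
    for _ in range(n_iter):          # power iterations sharpen the subspace
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return Q @ U_small[:, :rank], s[:rank], Vt[:rank]
```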
If you find these codes useful, please star (★) this repository.
We believe these materials will be a valuable source for readers in further study and advanced research.
- Vladimir Britanak, Patrick C. Yip, K. R. Rao (2006). Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations. Academic Press. [About the book]
- Ruye Wang (2010). Introduction to Orthogonal Transforms with Applications in Data Processing and Analysis. Cambridge University Press. [PDF]
- J. Nathan Kutz, Steven L. Brunton, Bingni Brunton, Joshua L. Proctor (2016). Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM. [About the book]
- Yimin Wei, Weiyang Ding (2016). Theory and Computation of Tensors: Multi-Dimensional Arrays. Academic Press.
- Steven L. Brunton, J. Nathan Kutz (2019). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press. [PDF] [data & code]
- If you want to run the code, please:
  - download (or clone) this repository,
  - open the `.ipynb` files using Jupyter notebook,
  - and run the code.
This repository is from the following paper. Please cite our paper if it helps your research.
This research is supported by the Institute for Data Valorization (IVADO).
This work is released under the MIT license.