Tensor4ML

Tensor decomposition for machine learning (w/ Python implementation)


Tensor Decomposition

MIT License | Python 3.7

Made by Xinyu Chen • 🌐 https://xinychen.github.io

Python code for tensor factorization, tensor completion, and tensor regression techniques, with the following real-world applications:

  • geotensor | Image inpainting
  • transdim | Spatiotemporal traffic data imputation and prediction
  • Recommender systems
  • mats | Multivariate time series imputation and forecasting
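To make "tensor factorization" concrete, here is a minimal sketch of CANDECOMP/PARAFAC (CP) decomposition of a third-order tensor, fitted by alternating least squares (ALS) with NumPy. It is a generic textbook version for illustration only, not the code used in the notebooks; `cp_als` and `cp_reconstruct` are hypothetical names.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)


def cp_als(tensor, rank, num_iters=100, seed=0):
    """Minimal CP decomposition of a 3rd-order tensor via alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = tensor.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(num_iters):
        # Each update is a linear least-squares solve with the other two factors fixed:
        # contract the tensor with the fixed factors (einsum), then multiply by the
        # pseudo-inverse of the Hadamard product of their Gram matrices.
        A = np.einsum('ijk,jr,kr->ir', tensor, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', tensor, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', tensor, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C


# Usage: fit a synthetic tensor that is exactly rank 3.
rng = np.random.default_rng(42)
A0, B0, C0 = (rng.standard_normal((n, 3)) for n in (10, 8, 6))
X = cp_reconstruct(A0, B0, C0)
A, B, C = cp_als(X, rank=3, num_iters=200)
rel_err = np.linalg.norm(X - cp_reconstruct(A, B, C)) / np.linalg.norm(X)
print(f"relative error: {rel_err:.2e}")
```

Contracting with `einsum` avoids committing to a particular unfolding and Khatri-Rao ordering convention; an unfolding-based version computes the same updates.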

In a hurry? Please check out the contents below.

Our Research

We conduct extensive experiments on the following real-world data sets:

  • Middle-scale data sets:

    • PeMS (P) registers traffic speed time series from 228 sensors over 44 days with 288 time points per day (i.e., 5-min frequency). The tensor size is 228 x 288 x 44.
    • Guangzhou (G) contains traffic speed time series from 214 road segments in Guangzhou, China over 61 days with 144 time points per day (i.e., 10-min frequency). The tensor size is 214 x 144 x 61.
    • Electricity (E) records hourly electricity consumption transactions of 370 clients from 2011 to 2014. We use a subset of the last five weeks of 321 clients in our experiments. The tensor size is 321 x 24 x 35.
  • The large-scale PeMS traffic speed data set registers traffic speed time series from 11160 sensors in California, USA over 4/8/12 weeks (for PeMS-4W/PeMS-8W/PeMS-12W) with 288 time points per day (i.e., 5-min frequency). You can download this data set and place it in the ../datasets folder.

    • Data size:
      • PeMS-4W: 11160 x 288 x 28 (contains about 90 million observations).
      • PeMS-8W: 11160 x 288 x 56 (contains about 180 million observations).
    • Data path example: ../datasets/California-data-set/pems-4w.csv.
    • Open data in Python with Pandas:
import pandas as pd

# The CSV file has no header row, so header=None keeps the first line as data.
data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header=None)
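The tensor sizes listed above come from folding each location's time series by day (location x time of day x day). A minimal sketch of that folding step with NumPy, assuming the matrix columns are ordered chronologically (all intervals of day 1, then all of day 2, and so on); `matrix_to_tensor` is a hypothetical helper, not a function from this repository:

```python
import numpy as np


def matrix_to_tensor(mat, points_per_day):
    """Fold a (locations x time) matrix into a (locations x time-of-day x day) tensor.

    Assumes the columns are in chronological order, one full day after another.
    """
    num_locations, num_points = mat.shape
    if num_points % points_per_day != 0:
        raise ValueError("number of time points must be a multiple of points_per_day")
    num_days = num_points // points_per_day
    # reshape yields axes (location, day, time-of-day); swap the last two axes.
    return mat.reshape(num_locations, num_days, points_per_day).transpose(0, 2, 1)


# Toy example: 5 sensors, 3 days, 288 five-minute intervals per day.
rng = np.random.default_rng(0)
mat = rng.random((5, 288 * 3))
tensor = matrix_to_tensor(mat, 288)
print(tensor.shape)  # (5, 288, 3)
```

If pems-4w.csv is stored as an 11160 x 8064 matrix (an assumption; check the file's shape first), `matrix_to_tensor(data.values, 288)` would give the 11160 x 288 x 28 tensor, whose 11160 × 288 × 28 = 89,994,240 entries match the "about 90 million observations" quoted above.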

mats

mats is a project in the tensor learning repository, and it aims to develop machine learning models for multivariate time series forecasting. In this project, we propose the following low-rank tensor learning models:

The Python code is written as Jupyter notebooks, placed in the ../mats folder. To test our code, run the notebooks in that folder. Each notebook is independent of the others, so you can run any individual notebook directly.

The baseline models include:

The baseline code is also written as Jupyter notebooks, placed in the ../baselines folder. To test it, run the notebooks in that folder. Notebooks that reproduce an algorithm on the large-scale data sets are named Large-Scale-xx.

📖 Reproducing Literature in Python

We reproduce some tensor learning experiments from previous literature.

Year | Title | PDF | Authors' Code | Our Code | Status
2015 | Accelerated Online Low-Rank Tensor Learning for Multivariate Spatio-Temporal Streams | ICML 2015 | Matlab code | Python code | Under development
2016 | Scalable and Sound Low-Rank Tensor Learning | AISTATS 2016 | - | xx | Under development

📖 Tutorial

We summarize some preliminaries that help in understanding tensor learning. They are given as the following tutorials:

  • Foundations of Python Numpy Programming

  • Foundations of Tensor Computations

    • Kronecker product
  • Singular Value Decomposition (SVD)
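As a quick companion to the Kronecker product and SVD tutorials, the sketch below uses NumPy's built-in `np.kron` and `np.linalg.svd`; the matrices are arbitrary illustrative values, and this is not the tutorials' own code.

```python
import numpy as np

# Kronecker product: block (i, j) of kron(A, B) is A[i, j] * B.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
K = np.kron(A, B)
print(K)
# [[0 1 0 2]
#  [1 0 2 0]
#  [0 3 0 4]
#  [3 0 4 0]]

# Mixed-product property: (A kron B)(C kron D) = (AC) kron (BD).
rng = np.random.default_rng(0)
C, D = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# Thin SVD: X = U diag(s) V^T, with singular values s sorted in descending order.
X = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # (6, 4) (4,) (4, 4)

# Truncated SVD keeps the k largest singular values. By the Eckart-Young theorem it is
# the best rank-k approximation, and its Frobenius error equals the norm of the
# discarded singular values.
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
err = np.linalg.norm(X - X_k)
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))  # True
```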

If you find this code useful, please star (★) this repository.

Helpful Material

We believe these materials will be a valuable resource for readers pursuing further study or advanced research.

  • Vladimir Britanak, Patrick C. Yip, K.R. Rao (2006). Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations. Academic Press. [About the book]

  • Ruye Wang (2010). Introduction to Orthogonal Transforms with Applications in Data Processing and Analysis. Cambridge University Press. [PDF]

  • J. Nathan Kutz, Steven L. Brunton, Bingni Brunton, Joshua L. Proctor (2016). Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM. [About the book]

  • Yimin Wei, Weiyang Ding (2016). Theory and Computation of Tensors: Multi-Dimensional Arrays. Academic Press.

  • Steven L. Brunton, J. Nathan Kutz (2019). Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press. [PDF] [data & code]

Quick Run

  • If you want to run the code, please
    • download (or clone) this repository,
    • open the .ipynb file using Jupyter notebook,
    • and run the code.

Citing

This repository accompanies the following paper; please cite it if it helps your research.

Acknowledgements

This research is supported by the Institute for Data Valorization (IVADO).

License

This work is released under the MIT license.