transdim

Made by Xinyu Chen • 🌐 https://twitter.com/chenxy346

Machine learning models have driven important advances in spatiotemporal data modeling, such as forecasting the near-future traffic states of road networks. But what happens when these models are built on the incomplete data commonly collected in real-world systems?

About this Project

In the transdim (transportation data imputation) project, we build machine learning models to help address some of the toughest challenges of spatiotemporal data modeling, from missing data imputation to time series prediction. The strategic aim of this project is to create accurate and efficient solutions for spatiotemporal traffic data imputation and prediction tasks.

In a hurry? Please check out our contents as follows.

Tasks and Challenges

Missing data are there, whether we like them or not. The really interesting question is how to deal with incomplete data.

Figure 1: Two classical missing patterns in a spatiotemporal setting.

Missing data imputation 🔥
- Random missing (RM): Each sensor loses observations completely at random. (★★★)
- Non-random missing (NM): Each sensor loses observations over several days. (★★★★)
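
The two patterns can be sketched as binary masks over a (sensor, day, time-of-day) tensor. The snippet below is a minimal illustration on synthetic random numbers, not the project's data pipeline; the tensor shape and the names `rm_mask` and `nm_mask` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_days, n_steps = 4, 7, 24   # hypothetical tensor shape
missing_rate = 0.3

# Random missing (RM): every entry is dropped independently.
rm_mask = (rng.random((n_sensors, n_days, n_steps)) > missing_rate).astype(float)

# Non-random missing (NM): a sensor loses whole days at once,
# so one random draw per (sensor, day) is repeated along the time axis.
day_draws = rng.random((n_sensors, n_days))
nm_mask = np.repeat((day_draws > missing_rate)[:, :, None], n_steps, axis=2).astype(float)

print('RM observed fraction:', rm_mask.mean())
print('NM observed fraction:', nm_mask.mean())

# Under NM, each (sensor, day) fiber is either fully observed or fully missing.
fiber_sums = nm_mask.sum(axis=2)
print('NM fibers all-or-nothing:', bool(np.all((fiber_sums == 0) | (fiber_sums == n_steps))))
```

Multiplying a data tensor elementwise by either mask sets the "missing" entries to zero, which is the convention the code later in this README relies on.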

Figure 2: Tensor completion framework for spatiotemporal missing traffic data imputation.

Spatiotemporal prediction 🔥
- Forecasting without missing values. (★★★)
- Forecasting with incomplete observations. (★★★★★)

Figure 3: Illustration of our proposed Low-Rank Autoregressive Tensor Completion (LATC) imputer/predictor with a prediction window τ (green nodes: observed values; white nodes: missing values; red nodes/panel: prediction; blue panel: training data to construct the tensor).

Implementation

Open data

In this repository, we have adapted several publicly available data sets for our experiments. To view or use these data sets, first download them into the ../datasets/ folder, then run the following code in your Python console:

import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

If you want to view the original data, please check out the following links:

Gdata: Guangzhou urban traffic speed data set.
Bdata: Birmingham parking data set.
Hdata: Hangzhou metro passenger flow data set.
Sdata: Seattle freeway traffic speed data set.
Ndata: NYC taxi data set.

In particular, we take into account large-scale traffic data imputation/prediction on PeMS-4W and PeMS-8W data sets:

PeMS-4W/8W/12W: Large-scale traffic speed data sets in California, USA.

You can download the data sets from Zenodo and place them in the datasets folder (data path example: ../datasets/California-data-set/pems-4w.csv). You can then load the data in Python with pandas:

import pandas as pd

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)

For model evaluation, we mask certain entries of the "observed" data as missing values and then perform imputation for these "missing" values.
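
As a minimal sketch of this protocol, the snippet below hides 20% of the entries of a synthetic matrix, fills them in, and scores only the hidden positions. A naive per-sensor mean imputer stands in for a real model, zeros mark missing values (the convention used later in this README), and all array names and sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
dense = rng.uniform(20, 60, size=(5, 200))      # synthetic "observed" speeds, sensors x time
mask = rng.random(dense.shape) > 0.2            # True = kept, False = held out
sparse = dense * mask                           # held-out entries become 0

# Naive baseline: replace each missing entry by its sensor's observed mean.
row_mean = sparse.sum(axis=1) / mask.sum(axis=1)
imputed = np.where(mask, sparse, row_mean[:, None])

# Score only entries that were observed originally but hidden by the mask.
pos_test = (dense != 0) & (sparse == 0)
err = dense[pos_test] - imputed[pos_test]
rmse = np.sqrt(np.mean(err ** 2))
mape = np.mean(np.abs(err) / dense[pos_test])
print('held-out entries:', pos_test.sum())
print('RMSE: {:.4f}, MAPE: {:.4f}'.format(rmse, mape))
```

Because the held-out values are known, the error measures how well a model recovers entries it never saw, which a genuinely missing entry could not tell us.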

Model implementation

In our experiments, we have implemented several machine learning models, mainly with NumPy, and written the Python code as Jupyter notebooks. To evaluate these models, download and run the notebooks directly (prerequisite: download the data sets in advance).

Our models

Task	Jupyter Notebook	Gdata	Bdata	Hdata	Sdata	Ndata
Missing Data Imputation	BTMF	✅	✅	✅	✅	🔶
	BGCP	✅	✅	✅	✅	✅
	LRTC-TNN	✅	✅	✅	✅	🔶
	BTTF	🔶	🔶	🔶	🔶	✅
Single-Step Prediction	BTMF	✅	✅	✅	✅	🔶
	BTTF	🔶	🔶	🔶	🔶	✅
Multi-Step Prediction	BTMF	✅	✅	✅	✅	🔶
	BTTF	🔶	🔶	🔶	🔶	✅

Baselines

Task	Jupyter Notebook	Gdata	Bdata	Hdata	Sdata	Ndata
Missing Data Imputation	BayesTRMF	✅	✅	✅	✅	🔶
	TRMF	✅	✅	✅	✅	🔶
	BPMF	✅	✅	✅	✅	🔶
	HaLRTC	✅	✅	✅	✅	🔶
	TF-ALS	✅	✅	✅	✅	✅
	BayesTRTF	🔶	🔶	🔶	🔶	✅
	BPTF	🔶	🔶	🔶	🔶	✅
Single-Step Prediction	BayesTRMF	✅	✅	✅	✅	🔶
	TRMF	✅	✅	✅	✅	🔶
	BayesTRTF	🔶	🔶	🔶	🔶	✅
	TRTF	🔶	🔶	🔶	🔶	✅
Multi-Step Prediction	BayesTRMF	✅	✅	✅	✅	🔶
	TRMF	✅	✅	✅	✅	🔶
	BayesTRTF	🔶	🔶	🔶	🔶	✅
	TRTF	🔶	🔶	🔶	🔶	✅

✅ — Cover
🔶 — Does not cover
🚧 — Under development

Imputation/Prediction performance

Imputation example (on Gdata)

(a) Time series of actual and estimated speed within two weeks from August 1 to 14.

(b) Time series of actual and estimated speed within two weeks from September 12 to 25.

The imputation performance of BGCP (CP rank r=15 and missing rate α=30%) under the fiber missing scenario with third-order tensor representation, where the estimated result of road segment #1 is selected as an example. In both panels, red rectangles represent fiber missing (i.e., speed observations are lost for a whole day).

Prediction example

Quick Start

This is an imputation example of Low-Rank Tensor Completion with Truncated Nuclear Norm minimization (LRTC-TNN). Unlike the complex equations in our paper, the Python implementation below is short and straightforward to work with.

First, import some necessary packages:

import numpy as np
from numpy.linalg import inv as inv

Define the operators of tensor unfolding (ten2mat) and matrix folding (mat2ten) using NumPy:

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

def mat2ten(mat, tensor_size, mode):
    index = list()
    index.append(mode)
    for i in range(tensor_size.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, list(tensor_size[index]), order = 'F'), 0, mode)
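
As a quick sanity check, the sketch below repeats the two operators (with the index list built in one line) and confirms on a small toy tensor that folding undoes unfolding along every mode. The toy tensor and its shape are assumptions chosen for illustration.

```python
import numpy as np

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mat2ten(mat, tensor_size, mode):
    index = [mode] + [i for i in range(tensor_size.shape[0]) if i != mode]
    return np.moveaxis(np.reshape(mat, list(tensor_size[index]), order='F'), 0, mode)

tensor = np.arange(24, dtype=float).reshape(2, 3, 4)
dim = np.array(tensor.shape)      # mat2ten expects the size as a NumPy array

for mode in range(tensor.ndim):
    mat = ten2mat(tensor, mode)
    # The mode-k unfolding has dim[k] rows and the product of the other sizes as columns.
    print(mode, mat.shape)
    assert np.array_equal(mat2ten(mat, dim, mode), tensor)
```

Note that `tensor_size` must be a NumPy array rather than a tuple, since `mat2ten` reads `tensor_size.shape[0]` and indexes it with a list.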

Define Singular Value Thresholding (SVT) for Truncated Nuclear Norm (TNN) minimization:

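def svt_tnn(mat, alpha, rho, theta):
    """Singular value thresholding that leaves the top `theta` singular values unshrunk."""
    tau = alpha / rho
    [m, n] = mat.shape
    if 2 * m < n:
        # Wide matrix: decompose the smaller m-by-m matrix mat @ mat.T instead.
        u, s, v = np.linalg.svd(mat @ mat.T, full_matrices = False)
        s = np.sqrt(s)
        idx = np.sum(s > tau)
        mid = np.zeros(idx)
        mid[:theta] = 1
        mid[theta:idx] = (s[theta:idx] - tau) / s[theta:idx]
        return (u[:,:idx] @ np.diag(mid)) @ (u[:,:idx].T @ mat)
    elif m > 2 * n:
        # Tall matrix: reuse the wide-matrix branch on the transpose.
        return svt_tnn(mat.T, alpha, rho, theta).T
    u, s, v = np.linalg.svd(mat, full_matrices = False)
    idx = np.sum(s > tau)
    vec = s[:idx].copy()
    # Shrink only the singular values after the first `theta`.
    vec[theta:idx] = s[theta:idx] - tau
    return u[:,:idx] @ np.diag(vec) @ v[:idx,:]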
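
To see what the operator does to the spectrum, here is a simplified reference version using a plain SVD, applied to a matrix whose singular values are known in advance. It is a sketch for illustration rather than the implementation above: the name `truncated_svt` and the test matrix are assumptions, and the two agree only when the top `theta` singular values all exceed `tau`.

```python
import numpy as np

def truncated_svt(mat, tau, theta):
    """Reference version: keep the top `theta` singular values, shrink the rest by `tau`."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    shrunk = s.copy()
    shrunk[theta:] = np.maximum(s[theta:] - tau, 0)
    return u @ np.diag(shrunk) @ vt

# Build a 6-by-4 matrix with known singular values 5, 3, 1, 0.2.
rng = np.random.default_rng(0)
q1, _ = np.linalg.qr(rng.standard_normal((6, 4)))
q2, _ = np.linalg.qr(rng.standard_normal((4, 4)))
mat = q1 @ np.diag([5.0, 3.0, 1.0, 0.2]) @ q2.T

out = truncated_svt(mat, tau=0.5, theta=1)
# Expected spectrum: 5 is kept, 3 and 1 shrink by 0.5, and 0.2 is removed.
print(np.round(np.linalg.svd(out, compute_uv=False), 6))
```

Leaving the largest singular values untouched is what distinguishes the truncated nuclear norm from ordinary singular value thresholding, which would shrink all of them by `tau`.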

Define performance metrics (i.e., RMSE, MAPE):

def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]
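
A small worked example makes the two formulas concrete. The input vectors are made-up numbers; both functions expect 1-D arrays, and MAPE additionally requires the actual values to be nonzero.

```python
import numpy as np

def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

actual = np.array([10.0, 20.0])
estimate = np.array([12.0, 18.0])

# RMSE = sqrt((2^2 + 2^2) / 2) = 2
print(compute_rmse(actual, estimate))
# MAPE = (2/10 + 2/20) / 2 = 0.15
print(compute_mape(actual, estimate))
```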

Define LRTC-TNN:

def LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter):
    """Low-Rank Tensor Completion with Truncated Nuclear Norm, LRTC-TNN."""
    
    dim = np.array(sparse_tensor.shape)
    pos_missing = np.where(sparse_tensor == 0)
    pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    
    X = np.zeros(np.insert(dim, 0, len(dim))) # \boldsymbol{\mathcal{X}}
    T = np.zeros(np.insert(dim, 0, len(dim))) # \boldsymbol{\mathcal{T}}
    Z = sparse_tensor.copy()
    last_tensor = sparse_tensor.copy()
    snorm = np.sqrt(np.sum(sparse_tensor ** 2))
    it = 0
    while True:
        rho = min(rho * 1.05, 1e5)
        for k in range(len(dim)):
            # np.int was removed in NumPy 1.24; the built-in int does the same job.
            X[k] = mat2ten(svt_tnn(ten2mat(Z - T[k] / rho, k), alpha[k], rho, int(np.ceil(theta * dim[k]))), dim, k)
        Z[pos_missing] = np.mean(X + T / rho, axis = 0)[pos_missing]
        T = T + rho * (X - np.broadcast_to(Z, np.insert(dim, 0, len(dim))))
        tensor_hat = np.einsum('k, kmnt -> mnt', alpha, X)
        tol = np.sqrt(np.sum((tensor_hat - last_tensor) ** 2)) / snorm
        last_tensor = tensor_hat.copy()
        it += 1
        if it % 50 == 0:
            print('Iter: {}'.format(it))
            print('RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], tensor_hat[pos_test])))
            print()
        if (tol < epsilon) or (it >= maxiter):
            break

    print('Imputation MAPE: {:.6}'.format(compute_mape(dense_tensor[pos_test], tensor_hat[pos_test])))
    print('Imputation RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], tensor_hat[pos_test])))
    print()
    
    return tensor_hat
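
Read off the loop body, one iteration of the function above performs the following updates. This is a transcription of the code, not the derivation from the paper, and the notation is an assumption made here: $K$ is the number of tensor modes, $n_k$ the size of mode $k$, $\bar\Omega$ the set of zero (missing) entries of `sparse_tensor`, $\mathcal{D}_{r,\tau}$ the operator `svt_tnn` with truncation $r$ and threshold $\tau$, and $\hat{\boldsymbol{\mathcal{Y}}}$ the returned `tensor_hat`.

```latex
\begin{aligned}
\rho &\leftarrow \min(1.05\,\rho,\ 10^{5}) \\
\boldsymbol{\mathcal{X}}_k &\leftarrow \operatorname{fold}_k\!\left(\mathcal{D}_{\lceil \theta n_k \rceil,\ \alpha_k/\rho}\!\left(\operatorname{unfold}_k\!\left(\boldsymbol{\mathcal{Z}} - \boldsymbol{\mathcal{T}}_k/\rho\right)\right)\right), \quad k = 1, \dots, K \\
[\boldsymbol{\mathcal{Z}}]_{\bar\Omega} &\leftarrow \left[\frac{1}{K}\sum_{k=1}^{K}\left(\boldsymbol{\mathcal{X}}_k + \boldsymbol{\mathcal{T}}_k/\rho\right)\right]_{\bar\Omega} \\
\boldsymbol{\mathcal{T}}_k &\leftarrow \boldsymbol{\mathcal{T}}_k + \rho\,\left(\boldsymbol{\mathcal{X}}_k - \boldsymbol{\mathcal{Z}}\right), \quad k = 1, \dots, K \\
\hat{\boldsymbol{\mathcal{Y}}} &= \sum_{k=1}^{K} \alpha_k\,\boldsymbol{\mathcal{X}}_k
\end{aligned}
```

Observed entries of $\boldsymbol{\mathcal{Z}}$ are never overwritten, so they stay equal to the input data throughout. The loop stops once the relative change in $\hat{\boldsymbol{\mathcal{Y}}}$ falls below `epsilon` or after `maxiter` iterations.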

Let us try it on the Guangzhou urban traffic speed data set (Gdata):

import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.2

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

Run the imputation experiment:

import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
theta = 0.30
epsilon = 1e-4
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

This example is from ../experiments/Imputation-LRTC-TNN.ipynb; check out this Jupyter Notebook for advanced usage.

Toy Examples

Time series forecasting
- Structured low-rank matrix completion
Time series imputation

References

Spatiotemporal forecasting
- Yuyang Wang, Alex Smola, Danielle C. Maddix, Jan Gasthaus, Dean Foster, Tim Januschowski, 2019. Deep Factors for Forecasting. ICML 2019. (★★★★★)
- Danielle C. Maddix, Yuyang Wang, Alex Smola, 2018. Deep Factors with Gaussian Processes for Forecasting. arXiv.
- Syama Sundar Rangapuram, Matthias Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, Tim Januschowski, 2018. Deep State Space Models for Time Series Forecasting. NeurIPS 2018.
- Zheyi Pan, Yuxuan Liang, Junbo Zhang, Xiuwen Yi, Yong Yu, Yu Zheng, 2018. HyperST-Net: hypernetworks for spatio-temporal forecasting. arXiv.
- Truc Viet Le, Richard Oentaryo, Siyuan Liu, Hoong Chuin Lau, 2017. Local Gaussian processes for efficient fine-grained traffic speed prediction. arXiv.
- Yaguang Li, Cyrus Shahabi, 2018. A brief overview of machine learning methods for short-term traffic forecasting and future directions. ACM SIGSPATIAL, 10(1): 3-9.
- Bing Yu, Haoteng Yin, Zhanxing Zhu, 2017. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. arXiv. (appear in IJCAI 2018)
- Feras A. Saad, Vikash K. Mansinghka, 2018. Temporally-reweighted Chinese Restaurant Process mixtures for clustering, imputing, and forecasting multivariate time series. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), Lanzarote, Spain. PMLR: Volume 84.
- Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, Yan Liu, 2018. Recurrent neural networks for multivariate time series with missing values. Scientific Reports, 8(6085).
- Zhengping Che, Sanjay Purushotham, Guangyu Li, Bo Jiang, Yan Liu, 2018. Hierarchical deep generative models for multi-rate multivariate time series. Proceedings of the 35th International Conference on Machine Learning (ICML 2018), PMLR 80:784-793, 2018.
- Chuxu Zhang, Dongjin Song, Yuncong Chen, Xinyang Feng, Cristian Lumezanu, Wei Cheng, Jingchao Ni, Bo Zong, Haifeng Chen, Nitesh V. Chawla, 2018. A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data. arXiv.
- Wang, X., Chen, C., Min, Y., He, J., Yang, B., Zhang, Y., 2018. Efficient metropolitan traffic prediction based on graph recurrent neural network. arXiv.
- Peiguang Jing, Yuting Su, Xiao Jin, Chengqian Zhang, 2018. High-order temporal correlation model learning for time-series prediction. IEEE Transactions on Cybernetics, early access.
- Oren Anava, Elad Hazan, Assaf Zeevi, 2015. Online time series prediction with missing data. Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), 37: 2191-2199.
- Shanshan Feng, Gao Cong, Bo An, Yeow Meng Chee, 2017. POI2Vec: Geographical latent representation for predicting future visitors. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017).
- Yasuko Matsubara, Yasushi Sakurai, Christos Faloutsos, Tomoharu Iwata, Masatoshi Yoshikawa, 2012. Fast mining and forecasting of complex time-stamped events. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD 2012).
- Yasuko Matsubara, Yasushi Sakurai, Willem G. van Panhuis, Christos Faloutsos, 2014. FUNNEL: automatic mining of spatially coevolving epidemics. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD 2014).
- Koh Takeuchi, Hisashi Kashima, Naonori Ueda, 2017. Autoregressive tensor factorization for spatio-temporal predictions. 2017 IEEE International Conference on Data Mining (ICDM 2017).
- Shun-Yao Shih, Fan-Keng Sun, Hung-yi Lee, 2018. Temporal pattern attention for multivariate time series forecasting. arXiv.
- Dingxiong Deng, Cyrus Shahabi, Ugur Demiryurek, Linhong Zhu, Rose Yu, Yan Liu, 2016. Latent space model for road networks to predict time-varying traffic. Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining (KDD 2016).
Principal component analysis
- Shigeyuki Oba, Masa-aki Sato, Ichiro Takemasa, Morito Monden, Ken-ichi Matsubara, Shin Ishii, 2003. A Bayesian missing value estimation method for gene expression profile data. Bioinformatics, 19: 2088-2096. [Matlab code]
- Li Qu, Li Li, Yi Zhang, Jianming Hu, 2009. PPCA-based missing data imputation for traffic flow volume: a systematical approach. IEEE Transactions on Intelligent Transportation Systems, 10(3): 512-522.
- Li Li, Yuebiao Li, Zhiheng Li, 2013. Efficient missing data imputing for traffic flow by considering temporal and spatial dependence. Transportation Research Part C: Emerging Technologies, 34: 108-120.
Gaussian process
- Michalis K. Titsias, Magnus Rattray, Neil D. Lawrence, 2009. Markov chain Monte Carlo algorithms for Gaussian processes, Chapter.
- Filipe Rodrigues, Kristian Henrickson, Francisco C. Pereira, 2018. Multi-output Gaussian processes for crowdsourced traffic data imputation. IEEE Transactions on Intelligent Transportation Systems, early access. [Matlab code]
- Nicolo Fusi, Rishit Sheth, Huseyn Melih Elibol, 2017. Probabilistic matrix factorization for automated machine learning. arXiv. [Python code]
- Tinghui Zhou, Hanhuai Shan, Arindam Banerjee, Guillermo Sapiro, 2012. Kernelized probabilistic matrix factorization: exploiting graphs and side information. [slide]
- John Bradshaw, Alexander G. de G. Matthews, Zoubin Ghahramani, 2017. Adversarial examples, uncertainty, and transfer testing robustness in Gaussian process hybrid deep networks. arXiv.
- David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico, Jan Gasthaus, 2019. High-Dimensional Multivariate Forecasting with Low-Rank Gaussian Copula Processes. arXiv. (★★★★)
Matrix factorization
- Nikhil Rao, Hsiangfu Yu, Pradeep Ravikumar, Inderjit S Dhillon, 2015. Collaborative filtering with graph information: Consistency and scalable methods. Neural Information Processing Systems (NIPS 2015). [Matlab code]
- Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon, 2016. Temporal regularized matrix factorization for high-dimensional time series prediction. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain. [Matlab code]
- Yongshun Gong, Zhibin Li, Jian Zhang, Wei Liu, Yu Zheng, Christina Kirsch, 2018. Network-wide crowd flow prediction of Sydney trains via customized online non-negative matrix factorization. In The 27th ACM International Conference on Information and Knowledge Management (CIKM 2018), Torino, Italy.
- Hanbaek Lyu, Georg Menz, Deanna Needell, Christopher Strohmeier, 2020. Applications of Online Nonnegative Matrix Factorization to Image and Time-Series Data.
- San Gultekin, John Paisley, 2019. Online Forecasting Matrix Factorization. IEEE Transactions on Signal Processing, 67(5): 1223-1236. [Python code]
Bayesian matrix and tensor factorization
- Ruslan Salakhutdinov, Andriy Mnih, 2008. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. Proceedings of the 25th International Conference on Machine Learning (ICML 2008), Helsinki, Finland. [Matlab code (official)] [Python code] [Julia and C++ code] [Julia code]
- Neil D. Lawrence, Raquel Urtasun, 2009. Non-linear Matrix Factorization with Gaussian Processes. ICML 2009. (★★★★★)
- Ilya Sutskever, Ruslan Salakhutdinov, Joshua B. Tenenbaum, 2009. Modelling relational data using Bayesian clustered tensor factorization. NIPS 2009.
- Ankan Saha, Vikas Sindhwani, 2012. Learning evolving and emerging topics in social media: A dynamic NMF approach with temporal regularization. WSDM 2012. (★★★★)
- Nicolo Fusi, Rishit Sheth, Melih Huseyn Elibol, 2017. Probabilistic matrix factorization for automated machine learning. arXiv.
- Liang Xiong, Xi Chen, Tzu-Kuo Huang, Jeff Schneider, Jaime G. Carbonell, 2010. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. Proceedings of the 2010 SIAM International Conference on Data Mining. SIAM, pp. 211-222.
- Qibin Zhao, Liqing Zhang, Andrzej Cichocki, 2015. Bayesian CP factorization of incomplete tensors with automatic rank determination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1751-1763.
- Qibin Zhao, Liqing Zhang, Andrzej Cichocki, 2015. Bayesian sparse Tucker models for dimension reduction and tensor completion. arXiv.
- Piyush Rai, Yingjian Wang, Shengbo Guo, Gary Chen, David B. Dunson, Lawrence Carin, 2014. Scalable Bayesian low-rank decomposition of incomplete multiway tensors. Proceedings of the 31st International Conference on Machine Learning (ICML 2014), Beijing, China.
- Ömer Deniz Akyildiz, Theodoros Damoulas, Mark F. J. Steel, 2019. Probabilistic sequential matrix factorization. arXiv. (★★★★★)
Matrix completion on graphs
- Vassilis Kalofolias, Xavier Bresson, Michael Bronstein, Pierre Vandergheynst, 2014. Matrix completion on graphs. arXiv. (appear in NIPS 2014)
- Rianne van den Berg, Thomas N. Kipf, Max Welling, 2018. Graph convolutional matrix completion. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2018), London, UK.
- Federico Monti, Michael M. Bronstein, Xavier Bresson, 2017. Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks. NIPS 2017.
- Tianyang Han, Kentaro Wada and Takashi Oguchi, 2019. Large-scale traffic data imputation using matrix completion on graphs. IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 2019, pp. 2252-2258.
Low-rank tensor completion
- Ji Liu, Przemyslaw Musialski, Peter Wonka, Jieping Ye, 2013. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1): 208-220.
- Bin Ran, Huachun Tan, Yuankai Wu, Peter J. Jin, 2016. Tensor based missing traffic data completion with spatial–temporal correlation. Physica A: Statistical Mechanics and its Applications, 446: 54-63.
Generative Adversarial Nets
- Brandon Amos, 2016. Image completion with deep learning in TensorFlow. blog post. [github]
- Jinsung Yoon, James Jordon, Mihaela van der Schaar, 2018. GAIN: missing data imputation using Generative Adversarial Nets. Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden. [supplementary materials] [Python code]
- Ian Goodfellow, 2016. NIPS 2016 tutorial: Generative Adversarial Networks.
- Thomas Schlegl, Philipp Seeböck, Sebastian M. Waldstein, Ursula Schmidt-Erfurth, Georg Langs, 2017. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. arXiv.
- Yonghong Luo, Xiangrui Cai, Ying Zhang, Jun Xu, Xiaojie Yuan, 2018. Multivariate time series imputation with generative adversarial networks. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. [Python code]
- Yonghong Luo, Ying Zhang, Xiangrui Cai, Xiaojie Yuan, 2019. E²GAN: end-to-end generative adversarial network for multivariate time series imputation. IJCAI 2019.
- Yukai Liu, Rose Yu, Stephan Zheng, Eric Zhan, Yisong Yue, 2019. NAOMI: Non-Autoregressive Multiresolution Sequence Imputation. NeurIPS 2019.
Variational Autoencoder
- Vincent Fortuin, Gunnar Rätsch, Stephan Mandt, 2019. GP-VAE: Deep Probabilistic Time Series Imputation. AISTATS 2020.
- Oleg Ivanov, Michael Figurnov, Dmitry Vetrov, 2019. Variational autoencoder with arbitrary conditioning. ICLR 2019.
- Guillem Boquet, Antoni Morell, Javier Serrano, Jose Lopez Vicario, 2020. A variational autoencoder solution for road traffic forecasting systems: Missing data imputation, dimension reduction, model selection and anomaly detection. Transportation Research Part C: Emerging Technologies, 115: 102622.
- Karol Gregor, George Papamakarios, Frederic Besse, Lars Buesing, Theophane Weber. Temporal difference variational auto-encoder. ICLR 2019.
- Zhiwei Deng, Rajitha Navarathna, Peter Carr, Stephan Mandt, Yisong Yue, 2017. Factorized variational autoencoders for modeling audience reactions to movies. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA.
- Graph autoencoder - GitHub.
- Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, Dan Pei, Yang Feng, Jie Chen, Zhaogang Wang, Honglin Qiao, 2018. Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications. WWW 2018.
- John T. McCoy, Steve Kroon, Lidia Auret, 2018. Variational Autoencoders for missing data imputation with application to a simulated milling circuit. IFAC-PapersOnLine, 51(21): 141-146. [Python code] [VAE demo]
- Pierre-Alexandre Mattei, Jes Frellsen, 2018. missingIWAE: Deep generative modelling and imputation of incomplete data. Third workshop on Bayesian Deep Learning (NeurIPS 2018), Montréal, Canada. [related slide]
Tensor regression
- Guillaume Rabusseau, Hachem Kadri, 2016. Low-rank regression with tensor responses. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
- Rose Yu, Yan Liu, 2016. Learning from multiway data: simple and efficient tensor regression. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA.
- Masaaki Imaizumi, Kohei Hayashi, 2016. Doubly decomposing nonparametric tensor regression. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA.
- Rose Yu, Guangyu Li, Yan Liu, 2018. Tensor regression meets Gaussian processes. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), Lanzarote, Spain. [Matlab code]
- Lifang He, Kun Chen, Wanwan Xu, Jiayu Zhou, Fei Wang, 2018. Boosted sparse and low-rank tensor regression. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.
Poisson matrix factorization
- Liangjie Hong, 2015. Poisson matrix factorization. blog post.
- Ali Taylan Cemgil, 2009. Bayesian inference for nonnegative matrix factorisation models. Computational intelligence and neuroscience.
- Prem Gopalan, Jake M. Hofman, David M. Blei, 2015. Scalable recommendation with hierarchical poisson factorization. In UAI, 326-335. [C++ code]
- Laurent Charlin, Rajesh Ranganath, James McInerney, 2015. Dynamic Poisson factorization. Proceedings of the 9th ACM Conference on Recommender Systems (RecSys 2015), Vienna, Austria. [C++ code]
- Seyed Abbas Hosseini, Keivan Alizadeh, Ali Khodadadi, Ali Arabzadeh, Mehrdad Farajtabar, Hongyuan Zha, Hamid R. Rabiee, 2017. Recurrent Poisson factorization for temporal recommendation. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2017), Halifax, Nova Scotia, Canada. [Matlab code]
- Aaron Schein, Scott W. Linderman, Mingyuan Zhou, David M. Blei, Hanna Wallach, 2019. Poisson-Randomized Gamma Dynamical Systems. arXiv. (★★★★★)
Graph signal processing
- Arman Hasanzadeh, Xi Liu, Nick Duffield, Krishna R. Narayanan, Byron Chigoy, 2017. A graph signal processing approach for real-time traffic prediction in transportation networks. arXiv.
- Antonio Ortega, Pascal Frossard, Jelena Kovačević, José M. F. Moura, Pierre Vandergheynst, 2018. Graph signal processing: overview, challenges, and applications. Proceedings of the IEEE, 106(5): 808-828. [slide]
Graph neural network
- How to do Deep Learning on Graphs with Graph Convolutional Networks (Part 1: A High-Level Introduction to Graph Convolutional Networks). blog post.
- Structured deep models: Deep learning on graphs and beyond. slide.
- gcn: Implementation of Graph Convolutional Networks in TensorFlow. GitHub project.
- gated-graph-neural-network-samples: Sample Code for Gated Graph Neural Networks. GitHub project.
- Xu Geng, Yaguang Li, Leye Wang, Lingyu Zhang, Qiang Yang, Jieping Ye, Yan Liu, 2019. Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. AAAI 2019.
- Menglin Wang, Baisheng Lai, Zhongming Jin, Yufeng Lin, Xiaojia Gong, Jiangqiang Huang, Xiansheng Hua, 2018. Dynamic spatio-temporal graph-based CNNs for traffic prediction. arXiv.
Missing data imputation
- Daniel J. Stekhoven, Peter Bühlmann, 2012. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1): 112–118. [missingpy - PyPI] or [missingpy - GitHub]
- fancyimpute: A variety of matrix completion and imputation algorithms implemented in Python. [homepage]
- Dimitris Bertsimas, Colin Pawlowski, Ying Daisy Zhuo, 2018. From predictive methods to missing data imputation: An optimization approach. Journal of Machine Learning Research, 18(196): 1-39.
- Wei Cao, Dong Wang, Jian Li, Hao Zhou, Yitan Li, Lei Li, 2018. BRITS: Bidirectional Recurrent Imputation for Time Series. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. [Python code]

Our Publications

Xinyu Chen, Lijun Sun (2020). Low-rank autoregressive tensor completion for multivariate time series forecasting. arXiv: 2006.10436. [preprint] [data & Python code]
Xinyu Chen, Jinming Yang, Lijun Sun (2020). A nonconvex low-rank tensor completion model for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies, 117: 102673. [preprint] [doi] [data & Python code]
Xinyu Chen, Lijun Sun (2019). Bayesian temporal factorization for multidimensional time series prediction. arXiv: 1910.06366. [preprint] [slide] [data & Python code]
Xinyu Chen, Zhaocheng He, Yixian Chen, Yuhuan Lu, Jiawei Wang (2019). Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model. Transportation Research Part C: Emerging Technologies, 104: 66-77. [preprint] [doi] [slide] [data] [Matlab code]
Xinyu Chen, Zhaocheng He, Lijun Sun (2019). A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies, 98: 73-84. [preprint] [doi] [data] [Matlab code] [Python code]
Xinyu Chen, Zhaocheng He, Jiawei Wang (2018). Spatial-temporal traffic speed patterns discovery and incomplete data recovery via SVD-combined tensor decomposition. Transportation Research Part C: Emerging Technologies, 86: 59-77. [doi] [data]

This project is based on the above papers. Please cite them if they help your research.

Collaborators

Xinyu Chen 💻
Jinming Yang 💻
Yixian Chen 💻
Lijun Sun 💻
Tianyang Han 💻

Principal Investigator (PI)

Lijun Sun 💻

See the list of contributors who participated in this project.

transdim is still under development. More machine learning models and technical features will be added, and we always welcome contributions that help make transdim better. If you have any suggestions about this project or want to collaborate with us, please feel free to contact Xinyu Chen (email: chenxy346@gmail.com) with your suggestion or collaboration statement. We would like to thank everyone who has helped this project in any way.

Recommended email subjects:

Suggestion on transdim from [+ your name]

Collaboration statement on transdim from [+ your name]

Acknowledgements

This research is supported by the Institute for Data Valorization (IVADO).

License

This work is released under the MIT license.
