transdim

Transportation data imputation (transdim).

Strategic Aim

Creating accurate and efficient solutions for the spatio-temporal traffic data imputation and prediction tasks.

Tasks and Challenges

Missing data imputation
- Random missing: each sensor loses observations completely at random. (simple task)
- Fiber missing: each sensor loses all of its observations on several days. (difficult task)
Rolling traffic prediction (short-term/long-term)
- Forecasting without missing values. (simple task)
- Forecasting with incomplete observations. (difficult task)
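
The two missing data scenarios above can be illustrated with binary masks. The following is a minimal sketch, not the repository's code: it assumes the data are arranged as a third-order tensor (road segment × day × time interval), and the function names, tensor shape, and 30% rate are hypothetical choices for illustration.

```python
import numpy as np

def random_missing_mask(shape, rate, rng):
    """Binary mask (1 = observed); each entry is dropped independently."""
    return (rng.random(shape) >= rate).astype(int)

def fiber_missing_mask(shape, rate, rng):
    """Binary mask (1 = observed); whole (segment, day) fibers are dropped."""
    n_segments, n_days, n_intervals = shape
    day_observed = rng.random((n_segments, n_days)) >= rate
    # Copy each (segment, day) decision to every time interval of that day.
    return np.repeat(day_observed[:, :, None], n_intervals, axis=2).astype(int)

rng = np.random.default_rng(0)
shape = (4, 14, 144)  # hypothetical: 4 segments, 14 days, 144 intervals per day
rm_mask = random_missing_mask(shape, 0.3, rng)
fm_mask = fiber_missing_mask(shape, 0.3, rng)

# Under fiber missing, each day of each segment is fully observed or fully lost.
per_day_observed = fm_mask.sum(axis=2)
print(np.all((per_day_observed == 0) | (per_day_observed == shape[2])))
```

Multiplying a speed tensor element-wise by either mask gives a partially observed tensor; the fiber mask removes whole days at once, which is why that scenario leaves no nearby observations to lean on.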

What to do now

add a framework indicating overall studies;
define the problems clearly;
- Example: Traffic forecasting using matrix factorization models.

Real experiment setting: observations from the first 56 days, with 0%, 20%, and 40% fiber missing rates, are treated as stationary inputs. Traffic speed during the last 5 days (Monday to Friday) is then forecast in a rolling manner, using additional rolling inputs.

describe the core challenges intuitively;
list main contributions of these studies.
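
The experiment setting described above (56 days of stationary inputs, then 5 days of rolling forecasts) can be sketched as follows. This is only an illustration under stated assumptions: the 144 intervals per day, the one-day forecast horizon, the synthetic speed matrix, and the `rolling_windows` helper are all hypothetical, and the "repeat the last value" forecaster is a naive placeholder rather than any model from this project.

```python
import numpy as np

def rolling_windows(n_train_steps, n_total_steps, horizon):
    """Yield (start, end) index pairs of consecutive forecast windows."""
    start = n_train_steps
    while start < n_total_steps:
        end = min(start + horizon, n_total_steps)
        yield start, end
        start = end

intervals_per_day = 144            # assumption: 10-minute resolution
n_days_train, n_days_test = 56, 5  # from the experiment setting
n_segments = 3                     # hypothetical
n_train = n_days_train * intervals_per_day
n_total = (n_days_train + n_days_test) * intervals_per_day

rng = np.random.default_rng(0)
speed = rng.uniform(20, 60, size=(n_segments, n_total))  # synthetic data

stationary_input = speed[:, :n_train]
forecasts = np.zeros((n_segments, n_total - n_train))
for start, end in rolling_windows(n_train, n_total, horizon=intervals_per_day):
    history = speed[:, :start]  # stationary input plus rolling inputs so far
    # Placeholder forecaster: repeat the last observed value of each segment.
    forecasts[:, start - n_train:end - n_train] = history[:, -1:]
```

After each window, the history grows by the newly observed steps, which is what makes the procedure "rolling". A real forecaster would replace the placeholder line.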

What we care about!

Best algebraic structure for data imputation.
The context of urban transportation (e.g., biases).
Data noise avoidance.
Competitive imputation and prediction performance.
Capability to handle various missing data scenarios.

Overview

With the development and application of intelligent transportation systems, large quantities of urban traffic data are collected continuously from various sources, such as loop detectors, cameras, and floating vehicles. These data sets capture the underlying states and dynamics of transportation networks and benefit many traffic operation and management applications, including routing, signal control, and travel time prediction. However, missing data are inevitable when collecting traffic data from intelligent transportation systems.

Urban traffic speed data set of Guangzhou, China

Publicly available at our Zenodo repository!

(a) Time series of actual and estimated speed within two weeks from August 1 to 14.

(b) Time series of actual and estimated speed within two weeks from September 12 to 25.

Figure 1: The imputation performance of BGCP (CP rank r=15 and missing rate α=30%) under the fiber missing scenario with third-order tensor representation, where the estimated result of road segment #1 is selected as an example. In both panels, red rectangles represent fiber missing (i.e., speed observations are lost for a whole day).
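
To make the "third-order tensor representation" and "CP rank" in the caption concrete, here is a minimal sketch of how a rank-r CP model rebuilds a tensor from three factor matrices. It shows only the CP form itself, not BGCP's Bayesian inference; the tensor sizes and the names `U`, `V`, `W`, and `cp_reconstruct` are assumptions made for illustration.

```python
import numpy as np

def cp_reconstruct(U, V, W):
    """Rebuild a third-order tensor from CP factor matrices of shared rank r.

    Entry (i, j, k) equals sum_r U[i, r] * V[j, r] * W[k, r].
    """
    return np.einsum('ir,jr,kr->ijk', U, V, W)

rng = np.random.default_rng(0)
n_segments, n_days, n_intervals, rank = 5, 14, 144, 15  # hypothetical sizes
U = rng.normal(size=(n_segments, rank))   # road segment factors
V = rng.normal(size=(n_days, rank))       # day factors
W = rng.normal(size=(n_intervals, rank))  # time-of-day factors
tensor_hat = cp_reconstruct(U, V, W)

# Unfold to a (segment x time) matrix: each row is one segment's time series.
matrix_hat = tensor_hat.reshape(n_segments, n_days * n_intervals)
```

In an imputation setting, the factors would be estimated from observed entries only, and the reconstructed entries at missing positions serve as the estimates.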

Machine learning models

LocInt: local interpolation.
- This model considers local information from observations at the neighboring time slots of the missing values.
TRMF: Temporal regularized matrix factorization. [Matlab code is also available!]
- Reducing the burden of hyperparameter tuning is a worthwhile direction.
BGCP: Bayesian Gaussian CP decomposition. [Imputation example - Notebook] [Matlab code is also available!]
BPMF: Bayesian probabilistic matrix factorization.
HaLRTC: High accuracy low rank tensor completion.
GAIN: Generative Adversarial Imputation Nets. [Python code is also available!]
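
As a rough illustration of the local interpolation idea listed above (using observations at neighboring time slots of a missing value), the sketch below fills each missing entry with the mean of observed values within a small window. This is an assumption about one simple way to do it, not the repository's LocInt implementation; the function name, window size, and toy series are hypothetical.

```python
import numpy as np

def local_interpolation(series, mask, window=2):
    """Fill missing entries with the mean of observed neighbors within +/- window slots."""
    filled = series.astype(float).copy()
    n = len(series)
    for t in np.flatnonzero(mask == 0):
        lo, hi = max(0, t - window), min(n, t + window + 1)
        neighbors = series[lo:hi][mask[lo:hi] == 1]
        # Leave NaN when no neighbor is observed (e.g., inside a long gap).
        filled[t] = neighbors.mean() if neighbors.size else np.nan
    return filled

# Toy series: mask value 0 marks a missing observation.
series = np.array([30.0, 32.0, 0.0, 36.0, 38.0, 0.0, 0.0, 0.0, 0.0, 0.0, 40.0])
mask = np.array([1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1])
filled = local_interpolation(series, mask, window=2)
print(filled)
```

The isolated gap is filled from both sides, while the middle of the long gap stays unfilled because no observed neighbor falls inside the window. This hints at why fiber missing is the harder scenario for purely local methods.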

Selected references

Spatio-temporal forecasting
- Zheyi Pan, Yuxuan Liang, Junbo Zhang, Xiuwen Yi, Yong Yu, Yu Zheng, 2018. HyperST-Net: hypernetworks for spatio-temporal forecasting. arXiv.
- Truc Viet Le, Richard Oentaryo, Siyuan Liu, Hoong Chuin Lau, 2017. Local Gaussian processes for efficient fine-grained traffic speed prediction. arXiv.
- Yaguang Li, Cyrus Shahabi, 2018. A brief overview of machine learning methods for short-term traffic forecasting and future directions. ACM SIGSPATIAL, 10(1): 3-9.
- Bing Yu, Haoteng Yin, Zhanxing Zhu, 2017. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. arXiv. (appeared in IJCAI 2018)
- Feras A. Saad, Vikash K. Mansinghka, 2018. Temporally-reweighted Chinese Restaurant Process mixtures for clustering, imputing, and forecasting multivariate time series. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), Lanzarote, Spain. PMLR: Volume 84.
- Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, Yan Liu, 2018. Recurrent neural networks for multivariate time series with missing values. Scientific Reports, 8(6085).
- Chuxu Zhang, Dongjin Song, Yuncong Chen, Xinyang Feng, Cristian Lumezanu, Wei Cheng, Jingchao Ni, Bo Zong, Haifeng Chen, Nitesh V. Chawla, 2018. A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data. arXiv.
- X. Wang, C. Chen, Y. Min, J. He, B. Yang, Y. Zhang, 2018. Efficient metropolitan traffic prediction based on graph recurrent neural network. arXiv preprint arXiv:1811.00740.
Principal component analysis (PCA)
- Li Qu, Li Li, Yi Zhang, Jianming Hu, 2009. PPCA-based missing data imputation for traffic flow volume: a systematical approach. IEEE Transactions on Intelligent Transportation Systems, 10(3): 512-522.
- Li Li, Yuebiao Li, Zhiheng Li, 2013. Efficient missing data imputing for traffic flow by considering temporal and spatial dependence. Transportation Research Part C: Emerging Technologies, 34: 108-120.
Gaussian process (GP)
- Michalis K. Titsias, Magnus Rattray, Neil D. Lawrence, 2009. Markov chain Monte Carlo algorithms for Gaussian processes, Chapter.
- Filipe Rodrigues, Kristian Henrickson, Francisco C. Pereira, 2018. Multi-output Gaussian processes for crowdsourced traffic data imputation. IEEE Transactions on Intelligent Transportation Systems, early access. [Matlab code]
- Nicolo Fusi, Rishit Sheth, Huseyn Melih Elibol, 2017. Probabilistic matrix factorization for automated machine learning. arXiv. [Python code]
- Tinghui Zhou, Hanhuai Shan, Arindam Banerjee, Guillermo Sapiro, 2012. Kernelized probabilistic matrix factorization: exploiting graphs and side information. [slide]
Matrix factorization
- Nikhil Rao, Hsiangfu Yu, Pradeep Ravikumar, Inderjit S Dhillon, 2015. Collaborative filtering with graph information: Consistency and scalable methods. Neural Information Processing Systems (NIPS 2015). [Matlab code]
- Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon, 2016. Temporal regularized matrix factorization for high-dimensional time series prediction. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain. [Matlab code]
- Yongshun Gong, Zhibin Li, Jian Zhang, Wei Liu, Yu Zheng, Christina Kirsch, 2018. Network-wide crowd flow prediction of Sydney trains via customized online non-negative matrix factorization. In The 27th ACM International Conference on Information and Knowledge Management (CIKM 2018), Torino, Italy.
Bayesian matrix/tensor factorization
- Ruslan Salakhutdinov, Andriy Mnih, 2008. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. Proceedings of the 25th International Conference on Machine Learning (ICML 2008), Helsinki, Finland.
- Ilya Sutskever, Ruslan Salakhutdinov, Joshua B. Tenenbaum, 2009. Modelling relational data using Bayesian clustered tensor factorization. NIPS 2009.
- Nicolo Fusi, Rishit Sheth, Melih Huseyn Elibol, 2017. Probabilistic matrix factorization for automated machine learning. arXiv.
- Liang Xiong, Xi Chen, Tzu-Kuo Huang, Jeff Schneider, Jaime G. Carbonell, 2010. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. Proceedings of the 2010 SIAM International Conference on Data Mining. SIAM, pp. 211-222.
- Qibin Zhao, Liqing Zhang, Andrzej Cichocki, 2015. Bayesian CP factorization of incomplete tensors with automatic rank determination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1751-1763.
- Qibin Zhao, Liqing Zhang, Andrzej Cichocki, 2015. Bayesian sparse Tucker models for dimension reduction and tensor completion. arXiv.
- Piyush Rai, Yingjian Wang, Shengbo Guo, Gary Chen, David B. Dunson, Lawrence Carin, 2014. Scalable Bayesian low-rank decomposition of incomplete multiway tensors. Proceedings of the 31st International Conference on Machine Learning (ICML 2014), Beijing, China.
Low-rank tensor completion
- Ji Liu, Przemyslaw Musialski, Peter Wonka, Jieping Ye, 2013. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1): 208-220.
- Bin Ran, Huachun Tan, Yuankai Wu, Peter J. Jin, 2016. Tensor based missing traffic data completion with spatial–temporal correlation. Physica A: Statistical Mechanics and its Applications, 446: 54-63.
Generative Adversarial Nets (GAN)
- Brandon Amos, 2016. Image completion with deep learning in TensorFlow. blog post. [github]
- Jinsung Yoon, James Jordon, Mihaela van der Schaar, 2018. GAIN: missing data imputation using Generative Adversarial Nets. Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden. [supplementary materials] [Python code]
- Ian Goodfellow, 2016. NIPS 2016 tutorial: Generative Adversarial Networks.
- Thomas Schlegl, Philipp Seeböck, Sebastian M. Waldstein, Ursula Schmidt-Erfurth, Georg Langs, 2017. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. arXiv.
Variational Autoencoder (VAE)
- Zhiwei Deng, Rajitha Navarathna, Peter Carr, Stephan Mandt, Yisong Yue, 2017. Factorized variational autoencoders for modeling audience reactions to movies. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA.
- Vassilis Kalofolias, Xavier Bresson, Michael Bronstein, Pierre Vandergheynst, 2014. Matrix completion on graphs. arXiv. (appeared in NIPS 2014)
- Rianne van den Berg, Thomas N. Kipf, Max Welling, 2018. Graph convolutional matrix completion. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2018), London, UK.
- Graph autoencoder - GitHub.
- Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, Dan Pei, Yang Feng, Jie Chen, Zhaogang Wang, Honglin Qiao, 2018. Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications. WWW 2018.
Tensor regression
- Guillaume Rabusseau, Hachem Kadri, 2016. Low-rank regression with tensor responses. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
- Rose Yu, Yan Liu, 2016. Learning from multiway data: simple and efficient tensor regression. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA.
- Masaaki Imaizumi, Kohei Hayashi, 2016. Doubly decomposing nonparametric tensor regression. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA.
- Rose Yu, Guangyu Li, Yan Liu, 2018. Tensor regression meets Gaussian processes. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), Lanzarote, Spain.
Poisson matrix factorization
- Liangjie Hong, 2015. Poisson matrix factorization. blog post.
- Ali Taylan Cemgil, 2009. Bayesian inference for nonnegative matrix factorisation models. Computational Intelligence and Neuroscience.
- Prem Gopalan, Jake M. Hofman, David M. Blei, 2015. Scalable recommendation with hierarchical Poisson factorization. In UAI, 326-335. [C++ code]
- Laurent Charlin, Rajesh Ranganath, James McInerney, 2015. Dynamic Poisson factorization. Proceedings of the 9th ACM Conference on Recommender Systems (RecSys 2015), Vienna, Austria. [C++ code]
- Seyed Abbas Hosseini, Keivan Alizadeh, Ali Khodadadi, Ali Arabzadeh, Mehrdad Farajtabar, Hongyuan Zha, Hamid R. Rabiee, 2017. Recurrent Poisson factorization for temporal recommendation. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2017), Halifax, Nova Scotia, Canada. [Matlab code]
Graph signal processing (GSP)
- Arman Hasanzadeh, Xi Liu, Nick Duffield, Krishna R. Narayanan, Byron Chigoy, 2017. A graph signal processing approach for real-time traffic prediction in transportation networks. arXiv.
- Antonio Ortega, Pascal Frossard, Jelena Kovačević, José M. F. Moura, Pierre Vandergheynst, 2018. Graph signal processing: overview, challenges, and applications. Proceedings of the IEEE, 106(5): 808-828. [slide]

Publications

Xinyu Chen, Zhaocheng He, Jiawei Wang, 2018. Spatial-temporal traffic speed patterns discovery and incomplete data recovery via SVD-combined tensor decomposition. Transportation Research Part C: Emerging Technologies, 86: 59-77.
Xinyu Chen, Zhaocheng He, Lijun Sun, 2018. A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies, 98: 73-84. [Matlab code]

Please consider citing our papers if they help your research.

Our blog posts (in Chinese)

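Notes on variational inference for Bayesian Poisson factorization (贝叶斯泊松分解变分推断笔记), by Yixian Chen (陈一贤).
Notes on variational Bayesian inference (变分贝叶斯推断笔记), by Yixian Chen (陈一贤).
Bayesian Gaussian tensor decomposition (贝叶斯高斯张量分解), by Xinyu Chen (陈新宇).
Bayesian matrix factorization (贝叶斯矩阵分解), by Xinyu Chen (陈新宇).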

License

This work is released under the MIT license.
