Machine learning for transportation data imputation and prediction.

transdim

Machine learning models have made important advances in spatiotemporal data modeling, such as forecasting the near-future traffic states of road networks. But what happens when these models are built on the incomplete data commonly collected in real-world systems?

About the Project

In the transdim (transportation data imputation) project, we build machine learning models to help address some of the toughest challenges of spatiotemporal data modeling, from missing data imputation to time series prediction. The strategic aim of this project is to create accurate and efficient solutions for spatiotemporal traffic data imputation and prediction tasks.

In a hurry? Please check out the contents below.

Tasks and Challenges

Missing data are there, whether we like them or not. The really interesting question is how to deal with incomplete data.

  • Missing data imputation 🔥

    • Random missing (RM): Each sensor loses its observations completely at random. (★★★)
    • Non-random missing (NM): Each sensor loses its observations over several days. (★★★★)

Example: Tensor completion framework for multi-dimensional missing traffic data imputation.

  • Spatiotemporal prediction 🔥
    • Forecasting without missing values. (★★★)
    • Forecasting with incomplete observations. (★★★★★)

Example: An illustration of single-step rolling prediction task under a matrix factorization framework.
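To make the rolling protocol concrete, here is a minimal NumPy sketch built on simplifying assumptions of our own. At each step the history is factorized by truncated SVD (Y ≈ W Xᵀ), a separate AR model is fitted to each column of the temporal factor matrix X, and the next data column is predicted as W times the forecast factor vector. This is not BTMF or TRMF, which learn the factors and the temporal model jointly. The function names, rank, AR order and the synthetic sinusoidal data are all hypothetical.

```python
import numpy as np


def low_rank_factors(Y, rank):
    """Truncated SVD: Y (N x T) is approximated by W @ X.T."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    W = U[:, :rank] * s[:rank]   # spatial factors, N x rank
    X = Vt[:rank].T              # temporal factors, T x rank
    return W, X


def fit_ar(X, order):
    """Least-squares AR(order) coefficients for each temporal factor column."""
    T, R = X.shape
    theta = np.zeros((R, order))
    for r in range(R):
        # Column l-1 holds lag l: X[t - l, r] for t = order, ..., T - 1
        A = np.column_stack([X[order - l:T - l, r] for l in range(1, order + 1)])
        theta[r] = np.linalg.lstsq(A, X[order:, r], rcond=None)[0]
    return theta


def predict_next(X, theta):
    """One-step-ahead forecast of the temporal factors (length R)."""
    order = theta.shape[1]
    lags = X[::-1][:order]       # rows X[T-1], X[T-2], ..., X[T-order]
    return np.sum(theta * lags.T, axis=1)


def rolling_single_step(Y, start, rank=2, order=2):
    """Predict Y[:, t] from Y[:, :t] for every t >= start, refitting at each step."""
    N, T = Y.shape
    preds = np.zeros((N, T - start))
    for t in range(start, T):
        W, X = low_rank_factors(Y[:, :t], rank)
        theta = fit_ar(X, order)
        preds[:, t - start] = W @ predict_next(X, theta)
    return preds


# Hypothetical data: 6 sensors, 120 time slots, a daily cycle of 24 slots.
rng = np.random.default_rng(1)
steps = np.arange(120)
omega = 2 * np.pi / 24
a, b = rng.random((2, 6, 1))
Y = a * np.sin(omega * steps) + b * np.cos(omega * steps)

preds = rolling_single_step(Y, start=48)
print("max abs error:", np.abs(preds - Y[:, 48:]).max())
```

Because this synthetic series is exactly rank 2 and every factor is a pure sinusoid, AR(2) reproduces it and the error should be near machine precision. On real traffic data, expect noticeably larger errors.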

Implementation

Open data

In this repository, we have adapted several public data sets for our experiments. For example, to load the Guangzhou data set in your console, you can use the following code:

import scipy.io

# loadmat returns a dict keyed by MATLAB variable name; index it to get the NumPy array.
tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']
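The random arrays above are convenient for building missing-data scenarios. The following is a minimal sketch under several assumptions. `random_tensor` and `random_matrix` are assumed to hold uniform values in [0, 1). The tensor is assumed to be laid out as sensors × days × time slots, with `random_matrix` as sensors × days. Under RM, individual entries are hidden; under NM, a sensor's whole day is hidden. Synthetic stand-ins replace the `.mat` files so the snippet runs on its own, and the helper names and missing rate are hypothetical.

```python
import numpy as np


def random_missing_mask(random_tensor, missing_rate):
    """RM: keep an entry (1) when its uniform draw is >= missing_rate, else hide it (0)."""
    return (random_tensor >= missing_rate).astype(int)


def nonrandom_missing_mask(random_matrix, num_slots, missing_rate):
    """NM: one draw per (sensor, day); a hidden day loses all of its time slots."""
    keep_day = random_matrix >= missing_rate                       # sensors x days
    return np.repeat(keep_day[:, :, np.newaxis], num_slots, axis=2).astype(int)


# Synthetic stand-ins for tensor.mat, random_tensor.mat and random_matrix.mat
rng = np.random.default_rng(0)
tensor = rng.uniform(20, 60, size=(5, 7, 24))       # sensors x days x time slots
random_tensor = rng.random(tensor.shape)
random_matrix = rng.random(tensor.shape[:2])

missing_rate = 0.3
rm_mask = random_missing_mask(random_tensor, missing_rate)
nm_mask = nonrandom_missing_mask(random_matrix, tensor.shape[2], missing_rate)

# Observed (sparse) tensors: hidden entries are set to zero
rm_tensor = tensor * rm_mask
nm_tensor = tensor * nm_mask
print("RM observed fraction:", rm_mask.mean())
print("NM observed fraction:", nm_mask.mean())
```

Zeros mark missing entries here. This only works because the synthetic speeds are strictly positive.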

If you want to view the original data, please check out the following links:

Model implementation

In our experiments, we have implemented the machine learning models mainly in NumPy and written the Python code in Jupyter notebooks. If you want to evaluate these models, you can download and run the notebooks directly (prerequisite: download the data sets before evaluation).
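The following minimal sketch shows what a pure-NumPy implementation can look like. It performs tensor completion with a CP decomposition, fitted by alternating least squares on the observed entries only and with a small ridge penalty. It is our own illustration of the general idea behind ALS-based tensor factorization (such as TF-ALS in the model table), not code from the notebooks. The function names, rank, `lam` and iteration count are assumptions.

```python
import numpy as np


def khatri_rao_rows(B, C):
    """(J*K) x R matrix whose row j*K + k is B[j] * C[k] (matches a C-order reshape)."""
    return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])


def update_factor(Y, M, B, C, lam):
    """Ridge least squares for each row of the mode-1 factor, using observed entries only."""
    I, R = Y.shape[0], B.shape[1]
    KR = khatri_rao_rows(B, C)
    Yf = Y.reshape(I, -1)
    Mf = M.reshape(I, -1).astype(bool)
    A = np.zeros((I, R))
    for i in range(I):
        Z = KR[Mf[i]]                       # design rows for observed (j, k) pairs
        A[i] = np.linalg.solve(Z.T @ Z + lam * np.eye(R), Z.T @ Yf[i, Mf[i]])
    return A


def cp_als_complete(sparse_tensor, mask, rank, lam=1e-3, iters=100, seed=0):
    """Fit a rank-`rank` CP model to observed entries and return the full reconstruction."""
    rng = np.random.default_rng(seed)
    I, J, K = sparse_tensor.shape
    U, V, X = (rng.standard_normal((n, rank)) for n in (I, J, K))
    for _ in range(iters):
        # Each mode is moved to the front so update_factor can treat it as mode 1.
        U = update_factor(sparse_tensor, mask, V, X, lam)
        V = update_factor(sparse_tensor.transpose(1, 0, 2), mask.transpose(1, 0, 2), U, X, lam)
        X = update_factor(sparse_tensor.transpose(2, 0, 1), mask.transpose(2, 0, 1), U, V, lam)
    return np.einsum('ir,jr,kr->ijk', U, V, X)


# Hypothetical test problem: an exact rank-3 tensor with about 30% random missing.
rng = np.random.default_rng(0)
U0, V0, X0 = (rng.standard_normal((n, 3)) for n in (10, 8, 12))
truth = np.einsum('ir,jr,kr->ijk', U0, V0, X0)
mask = (rng.random(truth.shape) >= 0.3).astype(int)
estimate = cp_als_complete(truth * mask, mask, rank=3)

hidden = mask == 0
rel_err = np.linalg.norm((estimate - truth)[hidden]) / np.linalg.norm(truth[hidden])
print(f"relative error on missing entries: {rel_err:.2e}")
```

Solving each factor row only over its observed entries avoids treating the zero placeholders as real measurements.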

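| Task | Jupyter Notebook link | Gdata | Bdata | Hdata | Sdata | Ndata |
|---|---|---|---|---|---|---|
| Missing Data Imputation | BTMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| Missing Data Imputation | BayesTRMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| Missing Data Imputation | TRMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| Missing Data Imputation | BPMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| Missing Data Imputation | BGCP | ✅ | ✅ | ✅ | ✅ | ✅ |
| Missing Data Imputation | TF-ALS | ✅ | ✅ | ✅ | ✅ | ✅ |
| Missing Data Imputation | BTTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
| Missing Data Imputation | BayesTRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
| Missing Data Imputation | BPTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
| Single-Step Prediction | BTMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| Single-Step Prediction | BayesTRMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| Single-Step Prediction | TRMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| Single-Step Prediction | BTTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
| Single-Step Prediction | BayesTRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
| Single-Step Prediction | TRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
| Multi-Step Prediction | BTMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| Multi-Step Prediction | BayesTRMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| Multi-Step Prediction | TRMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| Multi-Step Prediction | BTTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
| Multi-Step Prediction | BayesTRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
| Multi-Step Prediction | TRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |

  • ✅: Covered
  • 🔶: Not covered
  • 🚧: Under development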

If you have any suggestions, please feel free to contact Xinyu Chen (email: chenxy346@mail2.sysu.edu.cn).

Recommended email subject: Suggestions on transdim from [+ your name].

Imputation/Prediction performance

  • Imputation example

(a) Time series of actual and estimated speeds over two weeks, from August 1 to 14.

(b) Time series of actual and estimated speeds over two weeks, from September 12 to 25.

Imputation performance of BGCP (CP rank r = 15, missing rate α = 30%) under the fiber missing scenario with a third-order tensor representation; the estimates for road segment #1 are shown as an example. In both panels, red rectangles mark fiber missing (i.e., speed observations lost for a whole day).
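Imputation quality is typically scored only on entries that were hidden from the model. The following minimal sketch assumes MAPE and RMSE as the metrics; these are common choices for traffic data, but this section does not name them. It also assumes that zeros in the ground truth denote values that were never recorded. The function name and toy numbers are hypothetical.

```python
import numpy as np


def imputation_errors(dense, estimate, mask):
    """MAPE (%) and RMSE over entries hidden from the model (mask == 0) that have ground truth."""
    pos = (mask == 0) & (dense > 0)     # assumption: zeros in `dense` mean "never recorded"
    diff = dense[pos] - estimate[pos]
    mape = np.mean(np.abs(diff) / dense[pos]) * 100
    rmse = np.sqrt(np.mean(diff ** 2))
    return mape, rmse


dense = np.array([[50.0, 40.0], [30.0, 0.0]])      # ground truth (0 = unrecorded)
estimate = np.array([[45.0, 40.0], [33.0, 10.0]])  # model output
mask = np.array([[0, 1], [0, 0]])                  # 1 = shown to the model

mape, rmse = imputation_errors(dense, estimate, mask)
print(f"MAPE = {mape:.1f}%, RMSE = {rmse:.3f}")    # → MAPE = 10.0%, RMSE = 4.123
```

Only the two hidden entries with ground truth (50 → 45 and 30 → 33) are scored. The observed entry and the unrecorded zero are ignored.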

  • Prediction example


References

Our Publications

  • Xinyu Chen, Lijun Sun (2019). Bayesian temporal factorization for multidimensional time series prediction. arXiv:1910.06366. [preprint] [slide] [data & Python code]

  • Xinyu Chen, Zhaocheng He, Yixian Chen, Yuhuan Lu, Jiawei Wang (2019). Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model. Transportation Research Part C: Emerging Technologies, 104: 66-77. [preprint] [doi] [slide] [data] [Matlab code]

  • Xinyu Chen, Zhaocheng He, Lijun Sun (2019). A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies, 98: 73-84. [preprint] [doi] [data] [Matlab code] [Python code]

  • Xinyu Chen, Zhaocheng He, Jiawei Wang (2018). Spatial-temporal traffic speed patterns discovery and incomplete data recovery via SVD-combined tensor decomposition. Transportation Research Part C: Emerging Technologies, 86: 59-77. [doi] [data]

    This project originates from our papers; please consider citing them if they help your research.

Collaborators

  • Xinyu Chen 💻
  • Jinming Yang 💻
  • Yixian Chen 💻
  • Lijun Sun 💻
  • Tianyang Han 💻

See the list of contributors who participated in this project.

License

This work is released under the MIT license.