
Data imputation for urban transportation systems


transdim

Transportation data imputation (transdim).

Strategic Aim

Creating accurate and efficient solutions for the spatio-temporal traffic data imputation and prediction tasks.

Tasks and Challenges

  • Missing data imputation

    • Random missing: each sensor loses observations completely at random. (simple task)

    • Fiber missing: each sensor loses all of its observations for several whole days. (difficult task)

  • Rolling traffic prediction (short-term/long-term)

    • Forecasting without missing values. (simple task)

    • Forecasting with incomplete observations. (difficult task)
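The two missing data scenarios above can be illustrated with observation masks. The following is a minimal sketch, not the repository's own code: it assumes the data are arranged as a (sensor, day, time interval) tensor, and the function names, the tensor shape, and the 30% rate are hypothetical choices made for illustration.

```python
import numpy as np

def random_missing_mask(shape, rate, rng):
    """Boolean mask (True = observed); each entry is dropped independently."""
    return rng.random(shape) >= rate

def fiber_missing_mask(shape, rate, rng):
    """Boolean mask (True = observed) for a (sensor, day, interval) tensor.

    A whole day of one sensor is either fully observed or fully missing.
    """
    n_sensors, n_days, n_intervals = shape
    keep_day = rng.random((n_sensors, n_days)) >= rate
    # Copy each per-day decision across all intervals of that day.
    return np.repeat(keep_day[:, :, None], n_intervals, axis=2)

rng = np.random.default_rng(0)
shape = (4, 10, 144)  # hypothetical: 4 sensors, 10 days, 144 intervals per day
rm_mask = random_missing_mask(shape, 0.3, rng)
fm_mask = fiber_missing_mask(shape, 0.3, rng)
```

Under random missing, neighbouring observations on the same day usually survive, which is one way to see why that task is the simpler one. Under fiber missing, an imputation model has to rely on other days or other sensors.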

What to do right now!

  • add a framework that outlines the overall studies;

  • define the problems clearly;

    • Example: Traffic forecasting using matrix factorization models.

Real experiment setting: observations from the first 56 days, with fiber missing rates of 0%, 20%, and 40%, are treated as stationary inputs. In addition, rolling inputs are used to forecast traffic speed over the last 5 days (Monday to Friday) in a rolling manner.

  • describe the core challenges intuitively;
  • list main contributions of these studies.
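The rolling setting in the forecasting example above (56 days of stationary inputs, then 5 days forecast in a rolling manner) can be sketched as an evaluation loop. This is only an illustration under stated assumptions: the data are synthetic and complete, the 10-minute resolution is assumed, and `last_value` is a naive stand-in forecaster, not a matrix factorization model. The names `rolling_forecast` and `last_value` are hypothetical.

```python
import numpy as np

def rolling_forecast(data, n_train, horizon, predict):
    """Forecast data[:, n_train:] block by block.

    After each block of `horizon` steps is forecast, the true values of
    that block are added to the history (the rolling inputs).
    """
    n_series, n_total = data.shape
    forecasts = np.empty((n_series, n_total - n_train))
    for start in range(n_train, n_total, horizon):
        stop = min(start + horizon, n_total)
        history = data[:, :start]  # everything observed so far
        forecasts[:, start - n_train:stop - n_train] = predict(history, stop - start)
    return forecasts

def last_value(history, steps):
    """Naive stand-in forecaster: repeat the most recent observation."""
    return np.repeat(history[:, -1:], steps, axis=1)

intervals_per_day = 144                        # assumption: 10-minute resolution
n_train = 56 * intervals_per_day               # stationary inputs
n_total = (56 + 5) * intervals_per_day         # plus 5 forecast days
rng = np.random.default_rng(0)
speed = 40 + 5 * rng.standard_normal((3, n_total))  # synthetic speeds, 3 series
pred = rolling_forecast(speed, n_train, horizon=1, predict=last_value)
```

Any model exposing the same `predict(history, steps)` interface could be swapped in for `last_value`; handling missing values inside `history` is left out of this sketch.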

What we care about!

  • Best algebraic structure for data imputation.
  • The context of urban transportation (e.g., biases).
  • Data noise avoidance.
  • Competitive imputation and prediction performance.
  • Ability to handle various missing data scenarios.

Overview

With the development and application of intelligent transportation systems, large quantities of urban traffic data are collected continuously from various sources, such as loop detectors, cameras, and floating vehicles. These data sets capture the underlying states and dynamics of transportation networks and of the system as a whole, and they benefit many traffic operation and management applications, including routing, signal control, and travel time prediction. However, missing data are inevitable when collecting traffic data from intelligent transportation systems.

Publicly available at our Zenodo repository!

(a) Time series of actual and estimated speed within two weeks from August 1 to 14.

(b) Time series of actual and estimated speed within two weeks from September 12 to 25.

Figure 1: The imputation performance of BGCP (CP rank r=15 and missing rate α=30%) under the fiber missing scenario with third-order tensor representation, where the estimated result of road segment #1 is selected as an example. In both panels, red rectangles represent fiber missing (i.e., speed observations are lost for a whole day).
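To make the "CP rank r" and "third-order tensor" terms in the caption concrete, the sketch below rebuilds a tensor from three rank-r factor matrices and scores an estimate on held-out entries only. It shows the form of a CP model and a masked error metric, not the BGCP inference procedure itself. The function names, the tensor dimensions, and the random factors are assumptions made for illustration.

```python
import numpy as np

def cp_reconstruct(U, V, W):
    """Rebuild a third-order tensor from CP factors of shapes (I, r), (J, r), (K, r).

    Entry (i, j, k) is the sum over r of U[i, r] * V[j, r] * W[k, r].
    """
    return np.einsum('ir,jr,kr->ijk', U, V, W)

def masked_rmse(truth, estimate, held_out):
    """RMSE computed only over held-out (artificially removed) entries."""
    diff = (truth - estimate)[held_out]
    return np.sqrt(np.mean(diff ** 2))

rng = np.random.default_rng(0)
I, J, K, r = 5, 7, 144, 15  # hypothetical sizes; r=15 matches the caption
U, V, W = (rng.standard_normal((n, r)) for n in (I, J, K))
X = cp_reconstruct(U, V, W)

# Hold out one whole fiber (all intervals for one segment on one day).
held_out = np.zeros(X.shape, dtype=bool)
held_out[0, 3, :] = True
error = masked_rmse(X, X + 0.1, held_out)
```

Scoring on held-out entries only mirrors how imputation quality is usually assessed: the estimate is compared with values that were known but deliberately hidden from the model.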

Machine learning models

Selected references

Publications

Please consider citing our papers if they help your research.

Our blog posts (in Chinese)

License

This work is released under the MIT license.