A graph-based strategic transport planning dataset, aimed at creating the next generation of deep graph neural networks in transfer learning situations. It is based on simulation results of the Four-Step Model in PTV Visum.
Details of the work as well as results can be found in the thesis "Development of a Deep Learning Surrogate for the Four-Step Transportation Model".
Example of prediction of a city's congestion level, using only socioeconomic and network information as input
Dataset | Model | Metric | Test Performance | Publication |
---|---|---|---|---|
Classification Simple - Private Transport | GCNII | Average F1 | 0.87 | Makarov, 2021 |
Classification Simple - Public Transport | GCNII | Average F1 | 0.78 | Makarov, 2021 |
Classification Hard - Private Transport | GCNII | MAE >=10 | 118.5 | Makarov, 2021 |
Classification Hard - Public Transport | GCNII | MAE >=10 | 131.8 | Makarov, 2021 |
Regression - Private Transport | GCNII | MAE >=10 | 135.4* | Makarov, 2021 |
Regression - Public Transport | GCNII | MAE >=10 | 141.2* | Makarov, 2021 |
If you reach a better performance, please issue a pull request with the new values!
* The regression model used only the augmented dataset as training data
"MAE >=10" is the mean absolute error, but only applied to test samples where the target has at least 10 units/h. This is done as the lower values can be predicted accurately and are less interesting for the domain. An implementation can be found in the examples and details in the thesis above.
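As a rough illustration of the metric described above (not the implementation from the examples), the following sketch filters on the target value and averages the absolute error over the remaining samples. The function name `mae_at_least`, the flat-list inputs and the NaN return for an empty selection are assumptions made for this sketch.

```python
def mae_at_least(targets, predictions, threshold=10.0):
    """Mean absolute error over samples whose target is >= threshold.

    Assumption: targets and predictions are flat sequences of numbers
    (e.g. all predicted loads of the test set, in units/h).
    """
    selected = [(t, p) for t, p in zip(targets, predictions) if t >= threshold]
    if not selected:
        # No sample reaches the threshold; the metric is undefined.
        return float("nan")
    return sum(abs(t - p) for t, p in selected) / len(selected)


# Toy values, not taken from the dataset.
targets = [0.0, 5.0, 10.0, 200.0]
predictions = [1.0, 4.0, 30.0, 150.0]
score = mae_at_least(targets, predictions)  # only the last two samples count
print(score)
```

Note that the filter is applied to the target, not to the prediction, so a model cannot improve its score by predicting low values everywhere.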
Dataset | Example | Training Samples | Validation Samples | Test Samples | Task | Primary Metric | Graph Sizes | Private Transport | Public Transport | Input & Output Transformed |
---|---|---|---|---|---|---|---|---|---|---|
1_classification_simple | Example | 6398 | 1600* | 2000* | Classification - 3 classes | Average F1 | 15 - 80 original nodes | Yes | Yes | Yes |
2_classification_hard | Example | 16393 | 1600* | 2000* | Classification - 51 classes | MAE >=10 | 15 - 80 original nodes | Yes | Yes | Yes |
3_regression | Example | 16393 | 1600* | 2000* | Regression | MAE >=10 | 15 - 80 original nodes | Yes | Yes | Yes |
* Identical validation and test datasets
All of the data is pickled and compressed with pbz2, with the datasets being ready to be used in PyTorch Geometric. Please see the examples on how to use the data.
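For orientation, a minimal sketch of reading a bz2-compressed pickle with the Python standard library is given below. The helper names `load_pbz2` and `save_pbz2` and the file name `demo.pbz2` are hypothetical; the actual file names and the structure of the stored objects are documented in the examples. Unpickling the real files presumably also requires PyTorch Geometric to be installed, since the stored objects are meant for it.

```python
import bz2
import os
import pickle
import tempfile


def load_pbz2(path):
    """Read a bz2-compressed pickle file and return the stored object."""
    with bz2.BZ2File(path, "rb") as handle:
        return pickle.load(handle)


def save_pbz2(obj, path):
    """Write an object as a bz2-compressed pickle (used here for the demo only)."""
    with bz2.BZ2File(path, "wb") as handle:
        pickle.dump(obj, handle)


# Round trip with a stand-in object, since the real file names are not assumed here.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.pbz2")  # hypothetical file name
    save_pbz2({"num_nodes": 15}, demo_path)
    restored = load_pbz2(demo_path)
```

As with any pickle file, only load data from a source you trust, because unpickling can execute arbitrary code.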
The aim of strategic transport planning is to make long-term predictions for a given city, based purely on the underlying transport network and socioeconomic data. The socioeconomic data of every household is grouped into so-called zones. To solve the problem, the model needs to understand both how socioeconomic data creates demand and how that demand is applied to the supply side of the network. In practice, this is usually solved with the Four-Step Model, which has a number of downsides, including a large amount of manual calibration work and slow prediction speed.
This problem is a great task for deep graph neural networks. However, there is no large public dataset available, so we propose to generate data in a surrogate model setup, using the Four-Step Model as the baseline. The problem is a transfer learning task for GNNs. Within this project, the aim is to create a proof of concept, with all generated cities having between 15 and 80 nodes and 3 to 10 zones. Details can be found in the thesis.
Augmented Dataset
- Extract random subnetworks from processed OpenStreetMaps and procedural generation for socioeconomic data
- 6398 training samples, 1600 validation samples, 2000 test samples
Synthetic Dataset
- Procedural generation for both network and socioeconomic data
- +9995 training samples
The dataset 1_classification_simple uses exclusively the augmented dataset. Both 2_classification_hard and 3_regression use the augmented dataset for the training, validation and test sets, but also add the synthetic dataset as additional training samples, giving 6398 + 9995 = 16393 training samples. All problems use identical validation and test datasets, with the targets transformed to the respective task.
- The datasets presented have all input features standardized and filtered as needed
- As most common GNNs can only make simultaneous predictions for nodes, all edges are transformed into nodes, thus requiring masking during training
- Irrelevant node/edge features are then set to 0
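The preprocessing steps above can be pictured with the following sketch. It is one possible layout under stated assumptions, not the transformation used to build the dataset: every edge becomes an extra node linked to its two endpoints, node and edge features share one vector with the non-applicable part set to 0, and the loss mask selects the edge-derived nodes. The function names and the choice of which nodes carry targets are hypothetical.

```python
def edges_to_nodes(edges, node_feats, edge_feats):
    """Turn each edge (u, v) into an extra node connected to u and v.

    Returns the combined feature rows, the new edge list and a boolean
    list marking which rows were originally edges.
    """
    node_dim = len(node_feats[0])
    edge_dim = len(edge_feats[0])
    features, is_edge_node, new_edges = [], [], []
    # Original nodes: keep node features, zero out the edge-feature slots.
    for feats in node_feats:
        features.append(list(feats) + [0.0] * edge_dim)
        is_edge_node.append(False)
    # Former edges: zero out the node-feature slots, keep edge features.
    for (u, v), feats in zip(edges, edge_feats):
        new_id = len(features)
        features.append([0.0] * node_dim + list(feats))
        is_edge_node.append(True)
        new_edges.append((u, new_id))
        new_edges.append((new_id, v))
    return features, new_edges, is_edge_node


def masked_mae(predictions, targets, mask):
    """Mean absolute error over the positions where mask is True."""
    picked = [abs(p - t) for p, t, m in zip(predictions, targets, mask) if m]
    return sum(picked) / len(picked) if picked else 0.0


# Toy graph: 3 nodes with 1 feature each, 2 edges with 2 features each.
features, new_edges, mask = edges_to_nodes(
    edges=[(0, 1), (1, 2)],
    node_feats=[[1.0], [2.0], [3.0]],
    edge_feats=[[10.0, 0.5], [20.0, 0.7]],
)
loss = masked_mae([9.0, 9.0, 9.0, 100.0, 50.0], [0.0, 0.0, 0.0, 90.0, 70.0], mask)
```

In this toy case the predictions for the three original nodes are ignored by the mask, so only the two edge-derived nodes contribute to the loss.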
All details can be found in the thesis above.
If you need inspiration for what to focus on to improve GNNs, here are some open challenges with details in the thesis above:
- Recurrent GNNs - dynamic depth based on size of input graph
- Overcoming the bottleneck of GNNs - seen severely on this dataset
- Seeing whether we can automatically correct nodes propagating errors
If you find any issues with the data or want to get a specific version of the data, please raise an issue in the repository.
If you create any new work based on this dataset, please cite:
@article{Narayanan.2024,
author = {Narayanan, Santhanakrishnan and Makarov, Nikita and Antoniou, Constantinos},
title = {Graph neural networks as strategic transport modelling alternative - A proof of concept for a surrogate},
year = 2024,
journal = {IET Intelligent Transport Systems},
volume = {Under revision},
}
@article{Makarov.2024,
author = {Makarov, Nikita and Narayanan, Santhanakrishnan and Antoniou, Constantinos},
title = {Graph neural network surrogate for strategic transport planning},
year = 2024,
journal = {Under review}
}
@misc{makarov2021dataset,
author = {Makarov, Nikita and Narayanan, Santhanakrishnan and Antoniou, Constantinos},
institution = {Transportation Systems Engineering},
school = {Technical University of Munich},
title = {Development of a Deep Learning Surrogate for the Four-Step Transportation Model},
year = 2021,
url = {https://github.com/nikita68/TransportPlanningDataset}
}
@mastersthesis{makarov2021,
author = {Makarov, Nikita},
institution = {Transportation Systems Engineering},
school = {Technical University of Munich},
title = {Development of a Deep Learning Surrogate for the Four-Step Transportation Model},
year = 2021,
url = {https://mediatum.ub.tum.de/doc/1638691/dwz10x0l0w38xdklv9zkrprqs.pdf}
}
- Parts of the original data © OpenStreetMap contributors
- A big thank you goes to Carlos Llorca Garcia and TUM's MSM institute for providing the pre-processed OSM data
- This dataset (TransportPlanningDataset) is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/