Revisiting travel demand using big data: an empirical comparison of explainable machine learning models
Using nationwide census block group (CBG)-level population inflow derived from mobile device location data (MDLD) as a proxy for travel demand, this study examines its relationship with a range of factors, including socioeconomics, demographics, land use, and CBG attributes. A set of tree-based machine learning (ML) models and interpretation techniques (feature importance, partial dependence plots (PDP), accumulated local effects (ALE), and SHapley Additive exPlanations (SHAP)) are compared extensively to determine the best model architecture and to verify the robustness of the interpretations.
- The data used for model building is in the `data` folder. It is produced by `1.0-Match_CBG_POI.py`, `1.1-Read_Data.py`, and `1.2-Data_EDA.py`.
- `3.0-models-origin.py` and `3.1-models-transform.py` are used for model training and tuning. The first uses the original data, while the second applies a data transformation.
- `4.1-Interpret models.py` is used to interpret the trained model.
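
For readers unfamiliar with the interpretation techniques listed above, the idea behind a one-dimensional partial dependence curve can be sketched in a few lines of plain Python. This is only an illustration and is not taken from the repository scripts: `partial_dependence_1d`, `toy_predict`, and the two feature columns are hypothetical stand-ins for a trained tree-based model and the CBG-level features.

```python
from statistics import mean


def partial_dependence_1d(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        # Copy every row, overwriting only the feature of interest.
        modified = [
            row[:feature_index] + [value] + row[feature_index + 1:]
            for row in rows
        ]
        # Averaging over all rows marginalizes out the other features.
        curve.append(mean(predict(r) for r in modified))
    return curve


# Hypothetical stand-in for a trained model (assumption, not the repo's model).
def toy_predict(row):
    income, density = row
    return 2.0 * income + density ** 2


# Tiny made-up sample: each row is [income, density].
rows = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
grid = [0.0, 1.0, 2.0]

pdp_income = partial_dependence_1d(toy_predict, rows, 0, grid)
print(pdp_income)
```

Because the toy model is linear in the first feature, the resulting curve rises by a constant amount per grid step. With a real fitted model, `predict` would wrap the model's prediction call, and the grid would typically span the observed range of the feature.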