To estimate building's height on historical maps based on the building's characteristics in the present time using Random Forest Regression.
This project aims to estimate building's height on historical maps based on the building's characteristics in the present time. Random forest regression are used to estimate the building's height, and the result will be visualized into 3D City Model Visualization that could be used for further spatial analysis.
Historical maps are one of the data available from the past. These maps are commonly found in physical form, have small scales, and missing the height of objects depicted in the map.
Data used in this project are derived and calculated from various sources (Sekolah Tinggi Teknik Cirebon archives, Heritage Smart City Planning research results, and Direktorat Penataan Bangunan dan Lingkungan Kementerian Pekerja Umum) and some of them are confidential.
List of data used:
- Building's characteristics of Cirebon City 1940
- Building's characteristics of Cirebon City 2020
Data Dictionary (predictors):
- Perimeter : the perimeter of each building
- VxCount : the number of vertices (edge) of each building
- CentroidX : the value of centroid of X coordinate for each building
- CentroidY : the value of centroid of Y coordinate for each building
- Shape_Area: the area of each building
- neighborco: the value of neighbors around the building
- Kategori_Bangunan: the category of each building
- Z_Mean : building's height
- KawasanPerdagangan : market area
- KawasanPermukiman : residential area
- KawasanPrasarana : public facilities and infrastructure area
- KawasanWisata : tourist area
- BangunanKeraton : palace buildings
- WilayahJasa : public services area
- Predictors calculation : historical map and orthophoto processing GIS software (ArcGIS and QGIS)
- Target calculation : DTM and DSM processing using ArcGIS
- Code : Python in Jupyter Notebook
- Packages : Scikit-learn, pandas, matplotlib
the Cirebon 2020 data are split into training and testing data.
Correlation between predictors: are calculated using pearson's for both numerical perdictors, phi K for both categorical predictors, and point biserial for correlation between numerical and categorical predictors.
Feature Importance:
predictors correlation and importace analysis resulting in the elimination of shape area predictor.
Random forest regression model are built using the training data of Cirebon 2020 data only.
The model are evaluated by calculating the MAE, MSE, and RMSE between building's actual height and estimated height of Cirebon 2020 data. The values are 1.12, 1.75, and 1.32 meters respectively.
The height value are also extruded to 3D Visualization for visual evaluation purpose.
The model are then used to predict/estimate building's height in the Cirebon 1940 historical map.
The model had successfully estimate the building's height on the 1940 Cirebon Map in respect to Cirebon 2020 data resulting R-Squared value 61.2 % and various metrics for estimated building height are calculated using MAE, MSE, and RMSE as 1.12, 1.75, and 1.32 meters respectively. Building's height are applied to build the 3D Model Visualization. Thank you!