Firefighting is one of the most costly domains of the public safety sector, both in post-incident damage and in preventive measures. Nonetheless, it is a fundamental public service, and keeping track of its cost can benefit both the firefighters working at the heart of the action and the government bodies that manage them.
Therefore, in this project, we answer the following research questions:
- Is it possible to predict the notional cost of firefighting operations using tree-based models?
- Which tree-based model provides the best prediction accuracy?
- Which features correlate most strongly, and which least strongly, with the notional cost?
To answer these questions, we use public data from London (UK), which can be found on Kaggle:
- London Fire Brigade Incidents ("lfb_incident.csv"), containing fire incident data from 2009 to 2022.
- London Weather Data ("london_weather.csv"), containing weather information from 1979 to 2021.
We join the two CSV files by date. Fire incidents from 2022 and weather records from 1979 to 2008 are dropped because they have no matching records in the other dataset. The final dataset contains 1,286,617 rows. We use this dataset to train three tree-based models that work well with tabular data:
- decision tree
- random forest
- boosted tree
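The date join described above can be sketched with pandas as follows. This is a minimal illustration on two tiny stand-in frames, not the project's actual preprocessing code: the weather column name `date` and the ISO date strings are assumptions, while `DateOfCall`, `NumPumpsAttending` and `mean_temp` are the feature names used in this README. An inner join keeps only the dates present in both frames, which is what drops the 2022 incidents and the pre-2009 weather records.

```python
import pandas as pd

# Tiny stand-in frames; the real data lives in lfb_incident.csv and
# london_weather.csv. The "date" column name is an assumption.
incidents = pd.DataFrame({
    "DateOfCall": ["2021-12-31", "2022-01-01"],
    "NumPumpsAttending": [2, 1],
})
weather = pd.DataFrame({
    "date": ["2008-12-31", "2021-12-31"],
    "mean_temp": [1.5, 2.8],
})

# Parse both date columns so the join compares dates, not strings.
incidents["DateOfCall"] = pd.to_datetime(incidents["DateOfCall"])
weather["date"] = pd.to_datetime(weather["date"])

# Inner join: rows without a match in the other frame are discarded.
merged = incidents.merge(weather, left_on="DateOfCall", right_on="date", how="inner")

# Only the month of the call is kept as a model feature.
merged["month"] = merged["DateOfCall"].dt.month
print(merged[["month", "NumPumpsAttending", "mean_temp"]])
```

Here only the incident from 31 December 2021 survives, because it is the only date present in both frames.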
We use the implementations of these algorithms provided by the Python libraries scikit-learn and XGBoost.
We use five features as the inputs of our models:
- month of the fire incident report call
- building type
- number of fire pumps attending the incident
- number of hours the fire pumps worked
- mean daily temperature (°C)
Our output feature is the fire pumps' notional cost in pounds sterling (£). The cost was originally a continuous numerical variable, but we converted it to a categorical variable by binning the values into intervals of £300. This process is explained in more detail in preprocessing/2_check_correlation.ipynb.
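As a rough illustration of the £300 binning, the sketch below maps a cost in pounds to the index of the interval that contains it. The function name `cost_to_bin`, the half-open intervals [0, 300), [300, 600), ... and the zero-based labels are all assumptions made for this example; the notebook may define the edges or labels differently.

```python
def cost_to_bin(cost_pounds: float, width: float = 300.0) -> int:
    """Return the index of the interval of size `width` containing the cost.

    Hypothetical helper: assumes half-open intervals [0, 300), [300, 600), ...
    and zero-based category labels.
    """
    if cost_pounds < 0:
        raise ValueError("cost must be non-negative")
    # Floor division gives the number of whole intervals below the cost.
    return int(cost_pounds // width)


for cost in (0, 299, 300, 950):
    print(cost, "->", cost_to_bin(cost))
```

Under these assumptions, a cost of £299 falls in category 0 and a cost of £300 in category 1.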
Finally, we evaluate the three models to determine which one has the best prediction accuracy. In practice, the model we describe in this project could help estimate the operating cost of fire departments as soon as a call for a fire incident is made.
Feature | Type | Description | Example |
---|---|---|---|
DateOfCall | Categorical | Date of the call to the fire department (we kept the month only) | 12 |
PropertyType | Categorical | Description of the place where the fire happened | House - single occupancy |
NumPumpsAttending | Integer | Number of pumps used in the incident. Number of firefighters = number of pumps multiplied by five | 3 |
PumpHoursRoundUp | Integer | Time spent at incident by pumps, rounded up to nearest hour | 1 |
mean_temp | Float | Mean temperature in degrees Celsius (°C) | 2.8 |
Notional Cost (£) | Categorical | Time spent multiplied by notional annual cost of a pump. Originally a numerical value in pounds, converted to a categorical variable (£300 intervals). | 1 |
We split the rows of the dataset into a training set (66.6%) and a testing set (33.3%). We use hyperparameter tuning and cross-validation. More details are available in chart.ipynb.
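The split and tuning steps can be sketched with scikit-learn as shown below. This is only an illustration under stated assumptions: the data is synthetic (five features and three classes standing in for the real inputs and cost categories), and the `max_depth` grid, the five folds and the stratified split are choices made for the example, not the settings used in the notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the five input features and the binned cost label.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Hold out one third of the rows for testing, two thirds for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0, stratify=y
)

# Hyperparameter tuning with 5-fold cross-validation on the training set only.
param_grid = {"max_depth": [2, 4, 8]}  # assumed grid, for illustration
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)

# The held-out test set is used once, to score the best model found.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Tuning only on the training portion keeps the test accuracy an honest estimate, since the test rows never influence the choice of hyperparameters.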
In order to execute the notebooks in this repository, you will need Python 3.8. The file requirements.txt in the root of the project contains the dependency list.
pip install -r requirements.txt
To compare the models, we draw a bar graph of their results using plotly and matplotlib.
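A comparison chart of this kind could be produced with matplotlib roughly as follows. The helper `plot_accuracies`, the output file name and the accuracy values are all hypothetical: the numbers are placeholders chosen only to make the sketch runnable, not results from this project.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_accuracies(scores, path="model_comparison.png"):
    """Draw one bar per model and save the figure to `path` (hypothetical helper)."""
    names = list(scores)
    values = [scores[name] for name in names]
    fig, ax = plt.subplots()
    ax.bar(names, values)
    ax.set_ylim(0, 1)  # accuracy is a fraction between 0 and 1
    ax.set_ylabel("Accuracy")
    ax.set_title("Model comparison")
    fig.savefig(path)
    plt.close(fig)
    return path


# Placeholder values only; replace with the accuracies measured in the notebooks.
placeholder_scores = {"decision tree": 0.5, "random forest": 0.5, "boosted tree": 0.5}
saved_path = plot_accuracies(placeholder_scores)
print(saved_path)
```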
For cleaning, reshaping and checking the data:
- Data preprocessing: preprocessing/1_data_cleaning_with_weather.ipynb
- Feature correlation: preprocessing/2_check_correlation.ipynb
For training and evaluating the tree-based models:
- Decision trees: decision_tree.ipynb
- Random forest: random_forest.ipynb
- Boosted trees: boosted_tree.ipynb