Baltimore Roof Damage

This is a project to locate row houses in Baltimore that have extensive roof damage. It is a slimmed-down and refactored version of a larger project developed as part of Data Science for Social Good 2022.

Overview

This project handles loading and cleaning data, training models that estimate the likelihood of roof damage, and outputting predictions from those models.

There are three data sources:

  • A Geodatabase (gdb) file
  • Tabular files (CSVs or Excel files)
  • Aerial images

And two models:

  • An image model that predicts the likelihood of roof damage given an aerial photograph
  • An overall model that takes in data from many sources, including the outputs of the image model, and predicts the likelihood of roof damage

Installation

System Requirements

  • At least 200GB of hard drive space. This is mostly occupied by the aerial images.
  • At least 32GB of RAM (this could be relaxed in the future).
  • A CUDA-enabled GPU is optional, but recommended.

Docker

The easiest way to get this project running is with Docker. Make sure you have Docker (Docker Engine, either by itself or through Docker Desktop) and Docker Compose installed. From a command shell in the project directory, run docker-compose up. This will install and run everything required to use this project.

Once the containers are running, you can start interacting with the app by running bash in the main container with docker-compose run roofs bash.
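
Put together, a typical first session looks something like this (the -d flag is optional; it starts the containers in the background so you can keep using the same shell):

$ docker-compose up -d           # build and start the containers in the background
$ docker-compose run roofs bash  # open a shell in the main container
$ roofs --help                   # confirm the CLI is available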

GPU acceleration of model training and inference requires a CUDA-enabled NVIDIA GPU. If you have one, install the NVIDIA Container Toolkit; the existing code and Docker setup should then handle running everything on the GPU.
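
To confirm that Docker can see the GPU before training, a quick sanity check is to run nvidia-smi from a CUDA container (the image tag below is only an example; use any CUDA image available to you):

$ docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi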

Bare metal

TODO. Look at docker-compose.yaml and the Dockerfile for now.
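
As a rough sketch, a bare-metal setup mirrors what the Dockerfile does: install the Python dependencies, point the environment at a local PostgreSQL instance, and check that the CLI can reach it. The commands below assume the project installs as a standard Python package and that .env points at your local database; treat them as a starting point rather than tested instructions.

$ pip install -e .                     # install the project and its dependencies (assumed packaging)
$ export $(grep -v '^#' .env | xargs)  # load the PG* settings from .env (set PGHOST to your local host)
$ roofs db status                      # confirm the CLI can reach the database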

Running

In the root of the project, create a .env file with the following keys:

PGUSER=user
PGPASSWORD=password
PGHOST=db
PGPORT=5432
PGDATABASE=roofs
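
These are the standard PostgreSQL client environment variables, so the same values can be used to connect to the database directly. For example, assuming the Compose database service is named db (as PGHOST suggests) and runs a standard Postgres image:

$ docker-compose exec db psql -U user roofs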

The main way to interact with this project is through its command-line interface. The full list of commands is:

roofs --help                      Shows help documentation. Works for all subcommands

roofs db filter                   Filter the ground truth to just the row homes we're interested in
roofs db import-gdb               Import a Geodatabase file
roofs db import-sheet             Import spreadsheets of data
roofs db reset                    Remove all data from the database
roofs db status                   Show the status of the database

roofs images crop                 Crop aerial image tiles into individual blocklot images
roofs images dump                 Dump JPEG images of blocklots to disk for further...
roofs images status               Show the status of the blocklot image setup process
roofs images predict              Make predictions using just the image model

roofs train status                Status of the training pipeline
roofs train image-model           Train an image classification model from aerial photos
roofs train model                 Train a new model to classify roof damage severity

roofs report predictions          Generate roof damage scores from a given model
roofs report evals                Evaluate the performance of a given model
roofs report html                 Generate an HTML report of predictions

roofs misc merge-sheets           Merge a number of CSVs or Excel files on blocklot

A full run of the project, from data loading through model training to prediction, looks similar to the following series of commands. Pass the --help argument to each of these commands for a full explanation of what it does, and use each subcommand's status command to check that everything is proceeding normally.

$ roofs db import-sheet --inspection-notes data/InspectorNotes_Roof.xlsx
$ roofs db import-gdb data/roofdata_2024.gdb \
  --building-outlines=building_outline_2010 \
  --building-permits=building_construction_permits \
  --code-violations=code_violation_data_after_2017 \
  --data-311=Housing_311_SR_Data \
  --demolitions=completed_city_demolition \
  --ground-truth=roof_data_2018 \
  --real-estate=real_estate_data \
  --tax-parcel-address=tax_parcel_address \
  --redlining=redlining \
  --vacant-building-notices=open_notice_vacant
$ roofs db filter
$ roofs db status
$ roofs images crop data/aerial_images data/images.hdf5
$ roofs images status data/images.hdf5
$ roofs images dump data/images.hdf5 . -b "1152 011"
$ roofs train status data/images.hdf5
$ roofs train image-model data/images.hdf5
$ roofs train model data/images.hdf5 models.csv 
$ roofs report evals models/6c87d283-bfee-4075-bd72-a0d4355d356a.pkl 6c87_eval.csv data/images.hdf5
$ roofs report predictions models/6c87d283-bfee-4075-bd72-a0d4355d356a.pkl 6c87_preds.csv data/images.hdf5
$ roofs report html 6c87_preds.csv data/aerial_images 6c87_report.html
$ roofs misc merge-sheets merged.csv 6c87_preds.csv 2022_data.xlsx

If you already have the cropped images (at data/images.hdf5), an image model (at models/image_model.pth), and an overall model (at f940...0.pkl), the series of commands looks like this:

$ roofs db import-sheet --inspection-notes data/InspectorNotes_Roof.xlsx
$ roofs db import-gdb data/roofdata_2024.gdb \
  --building-outlines=building_outline_2010 \
  --building-permits=building_construction_permits \
  --code-violations=code_violation_data_after_2017 \
  --data-311=Housing_311_SR_Data \
  --demolitions=completed_city_demolition \
  --ground-truth=roof_data_2018 \
  --real-estate=real_estate_data \
  --tax-parcel-address=tax_parcel_address \
  --redlining=redlining \
  --vacant-building-notices=open_notice_vacant
$ roofs db filter
$ roofs db status
$ roofs images status data/images.hdf5
$ roofs images dump data/images.hdf5 . -b "1152 011"
$ roofs train status data/images.hdf5
$ roofs images predict data/images.hdf5
$ roofs report evals models/f940d7a5-e5c0-4267-8764-c02fa3542730.pkl f940_eval.csv data/images.hdf5
$ roofs report predictions models/f940d7a5-e5c0-4267-8764-c02fa3542730.pkl f940_preds.csv data/images.hdf5
$ roofs report html f940_preds.csv data/aerial_images f940_report.html
$ roofs misc merge-sheets merged.csv f940_preds.csv 2022_data.xlsx

Open Questions

  • License?