A database for visualizing and understanding the impacts of federal disasters on real estate prices.
Large disasters are known to depress real estate prices for months or years. Some areas recover quickly while others do not. This database is designed to assist home buyers and sellers, investors, and researchers in understanding past temporospatial trends in real estate pricing while enabling them to make predictions regarding current and future events.
An outline of the data pipeline architecture. Not every database requires GeoSpark, and those that do not have no need for Apache Spark. The ETL cluster can be reconfigured to optimize the processing of the incoming data. Please see EC2 Setups for scripts and more details.
These constitute a small sample of what is available.
These federal databases contain both geospatial and impact data, but frequently provide less information on exact dates.
Challenges in creating this database came in two main areas:
- Working with the myriad different databases that store real estate sales and listing prices
  - Creating a unified data model to connect the variety of data inputs
  - Deploying customized ETL clusters to better handle varied data input sources
- Creating a database that works efficiently with data organized in both time and space
  - A significant amount of preprocessing is done with GeoSpark in order to speed up later database calls
  - The database is deployed using both PostGIS and TimescaleDB to facilitate temporospatial indexing and deployment
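
As a rough illustration of how PostGIS and TimescaleDB can be combined for temporospatial indexing, the sketch below generates the kind of DDL such a setup might use. The table name `home_sales`, its columns, and the helper `build_schema_sql` are hypothetical and are not taken from this project's schema. `create_hypertable` is the standard TimescaleDB call for time partitioning, and a `GIST` index is the usual way to index a PostGIS geometry column.

```python
def build_schema_sql(table="home_sales", time_col="sale_date",
                     geom_col="geom", srid=4326):
    """Return DDL statements for a table indexed in both time and space.

    All names here are illustrative assumptions, not the project's schema.
    """
    return [
        # Enable the two extensions the database relies on.
        "CREATE EXTENSION IF NOT EXISTS postgis;",
        "CREATE EXTENSION IF NOT EXISTS timescaledb;",
        # One row per sale: a timestamp, a price, and a point location.
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"{time_col} TIMESTAMPTZ NOT NULL, "
        f"price NUMERIC, "
        f"{geom_col} geometry(Point, {srid}));",
        # TimescaleDB: partition the table into chunks by time.
        f"SELECT create_hypertable('{table}', '{time_col}', "
        f"if_not_exists => TRUE);",
        # PostGIS: spatial index so location filters avoid full scans.
        f"CREATE INDEX IF NOT EXISTS {table}_{geom_col}_idx "
        f"ON {table} USING GIST ({geom_col});",
    ]


if __name__ == "__main__":
    for statement in build_schema_sql():
        print(statement)
```

With a layout like this, a query that filters on a date range and a disaster boundary polygon can use the time partitioning and the spatial index together, which is the motivation for pairing the two extensions.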