Reducing the Margins of Error in Census Tract and Block Group Data from the American Community Survey

The American Community Survey (ACS) is the largest survey of US households and is the principal source for neighborhood-scale information about the US population and economy. The ACS is used to allocate billions in federal spending and is a critical input to social scientific research in the US. However, estimates from the ACS can be highly unreliable. For example, in over 72% of census tracts, the estimated number of children under 5 in poverty has a margin of error greater than the estimate. Uncertainty of this magnitude complicates the use of social data in policy making, research, and governance. This article presents a heuristic spatial optimization algorithm that is capable of reducing the margins of error in survey data via the creation of new composite geographies, a process called regionalization. Regionalization is a complex combinatorial problem. Here, rather than focusing on the technical aspects of regionalization, we demonstrate how to use a purpose-built, open-source regionalization algorithm to process survey data and reduce the margins of error to a user-specified threshold.

This repository includes code that reduces the margins of error in ACS tract- and block-group-level data by "intelligently" combining Census geographies into regions. A region is a collection of one or more census geographies that meets a user-specified margin of error (or coefficient of variation, CV). We refer to this procedure as "regionalization."
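
To make the idea of a region meeting a CV threshold concrete, here is a minimal sketch of the arithmetic involved. It is not the repository's algorithm: the function names `cv` and `combine_counts`, the tract values, and the 0.4 threshold are hypothetical. It assumes that ACS margins of error are reported at the 90% confidence level (so the standard error is the margin of error divided by 1.645) and uses the common root-sum-of-squares approximation for the margin of error of a sum of counts, which ignores covariance between the components.

```python
import math

Z_90 = 1.645  # ACS margins of error are published at the 90% confidence level


def cv(estimate, moe):
    """Coefficient of variation: standard error divided by the estimate."""
    if estimate == 0:
        return float("inf")
    return (moe / Z_90) / estimate


def combine_counts(estimates, moes):
    """Aggregate count estimates for several geographies into one region.

    The combined margin of error is approximated as the square root of the
    sum of squared component margins (assumes independent components).
    """
    total = sum(estimates)
    combined_moe = math.sqrt(sum(m ** 2 for m in moes))
    return total, combined_moe


# Hypothetical counts and margins of error for three neighboring tracts.
tract_estimates = [40, 55, 30]
tract_moes = [45, 50, 38]
max_cv = 0.4  # hypothetical user-specified threshold

for est, moe in zip(tract_estimates, tract_moes):
    print("tract  estimate=%d  cv=%.3f" % (est, cv(est, moe)))

region_estimate, region_moe = combine_counts(tract_estimates, tract_moes)
region_cv = cv(region_estimate, region_moe)
print("region estimate=%d  cv=%.3f  meets threshold: %s"
      % (region_estimate, region_cv, region_cv <= max_cv))
```

In this made-up case each tract on its own exceeds the threshold, while the three tracts combined fall below it. The regionalization code automates the harder part, which is choosing which neighboring geographies to combine.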

Technical details and example implementations are described in this PLOS ONE paper.

Getting Started

Prerequisites

All the scripts are written for Python 2.7 (earlier versions have not been tested). We recommend installing the Anaconda Python distribution, as it provides easy access to all the libraries needed to run the code. The code depends on the following libraries:

Numpy 1.3 or later
Scipy 0.7 or later
PySAL 1.5 or later
[pandas](http://pandas.pydata.org) 0.11.0 or later
MDP 3.2 or later
Bottleneck 0.7 or later
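
One quick way to see which of these libraries are already available is a short check like the one below. This is only a sketch and not part of the repository: the helper name `installed_versions` is hypothetical, and it assumes each library is imported under its usual lowercase module name (`mdp` for MDP) and exposes a `__version__` attribute.

```python
import importlib

# Import names for the libraries listed above (MDP is imported as "mdp").
REQUIRED = ["numpy", "scipy", "pysal", "pandas", "mdp", "bottleneck"]


def installed_versions(module_names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # ImportError, or any failure raised while the package initializes.
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(installed_versions(REQUIRED).items()):
        print("%-12s %s" % (name, version if version is not None else "MISSING"))
```

The script only reports what it finds; comparing the reported versions against the minimums above is left to the reader.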

On Debian/Ubuntu-based distributions, some additional libraries (GDAL and GEOS) need to be installed:

sudo apt-get install libgeos-dev libgdal1-dev

## Examples

We have built two Jupyter Notebooks to show the functionality of the code. The notebooks and all input data needed to run them are included in the repository. The notebooks require the matplotlib, shapely, and geopandas packages for the visualizations. Static versions can be viewed from the following links.

Toy Example is a very simple example on simulated data.
Austin Example is a more complex example using data from the Austin metro area.
