City averages gloss over a lot of important detail. Here are some tips for finding intra-city geographic data and combining it, with a primary focus on Census data.
Obviously a lot of local geographic data comes from local sources:
- Crime and police stats from police departments
- Ward- or precinct-level voting results from state or local governments
- Municipalities keep maintenance records with addresses (e.g., street light and pothole repair times)
- Bunch of other stuff...
This data can be labor-intensive to collect (records requests, data cleaning, etc.), but it is often the more interesting data.
The Census gives us local data on the characteristics of people and housing: age, sex, race, poverty, employment, language spoken at home, educational attainment, homeownership, housing characteristics, etc.
| | Tracts | Block Groups | Blocks |
|---|---|---|---|
| Typical population | 2,000 - 4,000 | 800 - 1,300 | 20 - 100 |
| Data availability | Most measures, long history | Many measures, shorter history | Few measures |
- Places to start:
  - Guidance for Data Users of the American Community Survey (ACS)
  - Census Reporter is a really nice website for quickly finding current data from the ACS. Gives quick summary visualizations down to the
- Tabular data (i.e., spreadsheets)
  - American FactFinder is an interactive search tool (data for years 2000 - present)
  - Developer APIs for when you want to download things programmatically (1990 - present)
  - LTDB lets you normalize old Census tract data to the 2010 borders
  - IPUMS lets you compile all sorts of interesting statistics based on the underlying microdata
  - NHGIS has a good amount of data going way back, often at the tract level (1790 - present)
- Geographic data (i.e., Shapefiles)
  - Cartographic Boundary Shapefiles (1992 - present)
  - NHGIS also has a lot of the historical Census boundaries going back to 1790
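For the programmatic route, here is a minimal sketch of pulling tract-level ACS data from the Census developer API. The year (2019), the 5-year ACS dataset path, the variable `B01003_001E` (total population), and the Milwaukee County FIPS codes are illustrative choices, and the helper names `build_acs_url` and `parse_census_rows` are made up for this example. The sample payload is fabricated to mimic the API's header-row-first JSON layout; it is not real data.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data"


def build_acs_url(year, variables, state_fips, county_fips, dataset="acs/acs5"):
    """Build a query URL for every tract in one county (hypothetical helper)."""
    params = [
        ("get", ",".join(["NAME"] + list(variables))),
        ("for", "tract:*"),
        ("in", f"state:{state_fips}"),
        ("in", f"county:{county_fips}"),
    ]
    return f"{BASE_URL}/{year}/{dataset}?{urlencode(params)}"


def parse_census_rows(payload):
    """Turn the API's list-of-lists JSON (header row first) into dicts."""
    header, *rows = json.loads(payload)
    records = [dict(zip(header, row)) for row in rows]
    for rec in records:
        # An 11-digit tract GEOID is state (2) + county (3) + tract (6).
        rec["GEOID"] = rec["state"] + rec["county"] + rec["tract"]
    return records


# Made-up response in the shape the API returns; values are NOT real.
SAMPLE_PAYLOAD = json.dumps([
    ["NAME", "B01003_001E", "state", "county", "tract"],
    ["Census Tract 1.01, Milwaukee County, Wisconsin", "1234", "55", "079", "000101"],
])

url = build_acs_url(2019, ["B01003_001E"], "55", "079")
records = parse_census_rows(SAMPLE_PAYLOAD)

# To fetch for real (needs network access), something like:
#   from urllib.request import urlopen
#   records = parse_census_rows(urlopen(url).read().decode("utf-8"))
```

The `GEOID` field is what you would later use to join the table to tract shapefiles. Values come back as strings, so convert them before doing arithmetic.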
Example: You have all traffic stops in the city for a year (points) and you want to know the characteristics of the people in the neighborhoods where the most stops occur (polygons).
How to do it:
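One way to picture the points-to-polygons join is the pure-Python sketch below. It uses a ray-casting point-in-polygon test and counts points per polygon. The square "tracts" and the stop coordinates are made up, and the function names are hypothetical. In practice a library such as GeoPandas (its `sjoin` function) handles this with a spatial index and real projections; this sketch only shows the idea.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y value can be crossed
        # by a horizontal ray cast to the right of the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_polygon(points, polygons):
    """Assign each point to the first polygon containing it and tally."""
    counts = Counter({name: 0 for name in polygons})
    for x, y in points:
        for name, vertices in polygons.items():
            if point_in_polygon(x, y, vertices):
                counts[name] += 1
                break
    return counts


# Toy data: two adjacent square "tracts" and four "traffic stops".
tracts = {
    "tract_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "tract_B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
stops = [(1, 1), (0.5, 1.5), (3, 1), (5, 5)]  # the last falls outside both

stops_per_tract = count_points_per_polygon(stops, tracts)
```

Once each stop carries a tract identifier, the tract's Census characteristics can be attached with an ordinary table join.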
Geocoding--turning addresses into points:
Example: Tracts are made up of block groups. In WI, three state assembly districts make a state senate district. Etc.
Spatial equivalent of the SQL `GROUP BY` clause.
How to do it:
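For Census geographies, the grouping key is often already embedded in the identifier: a 12-digit block group GEOID starts with the 11-digit GEOID of its tract. The sketch below rolls block group counts up to tracts on that basis. The populations are made up and `aggregate_to_tracts` is a hypothetical helper name. Merging the shapes themselves is a separate step (GeoPandas calls it `dissolve`).

```python
from collections import defaultdict


def aggregate_to_tracts(block_group_counts):
    """Sum block-group counts into their parent tracts (like GROUP BY + SUM)."""
    tract_totals = defaultdict(int)
    for bg_geoid, count in block_group_counts.items():
        tract_geoid = bg_geoid[:11]  # state (2) + county (3) + tract (6)
        tract_totals[tract_geoid] += count
    return dict(tract_totals)


# Made-up block group populations keyed by 12-digit GEOID.
block_groups = {
    "550790001011": 900,
    "550790001012": 1100,
    "550790002001": 1300,
}

tract_population = aggregate_to_tracts(block_groups)
```

Only counts can be summed this way. Medians and rates need to be recomputed from their underlying numerators and denominators rather than added or averaged.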
A quick and dirty way of joining polygons to polygons is to say one polygon matches another if it (1) intersects it, (2) is contained within it, or (3) contains it. (See this spatial joins example using GeoPandas.)
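The three matching rules can be sketched with axis-aligned rectangles standing in for real polygons, which is a deliberate simplification so the example needs no geometry library. The `Box` type, the function names, and the coordinates are all made up for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle used as a toy stand-in for a polygon."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def intersects(a: Box, b: Box) -> bool:
    # True when the rectangles overlap or merely touch along a border.
    return (a.minx <= b.maxx and b.minx <= a.maxx
            and a.miny <= b.maxy and b.miny <= a.maxy)


def within(a: Box, b: Box) -> bool:
    # True when a lies entirely inside b (shared borders allowed).
    return (b.minx <= a.minx and a.maxx <= b.maxx
            and b.miny <= a.miny and a.maxy <= b.maxy)


def quick_join(left: dict, right: dict) -> list:
    """Return (left_name, right_name, relationship) for every matching pair."""
    matches = []
    for lname, lbox in left.items():
        for rname, rbox in right.items():
            if within(lbox, rbox):
                matches.append((lname, rname, "within"))
            elif within(rbox, lbox):
                matches.append((lname, rname, "contains"))
            elif intersects(lbox, rbox):
                matches.append((lname, rname, "intersects"))
    return matches


tracts = {"tract_1": Box(0, 0, 2, 2), "tract_2": Box(2, 0, 4, 2)}
districts = {
    "district_1": Box(0, 0, 3, 2),
    "district_2": Box(3.5, 0.5, 3.8, 1.5),
}

matches = quick_join(tracts, districts)
```

Because shapes that only share a border still count as intersecting, this kind of join tends to over-match neighboring polygons, which is part of what makes it quick and dirty.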
Census tract borders change from decade to decade. This makes it hard to get a reliable time series for a given neighborhood. The Longitudinal Tract Data Base (LTDB) lets you take old Census tract data (as far back as 1970) and get estimates that line up with the 2010 Census tracts for all years.
The technique boils down to calculating how much the old polygons intersect with the new ones and then taking weighted averages, using area and population as the weights. One nice thing about this is that you can combine geographies that have fairly different borders, like tracts, police districts, and voting wards.
Caveat: You make some pretty big assumptions when you use this technique, basically that the characteristics of the population are distributed evenly across the source geography. It really is an approximation.
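To make the weighting concrete, here is a minimal sketch of area-weighted interpolation for a count such as population. It is not the LTDB's actual procedure. Rectangles given as `(minx, miny, maxx, maxy)` stand in for real tract polygons, and all names and numbers are made up. Each old tract hands its count to the new tracts in proportion to the share of its area that overlaps them, which is exactly the even-distribution assumption from the caveat.

```python
def area(box):
    """Area of a (minx, miny, maxx, maxy) rectangle."""
    return (box[2] - box[0]) * (box[3] - box[1])


def overlap_area(a, b):
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def interpolate_counts(source_boxes, source_counts, target_boxes):
    """Estimate counts for target shapes from source shapes by area share."""
    estimates = {}
    for tname, tbox in target_boxes.items():
        total = 0.0
        for sname, sbox in source_boxes.items():
            # Share of the source's area that falls inside this target.
            share = overlap_area(sbox, tbox) / area(sbox)
            total += source_counts[sname] * share
        estimates[tname] = total
    return estimates


# Toy example: two old tracts redrawn into two new tracts.
old_tracts = {"old_1": (0, 0, 2, 2), "old_2": (2, 0, 4, 2)}
old_population = {"old_1": 1000, "old_2": 3000}
new_tracts = {"new_a": (0, 0, 3, 2), "new_b": (3, 0, 4, 2)}

new_population = interpolate_counts(old_tracts, old_population, new_tracts)
```

Here `new_a` receives all of `old_1` plus half of `old_2`, and the total population is preserved. For a rate such as a poverty rate, one option is to interpolate the numerator and denominator counts separately and then divide.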