Chicago/buildings

Scale the linking algorithm (3 points)


Right now the linking is being done for only a few census tracts. Once the concept is proven, scale the solution to the whole city.

The computation is memory-intensive, and I've gotten the best results by processing a single census tract at a time. One idea for scaling, then, is to process the city in batches, one census tract per batch, so that the memory required for each batch stays manageable.

Taking this out of the MVP milestone because we have a product ready without this feature. Will make this part of the next milestone.

Current strategy for building the JSON-like structure for holding building data:

  • For each large data source (e.g. building and parcel shapefiles), create a separate GeoJSON file for each census tract.
  • Run the record-linking algorithm separately for each census tract.
  • Combine the results for each census tract into one data structure for all Chicago buildings.
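The first step of this strategy could look roughly like the sketch below. It assumes the features have already been read into GeoJSON-style dictionaries and that each one carries its census tract in a property; the function name `split_by_tract`, the `tract_id` property, and the output file naming are all hypothetical and not taken from the project.

```python
import json
from collections import defaultdict
from pathlib import Path


def split_by_tract(features, out_dir, prefix, tract_key="tract_id"):
    """Write one GeoJSON FeatureCollection per census tract.

    `tract_key` is an assumed property name; returns {tract: file path}.
    """
    # Group the features by the tract they belong to.
    groups = defaultdict(list)
    for feature in features:
        groups[feature["properties"][tract_key]].append(feature)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # One file per tract, e.g. buildings_<tract>.geojson
    paths = {}
    for tract, members in groups.items():
        path = out_dir / f"{prefix}_{tract}.geojson"
        with open(path, "w") as fh:
            json.dump({"type": "FeatureCollection", "features": members}, fh)
        paths[tract] = path
    return paths
```

Calling it once for buildings and once for parcels would give two sets of per-tract files that can be paired up by tract.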

The memory required for building the data structure and linking the records by building seems to grow exponentially, so it is much quicker to break the processing down into manageable chunks. There are 801 census tracts in Chicago.
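To illustrate why chunking helps whenever the cost grows faster than linearly, here is a small calculation. It assumes, purely for illustration, a quadratic cost in which every building in a chunk is compared against every parcel in the same chunk, with records spread evenly across tracts. The actual growth rate of the linking step has not been measured here, and the record counts below are hypothetical round numbers.

```python
def pairwise_cost(n_buildings, n_parcels, n_chunks=1):
    """Return (peak, total) building-parcel comparisons.

    Assumes all-pairs comparison within each chunk and an even spread of
    records across chunks; both are simplifications for illustration.
    """
    # Work held in memory for a single chunk.
    peak = (n_buildings / n_chunks) * (n_parcels / n_chunks)
    # Work summed over all chunks, processed one after another.
    total = peak * n_chunks
    return peak, total


# Hypothetical round numbers, not counts from the Chicago data.
whole_peak, whole_total = pairwise_cost(500_000, 500_000)
chunk_peak, chunk_total = pairwise_cost(500_000, 500_000, n_chunks=801)
print(f"peak work shrinks by about {whole_peak / chunk_peak:,.0f}x")
print(f"total work shrinks by about {whole_total / chunk_total:,.0f}x")
```

Under these assumptions the per-batch work falls with the square of the number of chunks, while the total work falls linearly with it.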

I've nearly finished creating the 801 files for buildings and for parcels. Still to do:

  • Set up a function that takes the files created above and the linking function as input and outputs the built data structure.
  • Inspect the structure and confirm that it is built correctly.
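A minimal sketch of such a driver function follows. It assumes the per-tract files are GeoJSON FeatureCollections and that the linking function takes a tract's buildings and parcels and returns a list of building records. The names `build_structure`, `tract_files`, and `link_records` are hypothetical, and the real linking function may have a different signature.

```python
import json


def build_structure(tract_files, link_records):
    """Run `link_records` one tract at a time and combine the results.

    tract_files:  {tract: (buildings_path, parcels_path)} (assumed layout)
    link_records: callable(buildings, parcels) -> list of building records
    """
    combined = []
    for tract in sorted(tract_files):
        buildings_path, parcels_path = tract_files[tract]
        # Only one tract's features are loaded at any time.
        with open(buildings_path) as fh:
            buildings = json.load(fh)["features"]
        with open(parcels_path) as fh:
            parcels = json.load(fh)["features"]
        combined.extend(link_records(buildings, parcels))
    return combined
```

Passing the linking function in as an argument keeps the batching logic separate from the linking logic, so the combined output can be checked against a small set of tracts before running all 801.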

All done!

The prototype building.json file is 108 MB and contains one record for each of 488738 buildings. That is all the buildings in Chicago, excluding garages. This file also excludes the polygon shapes for the buildings.

Here are a few sample records.

[
  {
    "bldg_id": "1",
    "pins": ["1130408061", "1130408060", "1130408054", "1130408058", "1130408059"],
    "area": [100.7218, 89.1563, 0.0227, 94.8657, 89.0282],
    "areaRatio": [0.2695, 0.2385, 0.0001, 0.2538, 0.2382],
    "pinsFinal": ["1130408061", "1130408060", "1130408058", "1130408059"],
    "taxsale": 0,
    "demo": 0,
    "address": "7342 N WINCHESTER AVE"
  },
  {
    "bldg_id": "10",
    "pins": ["1130205012"],
    "area": [982.0936],
    "areaRatio": [1],
    "pinsFinal": ["1130205012"],
    "taxsale": 0,
    "demo": 0,
    "address": "1658 W JUNEWAY TER"
  },
  {
    "bldg_id": "100",
    "pins": ["1129102018"],
    "area": [165.3175],
    "areaRatio": [1],
    "pinsFinal": ["1129102018"],
    "taxsale": 0,
    "demo": 0,
    "address": "1421 W JUNEWAY TER"
  },
  {
    "bldg_id": "100005",
    "pins": ["1307329028"],
    "area": [165.9608],
    "areaRatio": [1],
    "pinsFinal": ["1307329028"],
    "taxsale": 0,
    "demo": 0,
    "address": "4907 N NEVA AVE"
  },
... 488734 more records ...
]
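The sample records are consistent with `areaRatio` being each pin's share of the building's total overlap area, and with `pinsFinal` dropping pins whose share is negligible. The thread does not state this, so the sketch below treats it as an assumption. The function names and the 0.01 cutoff are hypothetical; the real cutoff, if there is one, is not given.

```python
def area_ratios(areas, ndigits=4):
    """Each area as a fraction of the total, rounded like the sample data."""
    total = sum(areas)
    return [round(a / total, ndigits) for a in areas]


def filter_pins(pins, ratios, min_ratio=0.01):
    """Keep pins whose share is at least `min_ratio` (assumed cutoff)."""
    return [pin for pin, ratio in zip(pins, ratios) if ratio >= min_ratio]


# First sample record from the prototype file.
pins = ["1130408061", "1130408060", "1130408054", "1130408058", "1130408059"]
areas = [100.7218, 89.1563, 0.0227, 94.8657, 89.0282]

ratios = area_ratios(areas)
print(ratios)                      # [0.2695, 0.2385, 0.0001, 0.2538, 0.2382]
print(filter_pins(pins, ratios))   # drops "1130408054", as in pinsFinal
```

A check like this, run over every record, is one way to confirm that the combined structure was built consistently.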