osmlab/labuildings

Conflate addresses?


Addresses are not included in the building footprint data we are planning to import. However, addresses are available from LA County as points. Do we want to try to conflate these before the import? How have other imports handled this?

Here's a screenshot from QGIS of an area in Venice, CA, showing the building footprints and address points. It looks pretty messy.

[Screenshot, 2014-10-02: building footprints and address points in Venice, CA, in QGIS]

We thought this through with the DC and NYC imports. Code is available here:

If possible, the scripts in both of those repos should be used; they solve some nasty problems that most other scripts and converters don't. The NYC version is more up to date and performs better. There are also scripts for splitting the entire process into chunks, which I highly recommend, since doing everything at once with a single dataset is painful.

I'm slammed for the next ~week but can give pointers if there are specific questions about those scripts.

Hm, this particular screenshot doesn't inspire much confidence. Before deciding whether to conflate addresses with buildings, I recommend taking a long, hard look at the quality of the address dataset and deciding whether it's worth importing. Cleaning up the NYC addresses was certainly what made that import take the longest.

I found that UCLA has a parcel dataset, thanks to @cityhubla's links in #4. The parcel file is linked from here: http://gis.ats.ucla.edu/Mapshare/. It has a polygon for each parcel; the polygons carry no attributes except an ID called AIN, which you can join to the tax data, which contains addresses.

Using the tax addresses might be far too much work. Instead, what if we use the parcel polygons to do a spatial join with the addresses, and then another spatial join to add the addresses to the buildings? I think this would only work where there are 1:1 relationships in each join, but it looks like it would help for a reasonable number of buildings.
[Screenshot, 2014-11-15]
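
A minimal sketch of the two-step parcel join proposed above, written in plain Python so it has no dependencies. Assumptions: every geometry is a simple polygon given as a list of (x, y) vertices in a projected coordinate system, a building belongs to the parcel that contains its area centroid, and all function names are hypothetical (they are not part of the labuildings scripts). A real run over the whole county would need a spatial index instead of this brute-force loop.

```python
from collections import defaultdict


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count edges crossed by a horizontal ray cast from (x, y) toward +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_centroid(ring):
    """Area centroid via the shoelace formula (assumes nonzero area)."""
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        a += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    a *= 0.5
    return cx / (6 * a), cy / (6 * a)


def find_parcel(x, y, parcels):
    """Return the id of the first parcel containing (x, y), or None."""
    for pid, ring in parcels.items():
        if point_in_polygon(x, y, ring):
            return pid
    return None


def join_via_parcels(addresses, parcels, buildings):
    """Map building id -> address id, only where a parcel holds exactly one of each."""
    addrs_in = defaultdict(list)
    bldgs_in = defaultdict(list)
    # Join 1: address points -> parcels.
    for aid, (x, y) in addresses.items():
        pid = find_parcel(x, y, parcels)
        if pid is not None:
            addrs_in[pid].append(aid)
    # Join 2: building centroids -> parcels.
    for bid, ring in buildings.items():
        pid = find_parcel(*polygon_centroid(ring), parcels)
        if pid is not None:
            bldgs_in[pid].append(bid)
    # Keep only the unambiguous 1:1 parcels.
    return {
        bldgs_in[pid][0]: aids[0]
        for pid, aids in addrs_in.items()
        if len(aids) == 1 and len(bldgs_in.get(pid, [])) == 1
    }
```

With two hypothetical square parcels, where P1 holds one house (B1) and one address point that sits outside the house (A1), and P2 holds two buildings, only B1 gets an address: `join_via_parcels` returns `{"B1": "A1"}` and P2 is left for a human.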

@cityhubla, can you follow up with UCLA and/or the assessor to confirm the licensing of this parcel data? I don't think we want to import the parcels into OSM, but I'd love to use them as an intermediate step to link addresses to buildings.

Sure, I'll verify

I sent an email two days ago to the County and UCLA regarding the use of the parcel data. I'm still waiting to hear back.

I've been looking at this a little bit more, and I think we can safely ignore the parcels. The address data and the building data both have an AIN field, which would allow us to link the buildings and addresses in exactly the same way we would have done it with the parcels. (In fact, the AIN is the parcel identifier.) So I think we can forget about the parcels, @cityhubla.

Here's a QGIS screenshot of the 060372731003 block group (approximately here), which is included in my venice_chunks zip file. I'm showing the AIN field for both the addresses and the buildings, and you can see that they correspond.
[Screenshot, 2014-12-27: AIN labels on addresses and buildings in block group 060372731003, in QGIS]

Programmatically, I only feel safe assigning an address when there is exactly one address and one building for a given AIN. That won't fix very many of the detached addresses, but it will fix some of them.
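
The 1:1 rule amounts to grouping both layers by AIN and keeping only the groups with exactly one member on each side. A minimal sketch, assuming each feature has already been reduced to an `(id, ain)` tuple; the function name and input shape are hypothetical, not the labuildings code.

```python
from collections import defaultdict


def ain_one_to_one(addresses, buildings):
    """Pair addresses with buildings by AIN, keeping only 1:1 AINs.

    addresses, buildings: iterables of (feature_id, ain) tuples.
    Returns {building_id: address_id}.
    """
    addrs_by_ain = defaultdict(list)
    bldgs_by_ain = defaultdict(list)
    for addr_id, ain in addresses:
        addrs_by_ain[ain].append(addr_id)
    for bldg_id, ain in buildings:
        bldgs_by_ain[ain].append(bldg_id)
    matches = {}
    for ain, addr_ids in addrs_by_ain.items():
        bldg_ids = bldgs_by_ain.get(ain, [])
        # Anything other than exactly one of each is left for a human.
        if len(addr_ids) == 1 and len(bldg_ids) == 1:
            matches[bldg_ids[0]] = addr_ids[0]
    return matches
```

With hypothetical AINs "100" (one address, one building), "200" (one address, house plus garage) and "300" (two addresses, one building), only the first pair is returned: `{"b1": "a1"}`.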

Most of the time, two or more buildings share the same AIN (usually the main house and a detached garage on the same parcel). We could assign the address to the building that has the largest area, or to the one closest to the address point, but this seems error-prone and better done with human oversight.
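
To see why those two heuristics are risky, here is a hypothetical sketch that computes both candidates for one address among the buildings sharing its AIN. Assumptions: polygons are open vertex lists in projected coordinates, the vertex mean stands in for a building's center, and `math.dist` requires Python 3.8+. None of these names come from the labuildings scripts.

```python
import math


def shoelace_area(ring):
    """Unsigned polygon area from an open list of (x, y) vertices."""
    n = len(ring)
    s = sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )
    return abs(s) / 2


def vertex_mean(ring):
    """Average of the vertices: a rough stand-in for the building's center."""
    return (
        sum(x for x, _ in ring) / len(ring),
        sum(y for _, y in ring) / len(ring),
    )


def candidate_buildings(address_xy, buildings):
    """Return (largest_id, nearest_id) among buildings sharing one AIN."""
    largest = max(buildings, key=lambda bid: shoelace_area(buildings[bid]))
    nearest = min(
        buildings, key=lambda bid: math.dist(address_xy, vertex_mean(buildings[bid]))
    )
    return largest, nearest
```

For a 10 x 10 house and a 4 x 4 garage with the address point drawn next to the garage, `candidate_buildings` returns `("house", "garage")`: the two rules disagree, which is exactly the kind of case that needs a human to decide.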

In some cases there are two addresses for the same AIN, but I think these are two sides of the same house rather than a separate address for the back garage. So in those cases I propose we just import the addresses as-is and not try to conflate them with the buildings.

Okay, I've implemented the AIN join in cdd4342.

[Screenshot, 2014-12-27: AIN-joined addresses shown in JOSM]

Here's what it looks like in JOSM (the labels shown are house numbers). Most of the addresses on the right were assigned to houses, even though they didn't originally intersect them, because there was exactly one address and one building for each of those AINs. But the addresses in the upper left of the screenshot were not assigned to buildings, because there were two buildings with the same AIN (all of those houses have garages). Good enough, in my opinion.

Sorry for the delay, final year end push to finish projects at work.

I agree with human oversight of AINs with more than one building. There are many residential buildings tied to one AIN. I live on a parcel that has 3 homes on one AIN and no garage. Is there any way to mark AINs with more than one building during the import?
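
One possible way to mark them, sketched under assumptions: count buildings per AIN and add OSM's `fixme` key to every building in a shared group, so those footprints stand out during manual review. The dict shape (`id`, `AIN`, `tags`) and the note text are hypothetical, and whether fixme tags should actually be uploaded is a call for the import reviewers; the returned id list could just as well be written to a separate review file.

```python
from collections import Counter

# Hypothetical review note; wording is illustrative only.
REVIEW_NOTE = "several buildings share this AIN; check which one the address belongs to"


def flag_shared_ains(buildings, note=REVIEW_NOTE):
    """Tag every building whose AIN is shared with another building.

    buildings: list of dicts with "id", "AIN" and a "tags" dict (assumed shape).
    Mutates the tags in place and returns the ids of the flagged buildings.
    """
    counts = Counter(b["AIN"] for b in buildings)
    flagged = []
    for b in buildings:
        if counts[b["AIN"]] > 1:
            b["tags"]["fixme"] = note
            flagged.append(b["id"])
    return flagged
```

For a parcel like the one described above, with three homes on one AIN, all three buildings would be flagged, while a building that is alone on its AIN is left untouched.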

Will there be two addresses tied to a building at a street intersection? Commercial buildings often have addresses on both streets.

I'm free today to test the conversion code in issue #3. My Python skills are limited, but I'm well versed in zoning and building codes.

[Screenshot: venice1, from NavigateLA]

I got this using NavigateLA from the City's Dept of Engineering and Public Works.

This is what I mean by addresses from two streets on one building: this is the corner of Pacific and Windward in Venice. I'll check whether there are issues with the conversion code.

It shouldn't make a difference whether the addresses are on different streets or are multiple addresses on the same street: any time there is more than one address for a building, we'll import them as address points and add no address information to the building. Each address point contains the street number and the street name, so in that sense each point stands on its own.
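
A minimal sketch of that rule: collect the address points inside each footprint, copy the tags onto the building only when there is exactly one, and keep every other point as a standalone address node. Assumptions: footprints are simple polygons as (x, y) vertex lists, addresses carry their OSM `addr:*` tags, overlapping footprints are not handled, and the function names and sample addresses are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def assign_contained_addresses(buildings, addresses):
    """buildings: {id: ring}; addresses: {id: ((x, y), tags)}.

    Returns (building_tags, leftover_point_ids): building_tags maps a building
    id to the addr:* tags it absorbs; leftovers stay as address nodes.
    """
    inside = {bid: [] for bid in buildings}
    for aid, ((x, y), _tags) in addresses.items():
        for bid, ring in buildings.items():
            if point_in_polygon(x, y, ring):
                inside[bid].append(aid)
    building_tags = {}
    used = set()
    for bid, aids in inside.items():
        # Exactly one address: merge it. Zero or several: leave the points alone.
        if len(aids) == 1:
            building_tags[bid] = dict(addresses[aids[0]][1])
            used.add(aids[0])
    leftovers = sorted(set(addresses) - used)
    return building_tags, leftovers
```

In a hypothetical corner building holding one address on Pacific and one on Windward, neither is merged; both remain as points, each with its own housenumber and street.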

FYI, here is the output from merge.py when I run it on the Venice data:

060372732003: 141/591 (23%) addrs hit bldgs, 60/383 (15%) bldgs have at least one addr
060372732003: using AINs matched 22 more addresses
060372731001: 159/412 (38%) addrs hit bldgs, 68/444 (15%) bldgs have at least one addr
060372731001: using AINs matched 46 more addresses
060372732004: 32/299 (10%) addrs hit bldgs, 25/370 (6%) bldgs have at least one addr
060372732004: using AINs matched 48 more addresses
060372731002: 164/633 (25%) addrs hit bldgs, 100/666 (15%) bldgs have at least one addr
060372731002: using AINs matched 22 more addresses
060372733001: 222/628 (35%) addrs hit bldgs, 130/596 (21%) bldgs have at least one addr
060372733001: using AINs matched 37 more addresses
060372733002: 64/200 (32%) addrs hit bldgs, 48/232 (20%) bldgs have at least one addr
060372733002: using AINs matched 26 more addresses
060372731003: 289/741 (39%) addrs hit bldgs, 58/645 (8%) bldgs have at least one addr
060372731003: using AINs matched 64 more addresses
060372733003: 123/359 (34%) addrs hit bldgs, 91/424 (21%) bldgs have at least one addr
060372733003: using AINs matched 50 more addresses
060372734021: 97/142 (68%) addrs hit bldgs, 91/172 (52%) bldgs have at least one addr
060372734021: using AINs matched 22 more addresses
060372734022: 146/206 (70%) addrs hit bldgs, 89/155 (57%) bldgs have at least one addr
060372734022: using AINs matched 12 more addresses
060372732001: 144/493 (29%) addrs hit bldgs, 54/370 (14%) bldgs have at least one addr
060372732001: using AINs matched 19 more addresses
060372734023: 147/221 (66%) addrs hit bldgs, 129/238 (54%) bldgs have at least one addr
060372734023: using AINs matched 34 more addresses
060372732002: 68/351 (19%) addrs hit bldgs, 29/369 (7%) bldgs have at least one addr
060372732002: using AINs matched 40 more addresses
060372734024: 143/195 (73%) addrs hit bldgs, 117/217 (53%) bldgs have at least one addr
060372734024: using AINs matched 10 more addresses
060372735022: 128/191 (67%) addrs hit bldgs, 71/124 (57%) bldgs have at least one addr
060372735022: using AINs matched 18 more addresses
060372735021: 262/321 (81%) addrs hit bldgs, 117/163 (71%) bldgs have at least one addr
060372735021: using AINs matched 6 more addresses
060372735023: 213/384 (55%) addrs hit bldgs, 154/311 (49%) bldgs have at least one addr
060372735023: using AINs matched 50 more addresses
060372736003: 30/198 (15%) addrs hit bldgs, 20/289 (6%) bldgs have at least one addr
060372736003: using AINs matched 16 more addresses
060372735024: 279/486 (57%) addrs hit bldgs, 210/484 (43%) bldgs have at least one addr
060372735024: using AINs matched 76 more addresses
060372736004: 261/660 (39%) addrs hit bldgs, 74/628 (11%) bldgs have at least one addr
060372736004: using AINs matched 52 more addresses
060372736001: 128/349 (36%) addrs hit bldgs, 59/332 (17%) bldgs have at least one addr
060372736001: using AINs matched 33 more addresses
060372737001: 143/314 (45%) addrs hit bldgs, 125/584 (21%) bldgs have at least one addr
060372737001: using AINs matched 28 more addresses
060372736002: 65/316 (20%) addrs hit bldgs, 46/422 (10%) bldgs have at least one addr
060372736002: using AINs matched 55 more addresses
060372737002: 291/468 (62%) addrs hit bldgs, 109/428 (25%) bldgs have at least one addr
060372737002: using AINs matched 14 more addresses
060372738001: 355/842 (42%) addrs hit bldgs, 83/742 (11%) bldgs have at least one addr
060372738001: using AINs matched 33 more addresses
060372737003: 284/643 (44%) addrs hit bldgs, 146/719 (20%) bldgs have at least one addr
060372737003: using AINs matched 26 more addresses
060372738002: 81/351 (23%) addrs hit bldgs, 55/406 (13%) bldgs have at least one addr
060372738002: using AINs matched 42 more addresses
060372739023: 236/321 (73%) addrs hit bldgs, 182/282 (64%) bldgs have at least one addr
060372739023: using AINs matched 37 more addresses
060372738003: 132/316 (41%) addrs hit bldgs, 94/309 (30%) bldgs have at least one addr
060372738003: using AINs matched 26 more addresses
060372739024: 184/435 (42%) addrs hit bldgs, 128/551 (23%) bldgs have at least one addr
060372739024: using AINs matched 45 more addresses
060372739021: 189/464 (40%) addrs hit bldgs, 155/556 (27%) bldgs have at least one addr
060372739021: using AINs matched 101 more addresses
060372739025: 167/251 (66%) addrs hit bldgs, 137/242 (56%) bldgs have at least one addr
060372739025: using AINs matched 45 more addresses
060372739022: 163/445 (36%) addrs hit bldgs, 121/342 (35%) bldgs have at least one addr
060372739022: using AINs matched 58 more addresses
060372741001: 383/1169 (32%) addrs hit bldgs, 99/517 (19%) bldgs have at least one addr
060372741001: using AINs matched 73 more addresses
060372742021: 253/570 (44%) addrs hit bldgs, 170/356 (47%) bldgs have at least one addr
060372742021: using AINs matched 138 more addresses
060372742022: 160/255 (62%) addrs hit bldgs, 117/196 (59%) bldgs have at least one addr
060372742022: using AINs matched 28 more addresses
060372741002: 117/479 (24%) addrs hit bldgs, 78/460 (16%) bldgs have at least one addr
060372741002: using AINs matched 118 more addresses
060372742023: 102/229 (44%) addrs hit bldgs, 73/157 (46%) bldgs have at least one addr
060372742023: using AINs matched 42 more addresses
060372742024: 143/231 (61%) addrs hit bldgs, 109/187 (58%) bldgs have at least one addr
060372742024: using AINs matched 24 more addresses
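
To turn the per-block-group lines above into one Venice-wide figure, a small parser like the following could be used. It is a hypothetical helper keyed to the exact wording of these log lines, and it assumes the AIN matches are addresses the spatial pass had not already placed, so the two counts can simply be added.

```python
import re

HIT_RE = re.compile(
    r"^(\d+): (\d+)/(\d+) \(\d+%\) addrs hit bldgs, "
    r"(\d+)/(\d+) \(\d+%\) bldgs have at least one addr$"
)
AIN_RE = re.compile(r"^(\d+): using AINs matched (\d+) more addresses$")


def parse_merge_log(text):
    """Collect per-block-group counts from merge.py's summary lines."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        hit = HIT_RE.match(line)
        ain = AIN_RE.match(line)
        if hit:
            s = stats.setdefault(hit.group(1), {"ain": 0})
            s["addr_hit"], s["addr_total"] = int(hit.group(2)), int(hit.group(3))
            s["bldg_hit"], s["bldg_total"] = int(hit.group(4)), int(hit.group(5))
        elif ain:
            stats.setdefault(ain.group(1), {"ain": 0})["ain"] = int(ain.group(2))
    return stats


def overall(stats):
    """Return (spatial_hits, ain_matches, total_addresses) over all block groups."""
    groups = [s for s in stats.values() if "addr_total" in s]
    return (
        sum(s["addr_hit"] for s in groups),
        sum(s["ain"] for s in groups),
        sum(s["addr_total"] for s in groups),
    )
```

The combined address match rate is then `(spatial_hits + ain_matches) / total_addresses`. For the first two block groups alone (060372732003 and 060372731001) that is (300 + 68) / 1003, or about 37%.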

I'm also committing a venice_merged.zip file, which includes the results of merge.py, and a venice_osm.zip file, which includes the results of convert.py.

As I said, I'm pretty happy with the result of the merge; I think it's about the best we can do. But I know for certain that we still need to improve the .osm files produced by convert.py, specifically the fields we capture (#3). I'm uploading these zip files so people have some examples to play with, especially those who can't run the Python scripts on their own machines.

The Telenav team has written software for conflating ways called Cygnus: http://www.openstreetmap.org/user/mvexel/diary/35483

I don't know how well (or if at all) this works with areas and nodes.

There's a lot of conflation code in https://github.com/osmlab/labuildings/blob/master/convert.py, and it works about as well as it can. It conflates an address with a building if exactly one address falls within that building, or if there is a one-to-one AIN relationship between a building polygon and an address node. If neither condition holds, the address nodes are not conflated with buildings. We'll leave any other possible conflation to whoever does the manual import.
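
The two conditions can be read as two passes: containment first, then AIN for whatever is still unmatched, which matches the order of the merge.py log above. This is a hypothetical sketch, not the actual convert.py code: it assumes the point-in-polygon tests have already been run and passed in as `contained`, and that an AIN match is applied only when neither the address nor the building was used in the first pass.

```python
from collections import Counter, defaultdict


def conflate(contained, addr_ain, bldg_ain):
    """Two-pass address/building conflation sketch.

    contained: {address_id: building_id} for address points inside a footprint.
    addr_ain, bldg_ain: {feature_id: AIN}.
    Returns ({building_id: address_id}, n_spatial, n_ain).
    """
    assigned = {}
    # Pass 1: a building containing exactly one address point takes it.
    per_building = Counter(contained.values())
    for addr_id, bldg_id in contained.items():
        if per_building[bldg_id] == 1:
            assigned[bldg_id] = addr_id
    n_spatial = len(assigned)
    # Pass 2: AINs with exactly one address and one building, if both are free.
    addrs_by_ain, bldgs_by_ain = defaultdict(list), defaultdict(list)
    for addr_id, ain in addr_ain.items():
        addrs_by_ain[ain].append(addr_id)
    for bldg_id, ain in bldg_ain.items():
        bldgs_by_ain[ain].append(bldg_id)
    used = set(assigned.values())
    for ain, addr_ids in addrs_by_ain.items():
        bldg_ids = bldgs_by_ain.get(ain, [])
        if len(addr_ids) == 1 and len(bldg_ids) == 1:
            addr_id, bldg_id = addr_ids[0], bldg_ids[0]
            if addr_id not in used and bldg_id not in assigned:
                assigned[bldg_id] = addr_id
                used.add(addr_id)
    return assigned, n_spatial, len(assigned) - n_spatial
```

Any address left out of the returned mapping (for example, both addresses of a corner building) stays as a standalone node for the manual importer.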