osmlab/labuildings

Edit convert.py to deal with overlapping addresses

almccon opened this issue · 14 comments

I'm not sure how widespread this is, but I'm finding cases where every apartment in a large apartment building has its own address point.

For example, check out the merged geojson for this part of Pasadena: http://data.openstreetmap.us.s3.amazonaws.com/imports/la/merged/buildings-addresses-060374635001.geojson

There are around 50 units at 1155 East Del Mar Blvd (look for AIN 5735017024) and each one has exactly the same lat/lon. I assume we don't want to pile 50 OSM nodes right on top of each other. How does OSM deal with these situations?

JOSM doesn't like nodes in the same location, and ogr2osm seems to merge them, in my experience. I think that gets into indoor mapping territory, which is still mostly uncharted. Perhaps @lxbarth can share some insight into how they handled large NYC buildings with many addresses. osmlab/nycbuildings#67 dealt with this as well.

Interesting. It looks like NYC had multiple addresses per building, but those were mainly due to multiple street-level entrances. And it seems that those addresses were not directly on top of each other.

Some of this LA data seems to have a separate point for each unit in a building, even if they don't have their own external addresses. That in itself seems fine... but the fact that the points are all co-incident means they're not that useful for indoor mapping.

I'm inclined to detect these coincident points and do the following:

  • Check if they have the same streetnumber and streetname
  • Discard the unit numbers
  • Collapse into a single point
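As a rough sketch of those three steps (not what convert.py currently does), something like the following could work. The function name and the dict keys (`lon`, `lat`, `housenumber`, `street`, `unit`) are made up for illustration, and it assumes the coordinates are exactly equal, as they are in the Pasadena example:

```python
from collections import defaultdict


def collapse_coincident(addresses):
    """Collapse addresses that share a point and differ only by unit.

    `addresses` is a list of dicts with hypothetical keys 'lon', 'lat',
    'housenumber', 'street' and, optionally, 'unit'.
    """
    # Group by exact coordinates (assumes coincident points are bit-identical).
    by_point = defaultdict(list)
    for addr in addresses:
        by_point[(addr["lon"], addr["lat"])].append(addr)

    result = []
    for group in by_point.values():
        street_addresses = {(a["housenumber"], a["street"]) for a in group}
        if len(group) > 1 and len(street_addresses) == 1:
            # Same street address everywhere: keep one point, drop the unit.
            merged = {k: v for k, v in group[0].items() if k != "unit"}
            result.append(merged)
        else:
            # Single address, or genuinely different addresses: leave as-is.
            result.extend(group)
    return result
```

Groups that can't be collapsed pass through untouched, so they can still be flagged for manual review later.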

Note that I just checked a few large apartment buildings in LA City and they only had one address point. Perhaps this is only a problem in Pasadena? I'm checking to see how common this problem is.

[Screenshot: map of coordinates shared by more than one address]
Here's a map of every unique lat/lon that is shared by more than one address. So if an address is by itself (it doesn't share the same coordinates as any other address) it doesn't show up on the map.

LA City has lots of points that have a few addresses on top of each other, but cities like Long Beach or Pasadena have cases where there are (potentially) 1000 addresses occupying the same point. Wow.
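For reference, the tally behind a map like this can be sketched in a few lines. This is only an illustration: the function name and the `lon`/`lat` keys are assumptions, not something taken from the repo.

```python
from collections import Counter


def shared_points(addresses):
    """Return {(lon, lat): count} for coordinates used by more than one address.

    `addresses` is a list of dicts with hypothetical 'lon' and 'lat' keys.
    """
    counts = Counter((a["lon"], a["lat"]) for a in addresses)
    return {point: n for point, n in counts.items() if n > 1}
```

Sorting the result by count would surface the worst offenders first.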

Does anyone want to take a crack at modifying convert.py so it detects addresses sharing the same point, and does something intelligent? Looking at the code (specifically the appendAddress function) it just keeps adding new key-value pairs to the same node. That's better than creating new nodes on top of each other, but instead it should check for the existence of those tags before adding any new ones. And I think it should just remove any addr:unit values if all the addresses share the same street address.
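To make that concrete, here is a minimal sketch of the "check before adding" idea, working on plain tag dicts. `try_merge_address` is a hypothetical helper, not the signature of the real appendAddress function, and it treats `addr:unit` as the only tag that is allowed to differ:

```python
def try_merge_address(node_tags, new_tags):
    """Merge new_tags into node_tags if both describe the same street address.

    Returns True when merged (node_tags is updated in place) and False when
    the addresses conflict, in which case node_tags is left unchanged.
    """
    # Any difference other than the unit means these are distinct addresses.
    for key, value in new_tags.items():
        if key == "addr:unit":
            continue
        if key in node_tags and node_tags[key] != value:
            return False

    # Same street address: fill in tags the node doesn't have yet.
    for key, value in new_tags.items():
        if key != "addr:unit":
            node_tags.setdefault(key, value)

    # If the units differ, a single unit value would be misleading, so drop it.
    if node_tags.get("addr:unit") != new_tags.get("addr:unit"):
        node_tags.pop("addr:unit", None)
    return True
```

A False return tells the caller that this address needs its own node instead of being folded into the existing one.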

Check out an excerpt from the generated .osm file for census block group 060374635001: http://data.openstreetmap.us.s3.amazonaws.com/imports/la/osm/buildings-addresses-060374635001.osm

<node id="-4818" visible="true" lon="-118.1268751429903" lat="34.14258660706861">
    <tag k="addr:housenumber" v="1132"/>
    <tag k="addr:postcode" v="91106"/>
    <tag k="addr:unit" v="3"/>
    <tag k="addr:city" v="Pasadena"/>
    <tag k="addr:street" v="Cordova Street"/>
    <tag k="addr:housenumber" v="1132"/>
    <tag k="addr:postcode" v="91106"/>
    <tag k="addr:unit" v="4"/>
    <tag k="addr:city" v="Pasadena"/>
    <tag k="addr:street" v="Cordova Street"/>
    <tag k="addr:housenumber" v="1132"/>
    <tag k="addr:postcode" v="91106"/>
    <tag k="addr:unit" v="6"/>
    <tag k="addr:city" v="Pasadena"/>
    <tag k="addr:street" v="Cordova Street"/>
    <tag k="addr:housenumber" v="1132"/>
    <tag k="addr:postcode" v="91106"/>
    <tag k="addr:unit" v="5"/>
    <tag k="addr:city" v="Pasadena"/>
    <tag k="addr:street" v="Cordova Street"/>
    <tag k="addr:housenumber" v="1132"/>
    <tag k="addr:postcode" v="91106"/>
    <tag k="addr:unit" v="2"/>
    <tag k="addr:city" v="Pasadena"/>
    <tag k="addr:street" v="Cordova Street"/>
    <tag k="addr:housenumber" v="1132"/>
    <tag k="addr:postcode" v="91106"/>
    <tag k="addr:unit" v="1"/>
    <tag k="addr:city" v="Pasadena"/>
    <tag k="addr:street" v="Cordova Street"/>
    <tag k="addr:housenumber" v="1132"/>
    <tag k="addr:postcode" v="91106"/>
    <tag k="addr:city" v="Pasadena"/>
    <tag k="addr:street" v="Cordova Street"/>
  </node>

Not the result we want.
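An OSM element can't carry the same key twice, so output like this could be caught automatically before anyone opens it in JOSM. A small checker using the standard library might look like this (the function name is made up, and this is a sketch rather than something in the repo):

```python
import xml.etree.ElementTree as ET
from collections import Counter


def nodes_with_duplicate_keys(osm_xml):
    """Map node id -> sorted list of tag keys that appear more than once."""
    root = ET.fromstring(osm_xml)
    bad = {}
    # iter() also yields the root itself, so a bare <node> snippet works too.
    for node in root.iter("node"):
        counts = Counter(tag.get("k") for tag in node.findall("tag"))
        dupes = sorted(k for k, n in counts.items() if n > 1)
        if dupes:
            bad[node.get("id")] = dupes
    return bad
```

Run against the excerpt above, node -4818 would be reported with all five addr:* keys.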

Wow, that map is intriguing. I should check the data for that kind of issue over on the Bmore import (@osmlab/team-baltimore, take a look). I'm no top Python programmer, though. If the code can do it, it would make sense to collapse like you said.

Okay, so now if multiple addresses share the same point and the only thing different is the unit name, then I drop the unit name and just import one address point. This fixes most (but not all) of the overlapping address points in Pasadena. Haven't tested it elsewhere yet.

Note that when there are multiple addresses at the same point and I can't combine them into one street address, I changed the code to generate .osm files with multiple distinct points on top of each other. I am aware that this will cause validation errors in JOSM, but that's exactly what I am hoping for: those overlapping addresses can only be dealt with by human oversight, so the JOSM validation error will force people to fix those problems before they import.

I'm having an interesting discussion with @darrell over in the pdxbuildingimport repo, where they're having similar problems. Now I'm thinking that instead of writing out nodes on top of each other, we should nudge the nodes a tiny bit apart, just so they're easier to work with in JOSM. Potentially we can also add FIXME tags to these nodes, to indicate that the person doing the manual import should take a look.
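One possible shape for that, purely as a sketch: spread the coincident nodes evenly around a tiny circle and tag each one for review. The function name, the node dict layout, the FIXME text and the 0.00001 degree radius (about a metre of latitude) are all assumptions for illustration.

```python
import math


def spread_nodes(nodes, radius=0.00001):
    """Nudge coincident nodes apart and tag them for manual review.

    `nodes` is a list of dicts with hypothetical keys 'lon', 'lat' and 'tags',
    all at the same location. Returns new dicts; the input is not modified.
    """
    count = len(nodes)
    if count < 2:
        return nodes  # nothing overlaps, so nothing to nudge

    spread = []
    for i, node in enumerate(nodes):
        # Place node i on a small circle centred on the original point.
        angle = 2 * math.pi * i / count
        moved = dict(node)
        moved["lon"] = node["lon"] + radius * math.cos(angle)
        moved["lat"] = node["lat"] + radius * math.sin(angle)
        moved["tags"] = dict(node["tags"], FIXME="overlapping address, please review")
        spread.append(moved)
    return spread
```

Evenly spaced offsets keep the output reproducible between runs, which random offsets would not.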

The FIXME tags can be filtered in either JOSM or Merkaartor. Should we begin to prepare a list of issues to look for during manual review? Two things come to mind: the address points (some may already have been added) and existing buildings (do we replace the ones in OSM with the import?). I can help with this, as I'm now a little more adept at working in GitHub.

That's a good idea @cityhubla. I created a new issue #10 to track that. We definitely do not want to blindly replace the buildings already in OSM, because they're most likely more up-to-date than the county data we have available to us (from 2008).

This is still a live issue. In February (eb770f4) I was able to consolidate multiple addresses into a single address, if they were all different unit numbers within an apartment building. But that still doesn't catch everything.

Currently, if we can't consolidate, we just export a bunch of address nodes on top of each other (see https://github.com/osmlab/labuildings/blob/master/convert.py#L462-L474). That's not terrible, because the JOSM validator can catch that, but perhaps we can do something smarter.

In PDX, we take the coincident points and move them a little bit, so they're not totally overlapping, but still contained in the building.

Makes them easier to grab and manually move, too.

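-- Nudge a point by a small random offset so coincident points no longer overlap.
CREATE OR REPLACE FUNCTION perturb_point(pt geometry) RETURNS geometry AS $$
  DECLARE
    srid integer;
    offset_x double precision;
    offset_y double precision;
  BEGIN
    -- random() returns a value in [0, 1), so each offset is below 0.00001
    -- in the units of the point's SRID (degrees for lat/lon data).
    offset_y := random() * 0.00001;
    offset_x := random() * 0.00001;
    -- Rebuild the point at the shifted coordinates, preserving its SRID.
    srid := st_srid(pt);
    pt := st_setsrid(st_makepoint(st_x(pt) + offset_x, st_y(pt) + offset_y), srid);
    RETURN pt;
  END;
$$ LANGUAGE plpgsql;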
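As posted, the function only applies the offset and doesn't itself test that the result is still inside the building. If convert.py wanted the same idea with that check, a pure-Python sketch could look like the following. The function names, the `ring` argument (a list of (x, y) footprint vertices) and the retry count are assumptions, and the ray-casting test ignores holes in the footprint.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test; `ring` is a list of (x, y) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Count edges that a horizontal ray from (x, y) crosses to its right.
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside


def perturb_within(x, y, ring, max_offset=0.00001, tries=20, rng=random):
    """Randomly nudge (x, y), retrying until the result lies inside `ring`.

    Falls back to the original point if no candidate lands inside.
    """
    for _ in range(tries):
        nx = x + rng.uniform(-max_offset, max_offset)
        ny = y + rng.uniform(-max_offset, max_offset)
        if point_in_polygon(nx, ny, ring):
            return nx, ny
    return x, y
```

Passing a seeded `random.Random` instance as `rng` makes the nudges repeatable, which helps when regenerating the .osm files.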

Closing. Let's re-open if we plan to import addresses in the future.