pnorman/ogr2osm

Transfer node id's from shapefile to .osm file


I have been able to convert a custom road network in shapefile format to .osm data. The original shapefile contains a column "source" and a column "target" that hold the node id's. After the conversion to .osm format, however, these node id's appear to have been lost. Instead, new node id's have been numbered sequentially in the .osm file (see below). Is there a way to transfer the original node id's from the shapefile to the .osm file?
[Screenshot]

What exactly do you want to achieve? It is possible to create osm tags from shapefile attributes; to do this, you may have to write a translation script that defines the function filterTags.
However, the id will always be generated by ogr2osm, and I do not see a good reason why that should be changed.
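A minimal sketch of such a translation script, assuming the function-style interface used elsewhere in this thread, in which ogr2osm calls filterTags(attrs) once per feature with a dict of attribute names and string values. The column names source, target and highway are those of the shapefile discussed here; trimming the float-formatted values to integers is an illustrative choice, not something ogr2osm requires.

```python
def filterTags(attrs):
    """Turn the shapefile attributes of one feature into OSM tags."""
    if not attrs:
        return {}
    tags = {}
    # Assumption: 'source' and 'target' hold numeric ids stored as strings
    # such as '988436.000000000000000'.
    for key in ("source", "target"):
        value = (attrs.get(key) or "").strip()
        if value:
            tags[key] = str(int(float(value)))
    # Copy the road class unchanged when it is present.
    if attrs.get("highway"):
        tags["highway"] = attrs["highway"]
    return tags
```

This only affects the tags on each way; the id attributes of nodes and ways are still generated by ogr2osm.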

For a project I'm using OSRM to map match a series of GPS measurements to a custom road network. My goal is to provide insight into where and, more importantly, how many times the vehicle has visited each road segment in the road network. My desired output is (1) an attribute table of the road network with an extra column that shows the number of times the vehicle drove through the street, and (2) a map visualisation of the flow of traffic through the network, with the width of each line segment representing the amount of traffic (i.e. based on the visit frequency values in the attribute table). Something like this:
[Screenshot]

I have been able to successfully map match the GPS points to the converted .osm network. Using the option annotations=nodes in the OSRM request I'm able to get all the node id pairs and geometry from the API response. This would allow me to tabulate and rank the most common segments, and display them visually, no joining to a PBF required. However, with the original node id's from the shapefile lost after the conversion, it is impossible to come to the desired output as described above. Do you have any suggestions?
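The tabulation step described above could be sketched as follows. This assumes the JSON body of an OSRM match response requested with annotations=nodes has already been parsed into a dict, with the node lists found under matchings, legs, annotation, nodes. The function name count_segments is hypothetical, and treating a segment as an undirected node pair is an assumption made for this example.

```python
from collections import Counter

def count_segments(match_response):
    """Count how often each pair of consecutive node ids is traversed."""
    counts = Counter()
    for matching in match_response.get("matchings", []):
        for leg in matching.get("legs", []):
            nodes = leg.get("annotation", {}).get("nodes", [])
            # Consecutive node ids form one traversed segment.
            for a, b in zip(nodes, nodes[1:]):
                # Sort the pair so both driving directions share a key.
                counts[(min(a, b), max(a, b))] += 1
    return counts
```

Calling count_segments(response).most_common(10) would then give the ten most visited node pairs. Adjacent legs can report the same segment where one leg ends and the next begins, so the raw counts may need deduplication before they are read as visit frequencies.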

Ok I see. Can you provide a small sample shapefile please? Just 5 or 10 roads should be enough.

Sure: sample.zip. Thanks for helping out

Many thanks. Converting the shapefile to osm using ogr2osm is straightforward as you already found out. I can see two things in the output:

  • the nodes which make up the ways have no extra tags apart from a (generated) id, lon and lat. This is standard behaviour for ogr2osm, and it will require further investigation to see if more information is available. I'll look into it when I have the time; it may take a few days.
  • the ways contain extra feature tags, being id, source, target, speed, cost and highway. If you want you can copy the value of the v attribute of <tag k="id" v="..."> to the id of the way with a small xslt transformation.
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes"/>
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
      <xsl:template match="way/@id">
        <xsl:attribute name="id">
          <xsl:for-each select="parent::way/tag">
            <xsl:choose>
              <xsl:when test="./@k='id'">
                <xsl:value-of select="number(./@v)"/>
              </xsl:when>
            </xsl:choose>
          </xsl:for-each>
        </xsl:attribute>
      </xsl:template>
    </xsl:stylesheet>
    
    Save the above script as sample.xslt and run xsltproc sample.xslt sample.osm > output.osm
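For readers who prefer Python to xslt, the same idea (copying the v attribute of <tag k="id" v="..."> to the id of the way) can be sketched with the standard library. The function name copy_id_tag_to_way_id is hypothetical, and the sketch assumes the whole file fits in memory and that the id tag always holds a numeric value.

```python
import xml.etree.ElementTree as ET

def copy_id_tag_to_way_id(osm_xml):
    """Return the OSM XML with each way id replaced by its 'id' tag value."""
    root = ET.fromstring(osm_xml)
    for way in root.iter("way"):
        tag = way.find("tag[@k='id']")
        if tag is not None and tag.get("v"):
            # '-230311.000000000000000' becomes '-230311'
            way.set("id", str(int(float(tag.get("v")))))
    return ET.tostring(root, encoding="unicode")
```

Like the stylesheet, this rewrites only the id attribute of each way; references to ways from relations, if any, are not updated.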

However, I looked somewhat further into your requirement and found issue Project-OSRM/osrm-backend#5960, where you suggest the node id's of a given way are stored in the tags source and target. But as you can see from the example below the last way in the sample file is made up of 6 nodes with (generated) id's -294, -303, -302, -301, -300 and -287.

<way visible="true" id="-354">
  <nd ref="-294"/>
  <nd ref="-303"/>
  <nd ref="-302"/>
  <nd ref="-301"/>
  <nd ref="-300"/>
  <nd ref="-287"/>
  <tag k="id" v="-230311.000000000000000"/>
  <tag k="name" v="Lijnbaansgracht"/>
  <tag k="source" v="988436.000000000000000"/>
  <tag k="target" v="988434.000000000000000"/>
  <tag k="speed" v=""/>
  <tag k="cost" v="46.130062633008400"/>
  <tag k="highway" v="residential"/>
</way>

Which nodes correspond with id's 988436 and 988434? Or are these values unrelated to the nodes?

To answer your question I opened the sample_network.osm file in JOSM.

[Screenshot]

Two observations:

  1. The way you're talking about (with id=-354) indeed consists of 6 nodes, which are irregularly spread along the way. Ogr2osm seems to generate a node at each location along the network where the way changes its direction. This explains why our way has 6 nodes instead of 2.
  2. The (end) nodes that correspond to the id's 988436 and 988434 are the nodes with the generated id's -287 and -294, meaning that the other four nodes are located between these nodes.

This leaves two challenges:

  1. How to prevent ogr2osm from generating these "extra" nodes. A possible workaround would be to simplify the network structure so that the road network only consists of straight line segments, which would hopefully result in only 2 nodes for the way in question. However, simplifying the road network would not be ideal. Is there a way to change ogr2osm's behavior to prevent this from happening?

  2. How to make sure that ogr2osm (1) reads the node id information from the shapefile (which is stored in the source and target node id columns of each line segment), and (2) stores these as the id's of the corresponding nodes in the converted .osm file, like below. However, I'm not sure if ogr2osm is able to achieve this. Do you think another workaround would be to do this with an xslt transformation afterwards?

<way visible="true" id="230311">
  <nd ref="988436"/>
  <nd ref="988434"/>
  <tag k="id" v="-230311.000000000000000"/>
  <tag k="name" v="Lijnbaansgracht"/>
  <tag k="source" v="988436.000000000000000"/>
  <tag k="target" v="988434.000000000000000"/>
  <tag k="speed" v=""/>
  <tag k="cost" v="46.130062633008400"/>
  <tag k="highway" v="residential"/>
</way>

Mind that the extra nodes are not generated by ogr2osm, they are already present in the shapefile. The purpose of shapefiles or osm files is to be able to represent a way as it is, including all corners. But I can imagine that for routing purposes only the start and the finish are relevant.
One solution would be to simplify the ways as you already suggested, but I assume the extra nodes can just be kept as long as the id's of the start and end match the source and target values. Either way: you have to make a choice. If you want to keep the shape of a way then you need all its nodes; keeping only the first and the last node will simplify the way to a straight line.
Both solutions can probably be achieved with an xslt transformation, but it seems complicated and I am not an expert in xslt. So I tried another approach with a translation file for ogr2osm, and it turned out to be really easy.

from ogr2osm import Way

def preOutputTransform(geometries, features):
    for feat in features:
        geom = feat.geometry
        if isinstance(geom, Way):
            # set the id of the way to feat.tags['id']
            geom.id = int(float(feat.tags['id']))
            # set the id of the first node to feat.tags['source']
            geom.points[0].id = int(float(feat.tags['source']))
            # set the id of the last node to feat.tags['target']
            geom.points[-1].id = int(float(feat.tags['target']))

Save the above script as translation.py and invoke ogr2osm input.shp -o output.osm -t translation.py

The script is not failsafe though: problems may arise when the same node is referenced with several different id's in the shapefile. Solving this will require some python skills, but it is not impossible.
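One way to detect that problem before assigning any id's is sketched below. It works on plain tuples rather than on ogr2osm objects, so extracting the coordinates from the geometries is left out. The function name find_conflicting_node_ids and the input layout are hypothetical, and the sketch assumes that two endpoints are the same node exactly when their coordinates are equal.

```python
def find_conflicting_node_ids(segments):
    """Report endpoints that are claimed by more than one node id.

    segments: iterable of (first_xy, last_xy, source_id, target_id).
    Returns a dict mapping each conflicting coordinate to the set of ids.
    """
    seen = {}       # coordinate -> first id seen for it
    conflicts = {}  # coordinate -> all ids claimed for it
    for first_xy, last_xy, source_id, target_id in segments:
        for xy, node_id in ((first_xy, source_id), (last_xy, target_id)):
            previous = seen.setdefault(xy, node_id)
            if previous != node_id:
                conflicts.setdefault(xy, {previous}).add(node_id)
    return conflicts
```

An empty result means every endpoint has a single consistent id. Intermediate points of one way that coincide with an endpoint of another way are not checked here.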

Shapefiles don't have nodes, they have points, linestrings and polygons. OSM files have nodes, ways and relations. These do not correspond 1:1, and there are no IDs to copy over. For example, as you've seen, a shapefile with only a linestring in it will produce an OSM file with nodes and ways. This is intentional. These are not "extra" nodes, but an important part of how the data is represented.

@pnorman Thanks for clarifying that part.

@roelderickx Thanks a lot! When opening the output.osm file, the translation file indeed appears to have successfully set the node id's according to the corresponding point id's from the shapefile. I was hoping I could have a closer look by opening the output.osm file in JOSM. However, JOSM returns the following error:

Could not read file 'output.osm'. Error is: Missing attribute 'version' on OSM primitive with ID 16604. (at line 3, column 78). 157 bytes have been read

Since only certain node id's were updated and I have been able to open the sample_network.osm file (converted without the translation.py file) in JOSM, I don't understand why this error occurs. Do you have any suggestions for how I can still open the file in JOSM so that I can inspect the nodes and ways visually on a map?