Gabboxl/gtfs-osm-import

Use relations generation for offline purposes + reldiff crashes

koenvanhollebeke opened this issue · 15 comments

Hi,
Still on the same GTFS file (Tursib). The stops command works fine, but reldiff/reldiffx (using the interactivedebug) crashes at the end of the file. The fullrels CLI option closes normally.

reldiff/reldiffx:

Warning: Relation 5481564 has a member node with an unsupported role "stop_entry_only", node ref/Id = 8257724560
Warning: Relation 5481564 has a member node with an unsupported role "platform_entry_only", node ref/Id = 8257724559
Warning: Relation 5483029 has a member (id: 5481563) of an unsupported type "relation"
Warning: Relation 5483029 has a member (id: 5481564) of an unsupported type "relation"
Skipping OSM relation 5483029 as its type tag (S.C.Tursib S.A) is not a route.
Warning: Relation 5515258 has a member (id: 5481563) of an unsupported type "relation"
Warning: Relation 5515258 has a member (id: 5481564) of an unsupported type "relation"
Skipping OSM relation 5515258 as its type tag (null) is not a route.
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 9
at it.osm.gtfs.input.GTFSParser.readRoutes(GTFSParser.java:245)
at it.osm.gtfs.commands.gui.GTFSRouteDiffGui.readData(GTFSRouteDiffGui.java:102)
at it.osm.gtfs.commands.gui.GTFSRouteDiffGui.<init>(GTFSRouteDiffGui.java:50)
at it.osm.gtfs.commands.CmdRelDiffGui.call(CmdRelDiffGui.java:32)
at it.osm.gtfs.commands.CmdRelDiffGui.call(CmdRelDiffGui.java:26)
at picocli.CommandLine.executeUserObject(CommandLine.java:2041)
at picocli.CommandLine.access$1500(CommandLine.java:148)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
at picocli.CommandLine.execute(CommandLine.java:2170)
at picocli.shell.jline3.PicocliCommands.invoke(PicocliCommands.java:283)
at org.jline.console.impl.SystemRegistryImpl.execute(SystemRegistryImpl.java:1228)
at org.jline.console.impl.SystemRegistryImpl.execute(SystemRegistryImpl.java:1273)
at it.osm.gtfs.GTFSOSMImport.interactive(GTFSOSMImport.java:176)
at it.osm.gtfs.GTFSOSMImport.main(GTFSOSMImport.java:202)

The relation 5515258 in this case is the last one in the downloaded relations.osm file (/home/koenvh/gtfs-osm-import/cache).
The fullrels CLI option closes "normally":

The relations generation will not continue as there are no OSM stops with a GTFS id on OpenStreetMap!
Please run the "stops" command and upload the new stops to OSM first!

Strange, because the code in CmdGenerateRoutesFullRelations.java constructs gtfsIdOsmStopMap (OSM stops), which shouldn't be empty as the stops.osm file is not empty. Still looking for that "This tool generates relations without checking if there are existing ones on OSM at the moment." (which should be based on GTFS only not?) :)

reldiff/reldiffx:

I'll look into this crash and I'll let you know.

The fullrels CLI option closes "normally":

This is expected. Basically, the gtfsIdOsmStopMap map checks for the GTFS stop ID tag on the OSM nodes, and in your area no OSM stops have a corresponding GTFS stop ID tag. So you first have to upload the new stops created with the stops command to OSM as suggested; then you can execute the fullrels command to generate relations.

Still looking for that "This tool generates relations without checking if there are existing ones on OSM at the moment." (which should be based on GTFS only not?) :)

I meant that the "sync" behavior to check the differences between GTFS and OSM data is only implemented for stop nodes at the moment. The tool still doesn't have a system to check differences between GTFS routes and OSM relations.

OK, thanks.

This is expected. Basically, the gtfsIdOsmStopMap map checks for the GTFS stop ID tag on the OSM nodes, and in your area no OSM stops have a corresponding GTFS stop ID tag. So you first have to upload the new stops created with the stops command to OSM as suggested; then you can execute the fullrels command to generate relations.

And that is what I would like to avoid (putting the stops or routes into OSM) - for 2 reasons:

  • copyright issues: many if not all of the GTFS files are issued by an operator which technically holds the copyright on them, and most of them aren't free to share; a lot of the operators have their own app using this data, with the GTFS as a side product. There is no way of indicating this copyright in OSM (OSM clearly states that you can't copy from a map or other digital sources). Not to mention that a lot of them have agreements with G**gle, which is also using this GTFS data.
  • OSM doesn't really support timings on routes: the only way to put such data into OSM is through the "interval" and "opening_hours" tags, AFAIK.
    and a third reason (maybe not in Europe, but for sure in a lot of parts of the world)
  • there's a lot of "rubbish" concerning public transport in OSM - PTv1, PTv2, half-finished routes, non-existing routes (somebody playing), ... you name it. Correcting all of this would take ages. Maybe not for a small set like the one in Sibiu, Romania, which I took as a test case, but take for example Singapore or Jakarta - there are up to 10000 stops and hundreds of routes to verify/correct/clean up.

My point in using gtfs-osm-import was to get OSM XML files from the GTFS which, with a little manipulation (OSM ids), could be merged into an OSM XML extract from Geofabrik cleaned of the current public transport (using osmfilter). The stops I now have using your tool. The routes are missing. If I fork this repository, could I get to the following relatively easily:

  • take the GTFS routes.txt, map match
  • get the ways/relations from OSM for the map matched route (pfaedle/C++ doesn't do this, it only gives a json file for the route, or I am bad at reverse-engineering their code), but you are using graphhopper - does it give the ways/relations of a gpx path?
  • bring it back to an osm xml file, even if the osm ids are not present.
  • fill it with dummy osm ids (both stops and routes)

Use the merged osm map in OsmAnd - you get the public transport offline.

Thanks for your thoughts on this.

Okay so, the reldiff and reldiffx crash is fixed, but as those commands are very old and weren't coded by me, they will probably be completely removed or refactored sometime in the future. I'm putting them on hold for now.

copyright issues: many if not all of the GTFS files are issued by an operator which technically holds the copyright on them, and most of them aren't free to share; a lot of the operators have their own app using this data, with the GTFS as a side product. There is no way of indicating this copyright in OSM (OSM clearly states that you can't copy from a map or other digital sources). Not to mention that a lot of them have agreements with G**gle, which is also using this GTFS data.

Oh, alright, not every operator licenses GTFS data under a very permissive license.

OSM doesn't really support timings on routes: the only way to put such data into OSM is through the "interval" and "opening_hours" tags, AFAIK.
and a third reason (maybe not in Europe, but for sure in a lot of parts of the world)
there's a lot of "rubbish" concerning public transport in OSM - PTv1, PTv2, half-finished routes, non-existing routes (somebody playing), ... you name it. Correcting all of this would take ages. Maybe not for a small set like the one in Sibiu, Romania, which I took as a test case, but take for example Singapore or Jakarta - there are up to 10000 stops and hundreds of routes to verify/correct/clean up.

Yeah, alright, I understand your point.
So your main goal is to use GTFS data offline just for yourself.

...but you are using graphhopper - does it give the ways/relations of a gpx path?

Yes, with GraphHopper I can create relations that include OSM ways (with their real IDs) and the OSM stops.

Okay so, right now, to accomplish what you want to do using my tool you can follow these steps:

  1. run the stops command to generate the diff XML files of the new stops
  2. open the generated XML files in JOSM to review them
  3. mark all the stops of the gtfs_import_not_matched_stops.osm file to be deleted in JOSM
  4. merge the XML files together into a single XML file
  5. open the stops.osm file located in cache/osmdata
  6. merge the stops.osm file with the previous merged XML file that has the new stops
  7. make sure the stops.osm file with the new stops is always located in the cache/osmdata folder
  8. use the command fullrels -s to generate the way-matched relations (the -s option disables the downloading of new OSM stops data)
  9. you'll have the gtfs_import_mergedFullRelations.osm with all the new relations
  10. then you can merge the gtfs_import_mergedFullRelations.osm and the previous stops.osm together with your OSM city data.
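
The merges in steps 4 and 6 are done by hand in JOSM. For anyone who prefers a script, here is a rough Python sketch of the same idea. It is not part of gtfs-osm-import: the function name merge_osm_files and the rule "if the same element id appears in two files, the later file wins" are assumptions made for illustration, and it does not cover the deletion marking from step 3.

```python
import xml.etree.ElementTree as ET


def merge_osm_files(paths, out_path):
    """Merge OSM XML files into one; returns the number of elements written.

    Assumption (not the tool's behaviour): when the same (type, id)
    occurs in several files, the element from the later file is kept.
    """
    merged = {}  # (element tag, id) -> element, in first-seen order
    for path in paths:
        for elem in ET.parse(path).getroot():
            if elem.tag in ("node", "way", "relation"):
                merged[(elem.tag, elem.get("id"))] = elem

    root = ET.Element("osm", version="0.6", generator="merge-sketch")
    # OSM XML files conventionally list nodes, then ways, then relations.
    for kind in ("node", "way", "relation"):
        root.extend([e for (tag, _), e in merged.items() if tag == kind])

    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    return len(merged)
```

It could be called with the downloaded stops file first and the generated files after it, for example merge_osm_files(["stops.osm", "gtfs_import_new_stops_from_gtfs.osm"], "stops_merged.osm"). The output should still be reviewed in JOSM before it is copied back into cache/osmdata.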

This is the only way to do this at the moment, I will try to simplify this process for those who want to use the data offline like you.

Let me know if everything works.

Thanks for the suggestions. I tried them and this is the result (still on the same GTFS file, and with release 1.1.0).

  1. run the stops command to generate the diff XML files of the new stops

I get 2 new files: gtfs_import_new_stops_from_gtfs.osm (contains 302 stops) and gtfs_import_not_matched_stops.osm (contains 27 stops) as also indicated by the tool.

  2. open the generated XML files in JOSM to review them
  3. mark all the stops of the gtfs_import_not_matched_stops.osm file to be deleted in JOSM
  4. merge the XML files together into a single XML file

Do I understand this correctly: 2-4 means merge in JOSM all stops of gtfs_import_new_stops_from_gtfs.osm and gtfs_import_not_matched_stops.osm together in a new file, ie stops_new.osm?

  5. open the stops.osm file located in cache/osmdata
  6. merge the stops.osm file with the previous merged XML file that has the new stops

Merge stops_new.osm from previous step with stops.osm from cache/osmdata in JOSM to new file stops_new2.osm

  7. make sure the stops.osm file with the new stops is always located in the cache/osmdata folder
  8. use the command fullrels -s to generate the way-matched relations (the -s option disables the downloading of new OSM stops data)

I get a crash running java -Xms512m -Xmx1g -jar gtfs-osm-import.jar fullrels -s

Skipping OSM Stop node ID 482809718 (ref=null, gtfs_id=null) as its operator tag value (CFR) is different than the one specified in the properties file.
Skipping OSM Stop node ID 2667585928 (ref=null, gtfs_id=null) as its operator tag value (CFR) is different than the one specified in the properties file.
Skipping OSM Stop node ID 2667589992 (ref=null, gtfs_id=null) as its operator tag value (CFR) is different than the one specified in the properties file.
Skipping OSM Stop node ID 10291103204 (ref=null, gtfs_id=null) as its operator tag value (CFR) is different than the one specified in the properties file.
Skipping OSM Stop node ID 10846590775 (ref=null, gtfs_id=null) as its operator tag value (CFR) is different than the one specified in the properties file.
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 9
at it.osm.gtfs.input.GTFSParser.readRoutes(GTFSParser.java:245)
at it.osm.gtfs.commands.CmdGenerateRoutesFullRelations.call(CmdGenerateRoutesFullRelations.java:71)
at it.osm.gtfs.commands.CmdGenerateRoutesFullRelations.call(CmdGenerateRoutesFullRelations.java:37)
at picocli.CommandLine.executeUserObject(CommandLine.java:2041)
at picocli.CommandLine.access$1500(CommandLine.java:148)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
at picocli.CommandLine.execute(CommandLine.java:2170)
at it.osm.gtfs.GTFSOSMImport.main(GTFSOSMImport.java:211)

  9. you'll have the gtfs_import_mergedFullRelations.osm with all the new relations
  10. then you can merge the gtfs_import_mergedFullRelations.osm and the previous stops.osm together with your OSM city data.

I have attached the merged files stops_new.osm and stops_new2.osm (the last one I renamed to stops.osm and put in the cache/osmdata directory) in case they are of any help.
stops_new2.osm.txt
stops new.osm.txt

Alright, I forgot to release the new version with the fix for that exception, I'm sorry 😅
Try using the new 1.2.0 version and let me know.

Do I understand this correctly: 2-4 means merge in JOSM all stops of gtfs_import_new_stops_from_gtfs.osm and gtfs_import_not_matched_stops.osm together in a new file, ie stops_new.osm?

Yes, you're right.

Merge stops_new.osm from previous step with stops.osm from cache/osmdata in JOSM to new file stops_new2.osm

Yes, but then you have to rename the stops_new2.osm file back to stops.osm and put it back into the cache/osmdata folder, so that the tool can use this file as if it had been downloaded from OSM.

OK, no problem.
These are the results: started clean with version 1.2.0 and ran java -Xms512m -Xmx1g -jar gtfs-osm-import.jar stops
The output now gives three files:

gtfs_import_matched_with_updated_metadata.osm (53),
gtfs_import_new_stops_from_gtfs.osm (249), and
gtfs_import_not_matched_stops.osm (18)

So I guess there is now some matching of the bus stops in the GTFS.
I tried to repeat the commands with only gtfs_import_new_stops_from_gtfs.osm, gtfs_import_not_matched_stops.osm and stops.osm merged together (java -Xms512m -Xmx1g -jar gtfs-osm-import.jar fullrels -s) and the tool stops without parsing any relations. Bump.

Then I merged all of the three xml files from the stops process (gtfs_import_matched_with_updated_metadata.osm, gtfs_import_new_stops_from_gtfs.osm and gtfs_import_not_matched_stops.osm) with the stops.osm, and placed it into the cache/osmdata as stops.osm: running now java -Xms512m -Xmx1g -jar gtfs-osm-import.jar fullrels -s starts the relation parsing (great!)
It goes on for a while (there are 4137 trips in this GTFS), but eventually crashes just after the third-to-last trip (number 4135). This is the output:

Creating full way-matched relation for trip SEMBRAZ tripId = 8_81600_4_1_2 ...
Matches: 120, GPS entries:429
GPX length: 9066.99 vs 9021.83
GPS import took: 0.001011017 s, match took: 0.10040118 s
Creating full way-matched relation for trip SEMBRAZ tripId = 8_81600_5_1_2 ...
Matches: 120, GPS entries:429
GPX length: 9066.99 vs 9021.83
GPS import took: 9.82317E-4 s, match took: 0.10066956 s
Creating full way-matched relation for trip SEMBRAZ tripId = 8_83160_3_1_3 ...
Matches: 120, GPS entries:429
GPX length: 9066.99 vs 9021.83
GPS import took: 9.57826E-4 s, match took: 0.10087461 s
[main] INFO org.java.plugin.registry.xml.ManifestParser - got SAX parser factory - org.apache.xerces.jaxp.SAXParserFactoryImpl@4b5debf7
[main] INFO org.java.plugin.registry.xml.PluginRegistryImpl - configured, stopOnError=false, isValidating=true
[main] INFO org.java.plugin.registry.xml.PluginRegistryImpl - plug-in and fragment descriptors registered - 1
[main] INFO org.java.plugin.standard.StandardPluginManager - plug-in started - org.openstreetmap.osmosis.core.plugin.Core@0.48.0.0-21-gd264b8c0-SNAPSHOT
[501,107s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
[501,107s][warning][os,thread] Failed to start the native thread for java.lang.Thread "Thread-m211"
Exception in thread "main" java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
at java.base/java.lang.Thread.start0(Native Method)
at java.base/java.lang.Thread.start(Thread.java:802)
at org.openstreetmap.osmosis.core.pipeline.common.ActiveTaskManager.execute(ActiveTaskManager.java:61)
at org.openstreetmap.osmosis.core.pipeline.common.Pipeline.execute(Pipeline.java:126)
at it.osm.gtfs.utils.OsmosisUtils.runOsmosisMerge(OsmosisUtils.java:52)
at it.osm.gtfs.commands.CmdGenerateRoutesFullRelations.call(CmdGenerateRoutesFullRelations.java:192)
at it.osm.gtfs.commands.CmdGenerateRoutesFullRelations.call(CmdGenerateRoutesFullRelations.java:37)
at picocli.CommandLine.executeUserObject(CommandLine.java:2041)
at picocli.CommandLine.access$1500(CommandLine.java:148)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
at picocli.CommandLine.execute(CommandLine.java:2170)
at it.osm.gtfs.GTFSOSMImport.main(GTFSOSMImport.java:211)

I guess this is an out of memory error, so I ran it again with a bit more memory allocated (max 2G instead of 1G): java -Xms512m -Xmx2g -jar gtfs-osm-import.jar fullrels -s (in a new terminal, as all my VM was full). To no avail: same error, and it still crashes after the third-to-last trip (number 4135, trip id 8_83160_3_1_3).

I looked into the cache/fullrelations directory and there are 4137 files, meaning all of them have been parsed. Any idea why it goes wrong here as increasing the allocated memory doesn't help?

Suggestion: I had a look inside other gtfs files (the ones from Jakarta, Singapore), and the current particular gtfs file has a lot of trips. The reason is that many of the trips for the used tursib.gtfs_2023-06-v1.zip have the same shape_id (maybe an implementation error of the operator making this gtfs, as the ). Maybe a suggestion for improvement would be to do the graphhopper search only over the unique shape_id in the trips.txt. If the purpose of using the trips.txt file is only to mapmatch the unique routes, it would be better getting them out of the shapes.txt unless that file would be empty or non-existent
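
To make the suggestion concrete, here is a small Python sketch of picking one representative trip per unique shape_id from trips.txt. It only illustrates the idea and is not how gtfs-osm-import works internally; the function name one_trip_per_shape is made up, and it assumes that trips sharing a shape_id can be treated as the same route, which ignores trips that follow the same shape but serve different stops.

```python
import csv
import io


def one_trip_per_shape(trips_csv_text):
    """Return {shape_id: trip_id} using the first trip seen for each shape.

    Takes the text content of a GTFS trips.txt. When reading the file
    from disk, opening it with encoding="utf-8-sig" drops a leading BOM.
    """
    representative = {}
    for row in csv.DictReader(io.StringIO(trips_csv_text)):
        shape_id = row.get("shape_id")
        if shape_id:  # trips without a shape cannot be grouped this way
            representative.setdefault(shape_id, row["trip_id"])
    return representative
```

Map matching would then run once per entry of the returned dictionary instead of once per trip.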

So I guess there is now some matching of the bus stops in the GTFS.

Yes, I updated the matching system to also support name-matching for bus stops.

Then I merged all of the three xml files from the stops process...

Correct.


Hmm, regarding the crash, it is probably caused by the final merge of all the relations the tool has created. I tried to reproduce this and in fact the tool took almost 2.5 GB of RAM at the last stage of the process.

The only way to solve this is to increase the allocated memory to at least 3G for the last step of this command, at the moment.

...have the same shape_id (maybe an implementation error of the operator making this gtfs, as the )

Yes, probably this is an error made by the operator.

Maybe a suggestion for improvement would be to do the graphhopper search only over the unique shape_id in the trips.txt. If the purpose of using the trips.txt file is only to mapmatch the unique routes, it would be better getting them out of the shapes.txt unless that file would be empty or non-existent

Thanks a lot for the suggestion, I noticed it too. Will look into this today.

Allocated up to 3.5G and it still fails (same error).

java -XshowSettings:vm
Picked up _JAVA_OPTIONS: -Xmx3648m
VM settings:
Max. Heap Size (Estimated): 3.56G
Using VM: OpenJDK 64-Bit Server VM

Going to reduce a bit the trips.txt file in size and see if this solves something.

Hmm, that's weird.

Going to reduce a bit the trips.txt file in size and see if this solves something.

Okay, let me know how it goes

Reduced version (127 unique trips) gets through. Let me have a look now at the files in JOSM.

link for the reduced trip set is here

Regarding the suggestion:

Maybe a suggestion for improvement would be to do the graphhopper search only over the unique shape_id in the trips.txt. If the purpose of using the trips.txt file is only to mapmatch the unique routes, it would be better getting them out of the shapes.txt unless that file would be empty or non-existent

Actually the tool already has a system to check whether trips are the same (by checking the shape_id, also) or not.

The problem is in your operator's GTFS data: in the stop_times.txt file, the stop_sequence should start from 1 again for every trip, just like in Turin's GTFS data. But this doesn't happen in your GTFS data: the stop_sequence value keeps going up for every stop time.

I will probably fix this by removing the stop_sequence check, but this isn't ideal.
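
Until that is changed in the tool, one possible workaround is to renumber stop_sequence per trip in stop_times.txt before running the import. The Python sketch below is an illustration only: the function name renumber_stop_sequences is made up and it is not the fix planned for gtfs-osm-import. As far as I know, the GTFS reference only requires stop_sequence to increase along a trip, not to start at 1, so the operator's file may well be valid as it is.

```python
import csv
import io
from collections import defaultdict


def renumber_stop_sequences(stop_times_csv_text):
    """Return the stop_times rows with stop_sequence restarted at 1 per trip.

    Rows keep their original file order; only the stop_sequence values
    change, following each trip's existing stop order.
    """
    rows = list(csv.DictReader(io.StringIO(stop_times_csv_text)))
    by_trip = defaultdict(list)
    for row in rows:
        by_trip[row["trip_id"]].append(row)

    for trip_rows in by_trip.values():
        # Order by the original sequence, then number 1, 2, 3, ...
        trip_rows.sort(key=lambda r: int(r["stop_sequence"]))
        for new_seq, row in enumerate(trip_rows, start=1):
            row["stop_sequence"] = str(new_seq)
    return rows
```

The returned rows can be written back with csv.DictWriter using the original column names.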

One small problem: in the gtfs_import_mergedFullRelations.osm file the relation ids are ordered from large to small (absolute values), the opposite of the gtfs_import_new_stops_from_gtfs.osm file (where the ids in absolute values are ordered from small to large).
Merging files (osmium, osmconvert) expects small to large.

Any reason why the relations.osm file is different in ordering than the stops.osm file(s)?

Merging files (osmium, osmconvert) expects small to large.

Really?

Actually, the node IDs in gtfs_import_new_stops_from_gtfs.osm are their GTFS IDs, but negative.

Any reason why the relations.osm file is different in ordering than the stops.osm file(s)?

It's because the numbering of the relations starts from the number 10000, then goes up for every next relation. The first relation generated will obviously be the last one in the file, and the last generated relation will be the first one in the file.

You could use osmium to sort the data. I think there is a command called sort.

OK, in principle it works - now for completeness the following steps are needed:

  • general note: from what I have tested osmium has difficulties merging osm files with negative ids to existing osm map files. I am using the following version

osmium --version
osmium version 1.14.0
libosmium version 2.18.0
Supported PBF compression types: none zlib lz4

  • first you need to make the id="..." and ref="..." values positive numbers (manual rename)
  • run osmium sort on the result files before merging (osmium sort file.osm -o newfile.osm)
  • merge with osmium merge (osmium merge file1.osm file2.osm -o newfile.osm)
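
The manual rename in the first step could also be scripted. This is a Python sketch of one way to do it, not something gtfs-osm-import provides: the function name make_ids_positive and its offset parameter are hypothetical. With the default offset of 0 it is a plain sign flip, which can collide with real OSM ids of the same element type in the map you merge into, so a large offset may be safer.

```python
import xml.etree.ElementTree as ET

ID_ELEMENTS = ("node", "way", "relation")  # carry an "id" attribute
REF_ELEMENTS = ("nd", "member")            # point at elements via "ref"


def make_ids_positive(in_path, out_path, offset=0):
    """Rewrite every negative id/ref as offset + abs(value).

    Returns the number of attributes changed. <tag k="ref" .../> entries
    are left alone because they hold tag values, not element references.
    """
    tree = ET.parse(in_path)
    changed = 0
    for elem in tree.getroot().iter():
        if elem.tag in ID_ELEMENTS:
            attr = "id"
        elif elem.tag in REF_ELEMENTS:
            attr = "ref"
        else:
            continue
        value = elem.get(attr)
        if value is not None and int(value) < 0:
            elem.set(attr, str(offset - int(value)))  # value is negative
            changed += 1
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return changed
```

Every file that shares ids (stops and relations) needs the same offset so the member references still match; the result still has to go through osmium sort and osmium merge as listed above.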

With that I will close this issue as completed. Merging the (positive id and sorted) gtfs_import_mergedFullRelations.osm and stops.osm with the original map.osm seems to work fine as input for osmandmapcreator.