google/transit

GeoJSON in GTFS? (Or the future of GTFS serialisation)

skinkie opened this issue · 20 comments

GTFS-Flex (#388) currently proposes introducing a serialisation format that we have never worked with in GTFS or GTFS-RT. In the past, these kinds of discussions (think zeromq, mqtt, websockets...) were explicitly avoided in favor of proven technology. Introducing a new serialisation format for GTFS-Flex, namely GeoJSON, feels "too fast" to me, especially because we have historically had this discussion about shapes.txt, and more recently about how to serialise more complex structures for which CSV (honestly) does not make sense either.

No, I am not proposing a full overhaul of all the files (or suggesting a well thought out format such as NeTEx ;). But I think we can all agree that if we bring in anything other than CSV, it had better be the right solution, applied in multiple places and not just in GTFS-Flex.

I have already asked @isabelle-dr if we could have some sort of meeting on this topic.

e-lo commented

Thanks for bringing this up – I think this is a good discussion to have on its own and separate from (but with dependency on) GTFS-Flex. Managing and using another format does indeed bring in a challenge that should be carefully considered.

Some thoughts on introducing geojson as formulated in proposed locations.geojson:

  • As I've discussed in other issues, I am not in favor of storing anything other than a foreign key and geometry in locations.geojson.
  • I seem to be in the minority here so am happy to let it go iff....we can easily (think: one line of pandas code) translate the information in locations.geojson into a dataframe (or equiv) format. No nested properties.
  • geojson does seem to be the easiest format for many transit agencies to create/maintain/review geometry in, given tools like geojson.io and compatibility with other tools.
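
As a rough illustration of the "one line of pandas" test in the second bullet, the sketch below flattens a FeatureCollection into one flat row per feature using only the Python standard library. The sample data, the `stop_name` property, and the `flatten_features` helper are assumptions made for illustration; they are not taken from the Flex proposal.

```python
import json

# Hypothetical sample; the feature id and property names are illustrative only.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "zone_a",
      "properties": {"stop_name": "Zone A"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-122.5, 45.5], [-122.4, 45.5], [-122.4, 45.6], [-122.5, 45.5]]]
      }
    }
  ]
}
"""


def flatten_features(collection):
    """Return one flat dict per feature; refuse nested property values."""
    rows = []
    for feature in collection["features"]:
        row = {"id": feature.get("id"), "geometry_type": feature["geometry"]["type"]}
        for key, value in (feature.get("properties") or {}).items():
            if isinstance(value, (dict, list)):
                raise ValueError(f"nested property {key!r} cannot become a column")
            row[key] = value
        # Keep the geometry as a JSON string so each row stays tabular.
        row["geometry"] = json.dumps(feature["geometry"])
        rows.append(row)
    return rows


rows = flatten_features(json.loads(SAMPLE))
print(rows[0]["id"], rows[0]["stop_name"], rows[0]["geometry_type"])
```

With pandas installed, `pandas.DataFrame(rows)` would turn these rows into a dataframe. `pandas.json_normalize(collection["features"])` is closer to a true one-liner, though it produces dotted column names such as `properties.stop_name`.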

I do think shapes would likely benefit from having a natively viewable format as well. I'd be curious about the previous discussion points and why this wasn't pursued in the end.

I think this discussion needs some historical context as to why the switch was made from expressing polygons using WKT strings in a .csv file to using GeoJSON, given by someone who was involved in the Flex (v2) drafting process.

@westontrillium Is there anyone that you think can provide that historical context? Maybe someone we can invite to a live meeting?

e-lo commented

The only convo I could find on WKT in the GTFS-Flex repo shows a great deal of support for it over GeoJSON. I will try to find where the decision was made in the other direction.

MobilityData/gtfs-flex#5

...somebody with edit access to the gtfs-flex google doc might be able to search the comments and version history for more.

I can provide limited context, but I think it covers most of what is needed to explain that decision. I project-managed applications that used both wkt and geojson. I am not a developer and can't speak directly to the technical limitations of either approach, but I can speak to the business reasons and to the reports I received from technical parties at the time.

  • the comments from the flex repo discussion above represent the final decision making just before the implementation of GTFS-Flex v1 (i.e., wkt-based) in OpenTripPlanner. I believe the resulting code should still be more or less intact in OTP v1.4 and 1.5. Geojson came up at the time, but it was still in the middle of its astronomical rise to geographic-representation-dominance, not established as ubiquitous, and we didn't even really consider going outside the csv. Previously, gtfs-flex in its very original conception had included a custom geographical representation, and using wkt represented a move to adopting another open standard rather than creating our own.
  • wkt worked 'fine' in the v1-based trip planner. I think issues came up about parsing it but they weren't the major issues in the development process.
  • the idea of using a geojson file specifically was proposed by MobilityData I believe in 2018. I'll admit at first I was skeptical. However, two things convinced me personally over the next year that it was the superior choice: 1) the developers didn't complain about it, and generally reported it was as easy to work with as wkt in a csv, 2) there were huge benefits for human readability and editability of the files, thanks to the ubiquity of free geojson visualizers and editors. That discussion was on a previous Google doc and in virtual meetings.

I think all of the above is important, and it's why I came to support geojson in GTFS-flex. It's also why I, from a business and community angle, don't see anything wrong with us bringing in other file types besides .csvs, if they're the right technical tool for the job.

But I think @skinkie raises a different valid and important question that we should hear from technical parties on, and @eliasmbd's call to the conversation at #127 feels particularly relevant to me, although it's way above my head. Even if this is right for business reasons, what are the technical implications for existing systems and the future of the technical options available to or required of the spec?

One question we should ask: are there other options we should seriously consider besides geojson and wkt in a csv? If there are no other serious contenders, that at least might simplify this discussion.

One question we should ask: are there other options we should seriously consider besides geojson and wkt in a csv? If there are no other serious contenders, that at least might simplify this discussion.

@tsherlockcraig I second this. I think our primary course of action is to evaluate the viable options (Pros/Cons). What we do here will define the future possibilities of GTFS and I applaud @skinkie for bringing this up when he did.

At this point we are working on a timeframe for a meeting. Before then, I would like to invite you all to share this issue to the relevant people in the community, new and old. It is important that everyone that should see this issue does before we engage in a virtual meeting. Internally, we are working on an appropriate stakeholder outreach as well.

As someone who maintains OTP's Flex implementation, I have only one mundane comment on this topic: Flex is a significant departure from GTFS static. It's difficult to implement, but that difficulty doesn't stem from it being CSV, wkt or geojson; that's the easy part. The real complexity is the huge variability and the explosion of possible results that flex adds, not the choice of geographic representation.

That said, I also welcome a discussion, so that whatever decision we end up making is a deliberate one rather than one that everybody thought someone already made.

Allow me to respond to the side comment about Netex: what I like about GTFS is that it's hugely pragmatic rather than a giant standard where every country has their own "profile" because anything else is too large to manage.

So I'll take a well done GTFS feed over a "more elegant" but terribly implemented Netex feed any day. It's much, much easier to achieve "well done" with GTFS. The majority of producers struggle with GTFS, so what hope is there of them producing good Netex ones?

The majority of producers struggle with GTFS, so what hope is there of them producing good Netex ones?

A proper free desktop implementation that manages data as a producer and uses NeTEx as its internal model, not conversion on conversion ;-)

✏️ We have a few dates we would like to propose for the virtual meeting. Please fill out this form to find a good time for everyone.

As we prepare for the meeting, we have asked you to share your expectations with us. In order to help us scope this meeting, we will post your expectations here.

Preferably, the vision 'beyond' GeoJSON. What should be done when we change significant parts of the standard? For example CSV to XML, CSV to JSON, Protocol Buffers to CBOR.

I think we should leave this meeting having answered whether there are options other than geojson and wkt that need to be researched. At the very least, we should identify a qualified group to make that determination and begin research. It is important that we have technical stakeholders in this meeting. Needs of business stakeholders (like myself) should be de-prioritized in my opinion.

hopefully a focused discussion, not a general formats flame war as usual on the internet 😬

As you can see, expectations are diverse. From my end, I would propose to maintain the focus on GeoJSON and the alternative solutions out there. I kept the expectations anonymous but invite you all to participate in helping us keep the scope focused and precise.

Also, it can be expected that we will host the meeting on Tuesday 8 August at 11am EDT. More details will follow.

📣 We have an event registration page. Please sign up and share it with all relevant parties. As expected, the meeting will be held on Tuesday August 8th @ 11am EDT. (Sharpen your Miro pencils 😉 )

Since @eliasmbd has prompted us to give our thoughts on this before the session tomorrow, here are mine:

The quick version is that I'm not immediately opposed to adding GeoJSON to the GTFS spec.

I ultimately come back to the GTFS Guiding Principles, in particular making GTFS easy to produce and edit. My intuition here is that there will be many data producers out there who are managing their geographic assets, including service region polygons for Flex, in standard GIS applications. And for many of the most popular GIS applications, GeoJSON is a well-supported export format that could just work off-the-shelf.

I think a similar case can be made for CSV + WKT, but I think the tooling isn't quite as seamless.

Why didn't GTFS consider GeoJSON for something like shapes.txt originally? If I understand my GeoJSON history correctly, GeoJSON has only really been a thing since 2007 and only an RFC since 2016 (GTFS being born in 2006). Might history have been different if GTFS had come slightly later? I do not know.

What other data formats might we consider? Conceivably, you might look at anything on the GDAL-supported Vector format list but I think there are only a handful of formats that are simple enough, have reasonable governance, and have been around long enough for consideration. I don't think it's an accident that GeoJSON is at the top of that list.

I recognize that producing GTFS (and GTFS-Flex especially) has gotten complex enough that it may not be reasonable to support the simple use-case of a transit operator typing up data in a spreadsheet and we may have expectations that some sort of GTFS export application will be in use, in which case some of these arguments around facility of creation carry less weight. That said, I do think there is something to be said for being able to quickly visualize data in a feed and GeoJSON does have some advantages there.

Anyways, looking to hear from other folks tomorrow. Thanks!

πŸ™ Thank you for joining us for the strategic meeting held yesterday. It was an eye opening and refreshing discussion for many of us.

πŸ“ Here are some takeaways from the meeting:

  • Most participants seemed interested in adding a new format to GTFS

    • MobilityData will reach out to stakeholders representing agencies/users with limited technical capacity and resources to solidify the interest and make sure that we are aware of possible implications for them.
  • We noticed a consensus was building around the specific geometries that the community wanted to target - zones and shapes.

    • Stops were a more controversial subject
  • Many participants showed support for the GeoJSON format but some voiced the option of using GPKG

    • MobilityData will research GPKG and then compare it to GeoJSON
  • MobilityData will provide the community with a few options considering the implications of adopting a new format and recommendations.

πŸ—“οΈ Once the points above have been resolved, MobilityData will announce a follow up meeting - expect the meeting to be held sometime in September.

📣 MobilityData would like to invite you to review and comment on our findings on the inclusion of GeoJSON within GTFS.

We have included the stakeholder outreach findings, the comparative analysis between GeoJSON and GPKG, as well as 2 options to consider and a suggestion.

👀 TL;DR

  • MobilityData has reached out to stakeholders representing agencies/users with limited technical capacity and resources to make sure that we are aware of possible implications for them.
  • Based on the feedback, we propose introducing GeoJSON as the format for expressing vector geospatial data in GTFS, especially for polygon and linestring geometry.
  • We propose addressing locations.geojson (polygons) first and addressing shapes.geojson (linestring/route shapes) afterward.

Here is the document link

❗ MobilityData will consider the volume and quality of comments, revise the documentation if necessary, and then call a meeting in the subsequent week (27 September 2023 @ 11AM EDT if consensus is maintained).

Folks, I have left my comment in the document, but you need to be aware of existing work and the planned roadmap of the OGC (Open Geospatial Consortium), especially the Special Working Group on Routing:
opengeospatial/ogcapi-routes#58

drewda commented

The @interline-io team agrees with the "tldr" bullet points posted by @eliasmbd and with the overall substance of the document.

In case it's useful to others, here are the detailed comments we shared earlier in support:

  • We strongly agree that GeoJSON makes the most sense as the format for expressing any new vector geospatial data to be added to the GTFS spec. GeoJSON works with a wide range of tooling, as you know. It's expressed in text (unlike other recent options like GeoPackage).

  • GeoJSON does have some performance limitations that are a problem in other areas. (On Interline's website you'll find some blog posts about how we sometimes produce and consume "GeoJSONL" as an alternative, for example with a lot of OpenStreetMap data.) But individual GTFS feeds are rarely going to hit the limitations of GeoJSON. So we think GeoJSON is fine for use within individual GTFS feeds.

  • Interline has hosted trip planners running against the GTFS-Flex v1 and GTFS-Flex v2 specifications. We defer to our partners at Trillium to create the flex feeds, but have often had to debug issues in flex feeds when they aren't ingested properly or produce the expected trip plans. Flex can be hard to debug. The switch from WKT to GeoJSON for expressing geometries did make it easier to debug any issues that involve geometries. It's somewhat easier to open up a GeoJSON file than it is to read in WKT from a column in a CSV file. This is a reason why we'd support sticking with GeoJSON rather than reverting back to WKT for flex.

  • We do like the idea of switching from shapes.txt to a GeoJSON representation, but think this complicates the current question. From our perspective, it makes the most sense to move ahead with adopting GTFS-Flex v2 with a GeoJSON file. Doing more things with GeoJSON files in GTFS feeds would be nice and we would support those changes, but feels like it complicates the adoption of flex right now.

  • In our experience the audiences for modeling fixed-route transit in GTFS and for modeling demand-responsive transit in GTFS-Flex are almost completely different. The good news is that additions for flex probably won't complicate matters for traditional fixed-route transit agencies -- they can ignore the additions. The bad news is that the audience for flex/DRT has, on average, much less technical capability than fixed-route transit agencies. This isn't exactly an argument for GeoJSON, but we're just sharing this observation.

  • When it's time to try adopting GeoJSON as an alternative for shapes.txt, we think this will be a net positive for all transit agencies. It's hard to think of a situation where editing points as rows in a spreadsheet would be easier or more accessible than editing polyline features in a GeoJSON file. It would be nice for this to be adopted alongside a formal approach to versioning of the spec -- still, we think that it could work to conditionally require shapes.txt or a "shapes.geojson" file. Just as with separating the adoption of flex from the adoption of shapes in GeoJSON, we overall assume it would be simplest for the GTFS community to make incremental steps, rather than have a number of steps all bundled together with blocking dependencies on each other.

  • Finally, the one challenge about using GeoJSON will be needing to carefully limit the type of features that can be used in each GeoJSON file, and also the properties that can be attached to each feature. There can be a number of different ways to express the same geometries in GeoJSON (for example as Features in a FeatureCollection or in a single MultiPoint). It'll probably be best to use a really simple and limited schema. It might also be useful to eventually end up with separate GeoJSON files -- like one for flex areas, and a completely separate GeoJSON file as the shapes.txt alternative for fixed-route alignments. The simpler approaches will make it easier to use basic editing tools like geojson.io. We've already seen related conversations about the need to keep the schema tight and simple on GitHub, so we trust that this is already a known issue and will get figured out.
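
To make the last bullet concrete, here is a minimal sketch of what a "simple and limited schema" check might look like. The allowed geometry types and property names below are assumptions chosen for illustration, not rules from the spec or from the Flex proposal, and `check_collection` is a hypothetical helper.

```python
# Assumed restrictions, for illustration only.
ALLOWED_GEOMETRY_TYPES = {"Polygon", "MultiPolygon"}
ALLOWED_PROPERTIES = {"stop_name", "stop_desc"}


def check_collection(collection):
    """Return a list of human-readable problems; an empty list means OK."""
    if collection.get("type") != "FeatureCollection":
        return ["top-level object must be a FeatureCollection"]
    problems = []
    seen_ids = set()
    for index, feature in enumerate(collection.get("features", [])):
        label = f"feature {index}"
        feature_id = feature.get("id")
        if feature_id is None:
            problems.append(f"{label}: missing id")
        elif feature_id in seen_ids:
            problems.append(f"{label}: duplicate id {feature_id!r}")
        else:
            seen_ids.add(feature_id)
        geometry_type = (feature.get("geometry") or {}).get("type")
        if geometry_type not in ALLOWED_GEOMETRY_TYPES:
            problems.append(f"{label}: geometry type {geometry_type!r} not allowed")
        extra = set(feature.get("properties") or {}) - ALLOWED_PROPERTIES
        if extra:
            problems.append(f"{label}: unexpected properties {sorted(extra)}")
    return problems


good = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "id": "zone_a",
        "properties": {"stop_name": "Zone A"},
        "geometry": {"type": "Polygon", "coordinates": []},
    }],
}
bad = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"color": "red"},
        "geometry": {"type": "Point", "coordinates": [0, 0]},
    }],
}
print(check_collection(good))
for problem in check_collection(bad):
    print(problem)
```

A check of this kind rejects the "many ways to say the same thing" cases up front, so consumers would only ever need to handle one shape of file.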

📯 We have a date for our next GeoJSON in GTFS meeting! - September 27th 2023 @ 11AM EDT 📯

Sign up for the event here

πŸ™ Pο»Ώlease review and leave a comment in this document before attending this meeting.
Tο»Ώhis meeting will cover the points highlighted in the document, confirm consensus around the option and propose a path forward for GeoJSON in GTFS.

📔 Please let us know if you would like to propose and present an alternative during the meeting, we can reserve a few minutes for you.

Disclaimer: This is not a GTFS-Flex working group meeting

While developing GTFS in 2005/2006, we discussed the usage of an ESRI shapefile for pattern geometries. At the time, it was perceived to be the most widely used GIS format. Ultimately, the decision was made to stick with CSV and sequence numbers to allow for easier adoption by others without a GIS.

I would prefer that shapes.txt and stops.txt are left untouched and not deprecated in any way. As a producer, I would hate to produce 2 versions of shapes and stops in our GTFS to make sure I don't break any of our consumer applications. If there is a need for GeoJSON versions of shapes.txt or stops.txt, open source tools could be developed to convert from specific GIS formats to GTFS and vice versa. This would also allow GTFS producers to maintain in whatever format works for them without a backward-incompatible change to the spec.
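
A conversion tool of the kind suggested above could be quite small. The sketch below groups shapes.txt rows by `shape_id`, orders them by `shape_pt_sequence`, and emits one LineString feature per shape. The column names are the standard shapes.txt fields; the sample rows and the output layout (one feature per shape, `id` set to the `shape_id`) are assumptions, since no "shapes.geojson" schema has been defined.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows, deliberately out of sequence order.
SHAPES_TXT = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
A,45.50,-122.50,2
A,45.49,-122.51,1
A,45.51,-122.49,10
B,45.60,-122.60,1
B,45.61,-122.61,2
"""


def shapes_to_feature_collection(shapes_file):
    """Build a GeoJSON FeatureCollection of LineStrings from shapes.txt rows."""
    points = defaultdict(list)
    for row in csv.DictReader(shapes_file):
        points[row["shape_id"]].append((
            int(row["shape_pt_sequence"]),  # compare as a number, not as text
            float(row["shape_pt_lon"]),
            float(row["shape_pt_lat"]),
        ))
    features = []
    for shape_id in sorted(points):
        ordered = sorted(points[shape_id])
        features.append({
            "type": "Feature",
            "id": shape_id,
            "properties": {},
            # GeoJSON positions are [longitude, latitude].
            "geometry": {
                "type": "LineString",
                "coordinates": [[lon, lat] for _, lon, lat in ordered],
            },
        })
    return {"type": "FeatureCollection", "features": features}


collection = shapes_to_feature_collection(io.StringIO(SHAPES_TXT))
print(len(collection["features"]), "shapes converted")
```

The reverse direction would be a loop over each LineString that writes one row per position with an increasing sequence number, which is why producers could keep shapes.txt as the published format and use GeoJSON only as an editing view.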

I understand the need for something more sophisticated when dealing with multi-part polygons in GTFS-Flex but for points and lines, the argument seems weak and creates more work for everyone involved. I would vote for the tried-and-true OGC WKT format in CSVs for expressing polygons but I'd be a +0 for GeoJSON.
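
For comparison, this is roughly what reading a WKT polygon from a CSV column and turning it into a GeoJSON geometry involves. It is a simplified sketch under stated assumptions: the `area_id` and `wkt` column names and the sample row are hypothetical, and the parser handles only 2D `POLYGON` text, not `MULTIPOLYGON`, `EMPTY`, or Z/M coordinates. A real implementation would more likely rely on an established geometry library.

```python
import csv
import io
import json
import re

# Hypothetical CSV with a WKT column; the column names are illustrative.
CSV_TEXT = '''area_id,wkt
zone_a,"POLYGON ((-122.5 45.5, -122.4 45.5, -122.4 45.6, -122.5 45.5))"
'''

# Matches the innermost parenthesised groups, i.e. one match per ring.
RING_PATTERN = re.compile(r"\(([^()]+)\)")


def wkt_polygon_to_geojson(wkt):
    """Convert a 2D WKT POLYGON string into a GeoJSON geometry dict."""
    text = wkt.strip()
    if not text.upper().startswith("POLYGON"):
        raise ValueError("only POLYGON is handled in this sketch")
    rings = []
    for ring_text in RING_PATTERN.findall(text):
        ring = []
        for pair in ring_text.split(","):
            x, y = pair.split()  # WKT separates x and y with whitespace
            ring.append([float(x), float(y)])
        rings.append(ring)
    if not rings:
        raise ValueError("no rings found")
    return {"type": "Polygon", "coordinates": rings}


features = []
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    features.append({
        "type": "Feature",
        "id": row["area_id"],
        "properties": {},
        "geometry": wkt_polygon_to_geojson(row["wkt"]),
    })
print(json.dumps(features[0]["geometry"]))
```

Both forms carry the same coordinates; the difference is that the WKT needs CSV quoting plus a second parsing step, while the GeoJSON form can be opened directly in common viewers.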

As @drewda said, if GeoJSON is adopted by GTFS-Flex, there needs to be specific documentation about what features are acceptable, what projections are allowed e.g. EPSG 4326, etc., to simplify consumer software.
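
On the projection point, RFC 7946 already defines GeoJSON coordinates as WGS 84 longitude and latitude in decimal degrees and drops the older `crs` member, so a consumer-side sanity check can stay small. The sketch below is one possible check, not an agreed rule; `check_wgs84` and `iter_positions` are hypothetical helpers.

```python
def iter_positions(coordinates):
    """Yield every [lon, lat, ...] position from nested coordinate arrays."""
    if coordinates and isinstance(coordinates[0], (int, float)):
        yield coordinates
    else:
        for item in coordinates:
            yield from iter_positions(item)


def check_wgs84(collection):
    """Flag a legacy 'crs' member and positions outside lon/lat degree ranges."""
    problems = []
    if "crs" in collection:
        problems.append("'crs' member present; RFC 7946 expects WGS 84 lon/lat")
    for index, feature in enumerate(collection.get("features", [])):
        geometry = feature.get("geometry") or {}
        for lon, lat, *_ in iter_positions(geometry.get("coordinates", [])):
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                problems.append(
                    f"feature {index}: position ({lon}, {lat}) is outside lon/lat range"
                )
                break  # one report per feature is enough
    return problems


degrees = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "geometry": {"type": "LineString",
                                     "coordinates": [[-122.5, 45.5], [-122.4, 45.6]]}},
]}
projected = {"type": "FeatureCollection", "crs": {"type": "name"}, "features": [
    {"type": "Feature", "geometry": {"type": "Point",
                                     "coordinates": [525000.0, 5040000.0]}},
]}
print(check_wgs84(degrees))
for problem in check_wgs84(projected):
    print(problem)
```

A range check like this catches files exported in a projected coordinate system, but it cannot detect latitude and longitude that were swapped while both still fall within valid ranges.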