MobilityData/gbfs

manifest.json: include region

futuretap opened this issue · 14 comments

What is the proposal?

The introduction of manifest.json greatly facilitates the distribution of feed changes. It would be even more useful if a coarse geographic description of the feed area were specified for each feed. This would allow consumers to select relevant feeds according to the location of the user.

The geographic area could be specified by an array of lat/lon rectangles (areas in the example). In most cases, one rectangle per feed should suffice. Multiple rectangles could be useful for disjoint operating areas in a single feed.

Optionally, specifying an ISO 3166-1 alpha-2 country_code could help select relevant feeds by country.

Example:

"data":{
  "datasets":[
    {
      "system_id":"flamingo_wellington",
      "versions":[
        {
          "version":"2.3",
          "url":"https://data.rideflamingo.com/gbfs/wellington/gbfs.json"
        },
        {
          "version":"3.0-RC",
          "url":"https://data.rideflamingo.com/gbfs/3/wellington/gbfs.json"
        }
      ],
      "areas": [
        {
          "north": -41.20,
          "east": 174.86,
          "south": -41.36,
          "west": 174.71
        }
      ],
      "country_code": "NZ"
    }
    …
  ]
}
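To illustrate how a consumer might use these fields, here is a minimal Python sketch. It assumes the field names proposed in this issue (areas, country_code), which are not part of the published spec. The second dataset (example_worldwide) and the function names are hypothetical, and treating west > east as a rectangle that crosses the antimeridian is an assumption the proposal does not address.

```python
import json

# Hypothetical manifest fragment using the field names proposed in this issue.
# "example_worldwide" is a made-up system that operates everywhere.
MANIFEST = json.loads("""
{
  "data": {
    "datasets": [
      {
        "system_id": "flamingo_wellington",
        "areas": [
          {"north": -41.20, "east": 174.86, "south": -41.36, "west": 174.71}
        ],
        "country_code": "NZ"
      },
      {
        "system_id": "example_worldwide",
        "areas": [
          {"north": 90.0, "east": 180.0, "south": -90.0, "west": -180.0}
        ]
      }
    ]
  }
}
""")


def in_rectangle(area, lat, lon):
    """True if (lat, lon) lies inside one north/east/south/west rectangle."""
    if not area["south"] <= lat <= area["north"]:
        return False
    if area["west"] <= area["east"]:
        return area["west"] <= lon <= area["east"]
    # Assumption: west > east means the rectangle crosses the antimeridian.
    return lon >= area["west"] or lon <= area["east"]


def relevant_systems(manifest, lat, lon, country_code=None):
    """Return the system_id of every dataset whose areas contain the point."""
    matches = []
    for dataset in manifest["data"]["datasets"]:
        code = dataset.get("country_code")
        # A dataset without country_code is treated as multi-country.
        if country_code and code and code.lower() != country_code.lower():
            continue
        if any(in_rectangle(a, lat, lon) for a in dataset.get("areas", [])):
            matches.append(dataset["system_id"])
    return matches
```

A consumer would then fetch gbfs.json only for the returned systems, for example relevant_systems(MANIFEST, -41.29, 174.78) for a user in central Wellington.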

Alternative solutions

In theory, this information could also be retrieved by fetching geofencing_zones.json (if it exists). However, the rule mechanism makes it relatively hard to determine whether a feed is "relevant" for a given coordinate, and it requires fetching two feeds (gbfs.json and geofencing_zones.json) per tested region. The idea is to specify only as much information at the manifest level as is needed to filter relevant feeds. This would help reduce unnecessary fetching of sub-feeds.

Is your potential solution a breaking change?

  • Yes
  • No, since manifest.json is new in 3.0.
  • Unsure

Which files are affected by this change?

manifest.json

I have a few concerns with this idea:

  1. There was extensive discussion about the geofence rule objects being introduced. In short - for many users of GBFS there is no "area" so to speak. You could rent a car in France and drive to China if the fleet's rules permit it. To the extent that an area should be defined, it is likely for the purposes of the ride_start_allowed rule. This brings me to my second point:
  2. There are different classes of feed consumer. The area for a MaaS app would be any area where the vehicle can be used, while some mapping applications will only care about the start position of the vehicles.
  3. Combining 1 & 2, the relevant area to display depends on your use case, so many consumers are pulling two feeds anyway. Now we're duplicating one type of information in a separate file.
  4. It's perfectly valid for a single system to have a large number of non-contiguous geofences that allow renting / returning and the whole world as the travel area. How does this get represented in the aforementioned array?

My hunch is that the vast majority of apps pulling GBFS data are concerned with showing the nearest vehicles to you as quickly as possible. The manifest, geofences, and other feeds are more or less static (and the spec explicitly recognizes this with the idea of near-realtime feeds and other feeds). So I don't think it's a big ask to pull two feeds initially.

Thanks @benwedge for your feedback. You’re right to raise the question of what "area" actually means. I suggest using the maximum area where the system operates. So for your France/China example it would be all of Europe/Asia/Africa or even the whole world (and a country_code would not be specified since the region is multi-country), which could easily be specified in a single area rectangle. This is totally fine imo. However, in my experience, the overwhelming majority of systems operate on a city level.

As for 4., non-contiguous areas could be specified in multiple area rectangles. Of course we could also specify areas in a GeoJSON way like in geofencing_zones.json. My idea was to have a very simple syntax in order to facilitate adoption. In my experience, adoption rate of geofencing_zones.json is low right now.

Depending on the number of feeds a provider offers, the savings in fetches could be considerable. There are systems with hundreds of feeds, and we need two additional fetches for each feed. It’s very wasteful imo to run 200+ fetch requests just to select one feed in the end, even more so when that happens on a mobile device, not to mention the rate limits we may run into.

I compare the current concept of manifest.json with a browser bookmark list where only the URL is listed. This is better than nothing but it can be way more useful if a minimal set of meta information is added: Page names (or folders) in the bookmarks case, region information in the GBFS case.

This discussion has been automatically marked as stale because it has not had recent activity. It will be closed in 60 days if no further activity occurs. Thank you for your contributions.

After a follow-up from Ortwin (@futuretap), I'm tagging recent contributors to get your opinion: @tdelmas, @mplsmitch, @testower, @cmonagle, @simonsolnes, @PierrickP, @ezmckinn, @AntoineAugusti, @ArashMansouri, @jkurzanski.
Thank you!

This seems like a reasonable extension to me, in principle. My main question is: would it cause a problem to have multiple areas overlapping? For example, Superpedestrian has some fleets with adjacent operational zones. If you were to draw a rectangle around the borders of these, they would overlap. If the default behavior is to have all feeds within that zone be read into the platform, this probably would not cause a problem. But there's a risk that a platform would only, e.g., read in the first feed that matches a particular area. This could be easily resolved on the part of whoever is ingesting the feed, I think, so I don't think that's a blocker to implementation. Seems straightforward enough to implement from the producer side. So, fine with me.
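The overlap concern can be reproduced with a small Python sketch. The two zones below are made-up triangles that do not intersect, yet the rectangles drawn around them do; the coordinates and function names are hypothetical, and the overlap test ignores rectangles that cross the antimeridian.

```python
def bounding_rectangle(points):
    """Smallest north/east/south/west rectangle around (lat, lon) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {"north": max(lats), "east": max(lons),
            "south": min(lats), "west": min(lons)}


def rectangles_overlap(a, b):
    """True if two rectangles share any area (antimeridian not handled)."""
    return (a["south"] < b["north"] and b["south"] < a["north"]
            and a["west"] < b["east"] and b["west"] < a["east"])


# Made-up adjacent zones: two triangles facing each other across a diagonal
# gap, so the zones are disjoint although their rectangles are not.
ZONE_A = [(40.00, -75.10), (40.00, -75.00), (40.10, -75.10)]
ZONE_B = [(40.10, -75.00), (40.02, -75.00), (40.10, -75.08)]

RECT_A = bounding_rectangle(ZONE_A)
RECT_B = bounding_rectangle(ZONE_B)
BOTH_MATCH = rectangles_overlap(RECT_A, RECT_B)
```

A user standing inside both rectangles matches both feeds, so a consumer that stops at the first match would miss one of them.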

Overlapping areas should be allowed and all matching areas should be read by consumers.

I'm in favor of this. In the initial discussion that resulted in the manifest endpoint I included geographic information, but what I proposed was overly complicated, so in the end I left it out. I also like the idea of using a centroid, which would look like:

  "location": [
    {
      "lat": -41.283668,
      "lon": 174.771423
    }
  ]

That would avoid the issue of overlapping areas.

The rectangle has the advantage that for any user location, the available systems can be filtered by using trivial lat/lon comparisons. A centroid without some radius or lat/lon delta isn't clear in this regard. Also, some systems cover whole countries where a centroid would be misleading.
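For comparison, a centroid only becomes a usable filter once a radius is attached to it. The Python sketch below uses the haversine great-circle distance; radius_km is a hypothetical parameter that appears in neither proposal, and the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def near_centroid(centroid, lat, lon, radius_km):
    """True if the point is within radius_km of the centroid.

    radius_km is an assumption: the centroid proposal itself carries no radius.
    """
    distance = haversine_km(centroid["lat"], centroid["lon"], lat, lon)
    return distance <= radius_km


WELLINGTON = {"lat": -41.283668, "lon": 174.771423}
```

Each check costs several trigonometric calls, whereas the rectangle test is four comparisons, and the result depends entirely on the radius the consumer picks.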

I don't see how anything other than a polygon can be anything other than misleading. How do you get a rectangle to cover a country without including other countries?

Polygons are indeed more exact, but rectangles are quicker to filter. It's true that a rectangle could be too large, so we could end up with too many systems. However, I proposed to also add the country code (if the system doesn't cover multiple countries), so this could be another filter criterion. In practice, most systems cover a city or a country. For such use cases, we should be perfectly fine.

Here at Fluctuo we would be strongly in favor of this proposal if the area were defined as a MultiPolygon.

Both a rectangle array and a MultiPolygon would work and fit our needs. The rectangles are more compact and faster to compare; the MultiPolygons can be more exact and use the established GeoJSON syntax already used in the geofencing_zones feed. I'm open to both but would favor the rectangles.
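To give a sense of the extra work a MultiPolygon implies for consumers, here is a minimal Python sketch of a ray-casting point-in-polygon test. It assumes a GeoJSON MultiPolygon geometry with [lon, lat] positions and closed rings. The second polygon is made up, the function names are illustrative, and points exactly on an edge are not treated specially.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test against one closed ring of [lon, lat] positions."""
    inside = False
    for p, q in zip(ring, ring[1:]):
        x1, y1, x2, y2 = p[0], p[1], q[0], q[1]
        # Only edges that straddle the point's latitude can be crossed
        # by a ray cast eastwards from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_multipolygon(lon, lat, geometry):
    """True if the point is inside any polygon and outside that polygon's holes."""
    for polygon in geometry["coordinates"]:
        exterior, holes = polygon[0], polygon[1:]
        if point_in_ring(lon, lat, exterior) and not any(
            point_in_ring(lon, lat, hole) for hole in holes
        ):
            return True
    return False


# First polygon: the Wellington rectangle from the original example.
# Second polygon: a made-up disjoint area, for illustration only.
AREA = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[174.71, -41.36], [174.86, -41.36], [174.86, -41.20],
          [174.71, -41.20], [174.71, -41.36]]],
        [[[175.00, -41.00], [175.10, -41.00], [175.10, -40.90],
          [175.00, -40.90], [175.00, -41.00]]],
    ],
}
```

The cost grows with the number of vertices, so a consumer filtering many feeds may still want to compute a bounding rectangle per polygon first and run the exact test only on candidates.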

Thanks for your feedback. I created the PR. I went with MultiPolygons since they were clearly preferred in the comments.

Closing since the PR has been opened. All further discussion should happen on #572.