noi-techpark/odh-mentor-otp

As Open Data Hub I would like to have a new OTP demo instance that imports public transport data through NeTEx / SIRI instead of GTFS / GTFS-RT

Opened this issue · 44 comments

Tasks agreed on 30.8.

Reference sources for NeTEx and SIRI:

NeTEx
https://web01.sta.bz.it/netex/api/v4/downloadVersion?level=4&agencyCode=IT-ITH1
username = rapuser
password = rappass

SIRI

SIRI ET (XML): https://efa.sta.bz.it/siri-lite/estimated-timetable/xml
SIRI ET (JSON): https://efa.sta.bz.it/siri-lite/estimated-timetable
SIRI SX (XML): https://efa.sta.bz.it/siri-lite/situation-exchange/xml
SIRI SX (JSON): https://efa.sta.bz.it/siri-lite/situation-exchange

Relevant also for @leonardehrenfried

As soon as you have the feeds, can you post the URLs here?

Is there an update? Are the NeTEx feeds available somewhere?

@leonardehrenfried for testing purposes you can start working with this NeTEx file.

GE16614_01_DIVA_apb_ALL_1_20240717011758.xml.zip

@leonardehrenfried as discussed today, please consider this NeTEx export and not the one provided 2 days ago. This is the one provided to the NAP of Italy, compliant with EPIP.

NX-PI_01_it_apb_LINE_apb__20240621.xml.zip

I took a look at this today and I am happy to report that importing the feed is going to be possible (with some caveats).

This is what it looks like:

Screenshot from 2024-07-22 17-54-39

Features currently not implemented by OTP

  • Using any as the version of a NeTEx entity
  • The style of the service links (shapes) that are used by the feed

I spoke to the upstream developers and it's going to be possible to implement those two.

Validation errors

The last file you posted successfully validates against both the NeTEx and EPIP XSDs. Very good!

However, OTP has picked up on quite a few validation errors, some of which are cosmetic but also a few serious ones.

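Smaller errors

  • The Line entities do not have an Authority, which is required in the Nordic profile, so a dummy one is created. These lines have an Operator but that is a separate entity in OTP, which is only available in the Transmodel API. We would have to discuss if the Operator is really what GTFS calls the Agency in EPIP.
  • No timezone is configured in FrameDefaults, which OTP expects to be set like this: https://github.com/entur/profile-examples/blob/272ed7e9f1fe8b60ed1bddefd04c782d35c0917b/netex/network/Line61A.xml#L33-L43
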
Suspicious data

  • 2061 service journeys repeat the same stop right after each other, often with the exact same time. This is quite suspicious and indicates a data error.
  • There are 956 ServiceJourneyPatterns that are not referenced by a ServiceJourney. This means that they are not imported into OTP. It has no consequences but probably also indicates an error somewhere in the chain.

Serious errors

956 ServiceJourneys have a different number of stops from the ServiceJourneyPattern. This means that these journeys are not imported into OTP at all. It's the most serious error in the feed. An example of this would be the following error message:

Mismatch in stop points between ServiceJourney and JourneyPattern. ServiceJourney will be skipped. ServiceJourney=it:apb:ServiceJourney:031001T-TI-63-5-43500:sonn:, JourneyPattern= it:apb:ServiceJourneyPattern:03100T.24a5100166:

I would speak to Mentz to ask them how this discrepancy can be explained.
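A check like the one behind this error message can be sketched outside OTP as follows. This is a minimal sketch, not OTP's actual implementation: it assumes the journey references its pattern through ServiceJourneyPatternRef (or JourneyPatternRef, as in the Nordic profile) and that each stop visit is one TimetabledPassingTime. The sample document and its ids are made up for illustration.

```python
import xml.etree.ElementTree as ET

NETEX = "http://www.netex.org.uk/netex"
NS = {"n": NETEX}

# Hypothetical miniature feed: one pattern with 3 stops, two journeys.
SAMPLE = """<PublicationDelivery xmlns="http://www.netex.org.uk/netex">
  <ServiceJourneyPattern id="JP:1" version="1">
    <pointsInSequence>
      <StopPointInJourneyPattern id="SP:1" version="1" order="1"/>
      <StopPointInJourneyPattern id="SP:2" version="1" order="2"/>
      <StopPointInJourneyPattern id="SP:3" version="1" order="3"/>
    </pointsInSequence>
  </ServiceJourneyPattern>
  <ServiceJourney id="SJ:ok" version="1">
    <ServiceJourneyPatternRef ref="JP:1" version="1"/>
    <passingTimes>
      <TimetabledPassingTime id="T:1" version="1"/>
      <TimetabledPassingTime id="T:2" version="1"/>
      <TimetabledPassingTime id="T:3" version="1"/>
    </passingTimes>
  </ServiceJourney>
  <ServiceJourney id="SJ:bad" version="1">
    <ServiceJourneyPatternRef ref="JP:1" version="1"/>
    <passingTimes>
      <TimetabledPassingTime id="T:4" version="1"/>
      <TimetabledPassingTime id="T:5" version="1"/>
    </passingTimes>
  </ServiceJourney>
</PublicationDelivery>"""


def find_stop_count_mismatches(root):
    """Return (journey id, pattern id, journey stops, pattern stops) per mismatch."""
    # Number of stops declared by each pattern, keyed by pattern id.
    pattern_sizes = {
        p.get("id"): len(p.findall("n:pointsInSequence/n:StopPointInJourneyPattern", NS))
        for p in root.iter(f"{{{NETEX}}}ServiceJourneyPattern")
    }
    mismatches = []
    for journey in root.iter(f"{{{NETEX}}}ServiceJourney"):
        ref = journey.find("n:ServiceJourneyPatternRef", NS)
        if ref is None:
            ref = journey.find("n:JourneyPatternRef", NS)
        if ref is None:
            continue  # no pattern reference at all; out of scope for this check
        expected = pattern_sizes.get(ref.get("ref"))
        actual = len(journey.findall("n:passingTimes/n:TimetabledPassingTime", NS))
        if expected is not None and expected != actual:
            mismatches.append((journey.get("id"), ref.get("ref"), actual, expected))
    return mismatches


MISMATCHES = find_stop_count_mismatches(ET.fromstring(SAMPLE))
```

Running the same function over the real export would give a list of journey and pattern ids that could be handed to the data producer as concrete examples.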

Summary

All in all I am pleasantly surprised how well all of this works given that you're the first organisation that tries to import EPIP into OTP.

What is difficult to say is if there are hidden problems with the feed. The best way to find out is to actually use the data, which we currently do.

We can also discuss using a more structured approach to find errors but for now I'm pretty pleased with the progress.

I have to correct myself about the service links (shapes).

EPIP does structure them a bit differently from the Nordic profile but the problem is that in the latest data feed the ServiceLink elements are not referenced by the StopPointInJourneyPattern. In other words: the shapes are present but not used.

According to the profile specification, the StopPointInJourneyPattern should look like this: https://github.com/5Tsrl/netex-italian-profile/blob/main/Examples/Netex_ITA_1.10_EPIP_with_versioning.xml#L16990

but in the supplied data file they look like this:

<StopPointInJourneyPattern id="it:apb:StopPointInJourneyPattern:01B06_.24a-1-0071303:" version="1" order="3">
  <ScheduledStopPointRef ref="it:apb:ScheduledStopPoint:it-22021-2167-0-5267:" version="any" />
  <ForAlighting>true</ForAlighting>
  <ForBoarding>true</ForBoarding>
  <RequestStop>false</RequestStop>
  <StopUse>access</StopUse>
</StopPointInJourneyPattern>

Note the element OnwardServiceLinkRef is missing.
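To see how widespread the missing reference is, the feed can be scanned for StopPointInJourneyPattern elements without an OnwardServiceLinkRef. The following is a minimal sketch under two assumptions: that the last stop of a pattern legitimately has no onward link, and that the order attribute is always a number. The sample ids are hypothetical.

```python
import xml.etree.ElementTree as ET

NETEX = "http://www.netex.org.uk/netex"
NS = {"n": NETEX}

# Hypothetical pattern: the second stop lacks its onward link.
SAMPLE = """<PublicationDelivery xmlns="http://www.netex.org.uk/netex">
  <ServiceJourneyPattern id="JP:1" version="1">
    <pointsInSequence>
      <StopPointInJourneyPattern id="SP:1" version="1" order="1">
        <OnwardServiceLinkRef ref="SL:1-2" version="1"/>
      </StopPointInJourneyPattern>
      <StopPointInJourneyPattern id="SP:2" version="1" order="2"/>
      <StopPointInJourneyPattern id="SP:3" version="1" order="3"/>
    </pointsInSequence>
  </ServiceJourneyPattern>
</PublicationDelivery>"""


def stops_missing_onward_link(root):
    """Return ids of non-final stops in a pattern that have no OnwardServiceLinkRef."""
    missing = []
    for pattern in root.iter(f"{{{NETEX}}}ServiceJourneyPattern"):
        stops = pattern.findall("n:pointsInSequence/n:StopPointInJourneyPattern", NS)
        stops.sort(key=lambda s: int(s.get("order")))
        # Assumption: the final stop has nothing to link onward to, so skip it.
        for stop in stops[:-1]:
            if stop.find("n:OnwardServiceLinkRef", NS) is None:
                missing.append(stop.get("id"))
    return missing


MISSING = stops_missing_onward_link(ET.fromstring(SAMPLE))
```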

@leonardehrenfried thanks for the notification. So in the end it is just the field OnwardServiceLinkRef that is missing? In our data we have the reference to the service link in the structure linksInSequence (field serviceLinkRef). Let's work out whether we should change our data accordingly...

linksInSequence is how the Nordic profile expects it (but not EPIP, apparently), but that field is also absent from the latest data file.

If you have the time, I'm available for a call today. Maybe that is quicker than comment ping-pong.

OK, I need to check the data I provided. Typically we support this (linksInSequence). Let me look into this first, then we can decide how to handle it.

@leonardehrenfried I checked this. For the Italian NAP, we removed the structure linksInSequence since it was not supported by the Italian profile, but we did not add the alternative way to map to the service link. I will ask our team of developers working on our NeTEx export for the Italian NAP to adjust this. In the end we will just put the value of linksInSequence/serviceLinkRef (available in our NeTEx German profile) into pointsInSequence/OnwardServiceLinkRef (as required by the NeTEx Italian profile). I will let you know when we have a corrected NeTEx export available for your activities here.

There are also other problems of varying severity (see post above). Do you have any information about that?

@leonardehrenfried not yet, I will give you feedback on all points.

If you have a stable URL from where to download the regularly updated NeTEx feed I can add it to the OTP test instance.

@dulvui we would like the testing environment of the OTP back-end (https://otp.opendatahub.testingmachine.eu/) to be fed with NeTEx data instead of GTFS data for public transport. So please work with @leonardehrenfried to set this up.

Relevant also for @clezag

@dulvui All I need from you is a permanent URL where I can download the latest version of the NeTEx feed. The rest I can do myself.

@leonardehrenfried this is something I can do. At present these NeTEx files are stored on an FTP server owned by another organization, i.e. ftp://ftp01.sta.bz.it/netex/2024/plan/All/ Here you find the daily exports; you should always use the latest one. Can you tell me if you can access it?

Yes, I can access it. HTTP with a stable URL pointing towards the newest version would be the best but I can work around it with some scripting.

Is the full path then this one? ftp01.sta.bz.it/netex/2024/plan/EU_Profil/NX-PI_01_it_apb_LINE_apb__20240807.xml.zip

@leonardehrenfried good! Yes, unfortunately they want to use this FTP system... and yes, the current file is the one you have indicated. But as I said, we generate a new export every day, so you should use the newest one for the import into OTP. That means you should pick the file whose name contains the current date.

Yes, I will compute the file name from the current date.

Do you happen to know if the path 2024 will stay the same or change in 2025?

@leonardehrenfried this will change...
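Computing the file name from the date could look roughly like this. It is only a sketch: the path layout is copied from the single example path in this thread, and the assumption that the year directory simply follows the year of the export date is a guess, since all we know is that the 2024 part "will change".

```python
from datetime import date

# Base location as mentioned in this thread.
BASE = "ftp://ftp01.sta.bz.it/netex"


def netex_export_url(day: date) -> str:
    """Build the URL of the daily EPIP export for the given day.

    Assumption: the year directory matches the year of the export date.
    """
    return (
        f"{BASE}/{day.year}/plan/EU_Profil/"
        f"NX-PI_01_it_apb_LINE_apb__{day:%Y%m%d}.xml.zip"
    )


EXAMPLE_URL = netex_export_url(date(2024, 8, 7))
```

In a scheduled job the function would be called with date.today(), ideally with a fallback to the previous day in case the export has not been uploaded yet.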

I would like to increase the severity of one problem, because I noticed its effect today. Previously I said:

No timezone is configured in FrameDefaults which OTP expects to be set like this: https://github.com/entur/profile-examples/blob/272ed7e9f1fe8b60ed1bddefd04c782d35c0917b/netex/network/Line61A.xml#L33-L43

At first I thought this was just cosmetic, but I now believe that all times are off by 1 or 2 hours depending on whether it's summer or winter. It would be very good if you could set the time zone in the feed as I suggested in my comment.
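The "1 or 2 hours" pattern is what one would expect if local times are read as UTC. A small sketch of the arithmetic, assuming the feed's intended time zone is Europe/Rome (the thread does not state this explicitly) and using 09:36 as an arbitrary local departure time:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumption: the feed's times are meant as local time in Europe/Rome.
ROME = ZoneInfo("Europe/Rome")


def hours_off_if_read_as_utc(local_naive: datetime) -> float:
    """Hours by which a local time is shifted when it is mistaken for UTC."""
    as_local = local_naive.replace(tzinfo=ROME)
    return as_local.utcoffset().total_seconds() / 3600


WINTER = hours_off_if_read_as_utc(datetime(2024, 1, 15, 9, 36))  # CET
SUMMER = hours_off_if_read_as_utc(datetime(2024, 7, 15, 9, 36))  # CEST
```

With standard time in January and daylight saving time in July, the shift is one hour in winter and two hours in summer.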

And since @dulvui just merged my PR, here we have a fresh OTP instance with NeTEx data: https://tinyurl.com/27f8o653

I noticed another problem with the NeTEx data: I see no bus stops or bus routes in the city of Trento while there are plenty in Merano, Bolzano and Bressanone.

Let me give you an example: Piazza Dante near Trento railway station has several bus stops and they are all called a variation of "Piazza Dante". I would expect at least one of them to be present in the data but I see zero stops called "Piazza Dante".

I just checked and it's the same with the GTFS feed.

Is this expected?

@leonardehrenfried this is correct; the NeTEx is just related to the Province of Bolzano, not the Province of Trento. In the dataset there are some bus stops in other regions, but these are used just for the railway services.

Good to know! I thought this feed covers all of South Tyrol.

Yes, it does. Trento is not in South Tyrol, it is in Trentino :-)

How embarrassing - I must read up on the difference between an Italian province and a region again! https://en.wikipedia.org/wiki/Trentino-Alto_Adige/S%C3%BCdtirol

Smaller errors

* The `Line` entities do not have an `Authority` which is required in the Nordic profile, so a dummy one is created. These lines have an `Operator` but that is a separate entity in OTP, which is only available in the Transmodel API. We would have to discuss if the Operator is really what GTFS calls the `Agency` in EPIP.

* No timezone is configured in `FrameDefaults` which OTP expects to be set like this: https://github.com/entur/profile-examples/blob/272ed7e9f1fe8b60ed1bddefd04c782d35c0917b/netex/network/Line61A.xml#L33-L43

Regarding all these open points:

  • Yes, currently we don't have the organization type "Authority" in the resourceFrame, since this was not strictly requested. We do have it, however, in a new version of the NeTEx export, which has several CompositeFrames and also includes static data on parking and sharing mobility. If you are interested, you can find it here (export still under consolidation): https://cloud.opendatahub.com/index.php/s/dHXsK9KsFWdKXPC
  • I am checking the timezone topic; I think it can be added without much effort at the beginning of the NeTEx export, as you already indicated.
  • Regarding the most critical point, which raises many doubts for me: I don't know if this is related to the fact that the export contains different line versions with different validity periods. It could be that these line versions are there, and are referenced by a ServiceJourneyPattern, but then have no associated trips. Can you provide me with a couple of examples so that we can better understand these errors?

Since this issue is getting quite large, I took the liberty to open separate tickets for the NeTEx problems.

@leonardehrenfried yes please I wanted to do the same once the issues are clarified

Here they are: https://github.com/noi-techpark/odh-mentor-otp/issues/created_by/leonardehrenfried

You may want to add a label to give readers a bit of context.

@leonardehrenfried thanks! I have created a new label and labelled the issues. I will give you feedback in the next few weeks.

For SIRI: current end-point is https://efa.sta.bz.it/sirilite (in JSON).

@leonardehrenfried we have finally stable end-points for the NeTEx / SIRI data:

There is now also a SIRI-SX interface (Situation Exchange), implemented according to the German / Swiss profile (VDV-736):

Can you test these end-points in the coming days and try to integrate them into OTP, especially the SIRI end-point?

For the NeTEx data, there has been some update by 5T in relation to compliance with NeTEx EPIP in the Italian profile, more details on Friday. For sure there is still something to fix in the data we provide...

I can take a look at this in the next few days.

Is SIRI only available as SIRI light, where you download everything at once, or also in the Request/Response flow where you create a subscription and get only the latest updates? Request/response is the only one supported by OTP. However, SIRI light is such a simple protocol that it would also not be very hard to implement it.

@leonardehrenfried we have both. At the moment I have shared only the SIRI light end-points with you, but if needed we can also go in the direction of subscriptions. For a first attempt, wouldn't it be easier to work with SIRI light, as is typically done in the Nordics?

The Nordics use request/response.

But they also use SIRI light, don't they? As I said, if you prefer the more complex approach with request / response, this can easily be activated.

They offer SIRI light but don't actually use it. I guess they have it because it's easy to consume. On a country-level request/response has much better performance because you only need to retrieve the latest updates rather than every update for the entire country every minute (even those, where nothing changed).

I would be fine with either.
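For context, consuming SIRI light amounts to one HTTP GET that returns the full document, followed by parsing. A minimal sketch of what that could look like, not the OTP implementation: the URL is the SIRI ET (XML) end-point from the issue description, the element names follow my understanding of the SIRI Estimated Timetable schema, and the sample document is made up. The actual STA feed has not been checked against this.

```python
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime

SIRI = "http://www.siri.org.uk/siri"
NS = {"s": SIRI}
ET_URL = "https://efa.sta.bz.it/siri-lite/estimated-timetable/xml"

# Hypothetical response with a single delayed call.
SAMPLE = """<Siri xmlns="http://www.siri.org.uk/siri" version="2.0">
  <ServiceDelivery>
    <EstimatedTimetableDelivery>
      <EstimatedJourneyVersionFrame>
        <EstimatedVehicleJourney>
          <LineRef>L1</LineRef>
          <EstimatedCalls>
            <EstimatedCall>
              <StopPointRef>S1</StopPointRef>
              <AimedDepartureTime>2024-11-15T09:36:00+01:00</AimedDepartureTime>
              <ExpectedDepartureTime>2024-11-15T09:39:00+01:00</ExpectedDepartureTime>
            </EstimatedCall>
          </EstimatedCalls>
        </EstimatedVehicleJourney>
      </EstimatedJourneyVersionFrame>
    </EstimatedTimetableDelivery>
  </ServiceDelivery>
</Siri>"""


def fetch_estimated_timetable(url=ET_URL, timeout=30):
    """SIRI light: a plain GET that returns every current update at once."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def departure_delays(xml_data):
    """Return (line, stop, delay in seconds) for each call with both times set."""
    root = ET.fromstring(xml_data)
    delays = []
    for journey in root.iter(f"{{{SIRI}}}EstimatedVehicleJourney"):
        line = journey.findtext("s:LineRef", default="", namespaces=NS)
        for call in journey.iter(f"{{{SIRI}}}EstimatedCall"):
            aimed = call.findtext("s:AimedDepartureTime", namespaces=NS)
            expected = call.findtext("s:ExpectedDepartureTime", namespaces=NS)
            if aimed and expected:
                delta = datetime.fromisoformat(expected) - datetime.fromisoformat(aimed)
                stop = call.findtext("s:StopPointRef", namespaces=NS)
                delays.append((line, stop, int(delta.total_seconds())))
    return delays


DELAYS = departure_delays(SAMPLE)
```

A poller would call fetch_estimated_timetable() on a fixed interval and pass the result to departure_delays(); every poll transfers the whole document, which is the performance cost mentioned above.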

If the turnaround on activating request/response is as slow as making SIRI available at all, I think I will be faster adding support for SIRI light to OTP. :)

@leonardehrenfried the activities around importing NeTEx data according to the Italian profile are becoming more and more intense at national level. I had some contact with 5T last week, and Brede was there too. At national level they have published a new version of the profile (unfortunately in Italian, see attachment) with an appendix on the specific aspects to be considered in order to ensure a smooth import into OTP v2 (see chapter "Appendice A –NeTEx e OTP v.2+"). I would like to discuss this with you shortly, in order to really consolidate what we need to improve in our NeTEx data and possibly provide additional inputs to these national discussions (e.g. the topic of parking). As far as I have been told, the current stable OTP version available on GitHub can fully handle the import of NeTEx data that follows these recommendations. Is this something that you can confirm? Let's discuss this today...

241104_Linee guida compilazione NeTEx IT v.4.1.0.pdf

An additional point: we are also in close contact with the team at SBB / SKI+ on various topics. They have also had a look at our NeTEx data, in particular Matthias Guenther and Stefan de Konink; you probably know them. They provided us with the following inputs:

TimetabledPassingTime must have an id

It is a bad idea not to give an id to TimetabledPassingTime, especially when a version is added.

<TimetabledPassingTime version="any">
  <StopPointInJourneyPatternRef ref="it:apb:StopPointInJourneyPattern:01B10A.24a-3-0030801:" version="3"/>
  <DepartureTime>09:36:00</DepartureTime>
</TimetabledPassingTime>

Using ScheduledStopPoints as RoutePointRef is not allowed

ScheduledStopPoints are not RoutePoints

<routes>
  <Route id="it:apb:Route:1-110-24a-2-1/H:" version="any">
    <LineRef ref="it:apb:Line:01110_.24a:" version="2" />
    <DirectionRef ref="it:apb:Direction:H:" version="any" />
    <pointsInSequence>
      <PointOnRoute id="it:apb:PointOnRoute:1-110-24a-2-1/H_1:" version="any" order="1">
        <RoutePointRef ref="it:apb:ScheduledStopPoint:it-22021-468-2-3086:" version="any" />
      </PointOnRoute>
      <PointOnRoute id="it:apb:PointOnRoute:1-110-24a-2-1/H_2:" version="any" order="2">
        <RoutePointRef ref="it:apb:ScheduledStopPoint:it-22021-468-3-5106:" version="any" />
      </PointOnRoute>
      <PointOnRoute id="it:apb:PointOnRoute:1-110-24a-2-1/H_3:" version="any" order="3">
        <RoutePointRef ref="it:apb:ScheduledStopPoint:it-22021-2084-0-5029:" version="any" />
      </PointOnRoute>

We will also have a look at this, but it is probably not so relevant for the import into OTP, is it?
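Both remarks can be turned into a quick scan of the export. This is a rough sketch only: it detects the second problem purely from the id naming convention visible in the excerpt above (a ref containing ":ScheduledStopPoint:"), which is an assumption about how the ids are built and not a schema-level check. The sample reuses one ref from the excerpt; everything else in it is made up.

```python
import xml.etree.ElementTree as ET

NETEX = "http://www.netex.org.uk/netex"

# Hypothetical miniature feed showing both reported issues.
SAMPLE = """<PublicationDelivery xmlns="http://www.netex.org.uk/netex">
  <TimetabledPassingTime version="any"/>
  <TimetabledPassingTime id="T:1" version="1"/>
  <PointOnRoute id="P:1" version="any" order="1">
    <RoutePointRef ref="it:apb:ScheduledStopPoint:it-22021-468-2-3086:" version="any"/>
  </PointOnRoute>
  <PointOnRoute id="P:2" version="any" order="2">
    <RoutePointRef ref="it:apb:RoutePoint:1:" version="any"/>
  </PointOnRoute>
</PublicationDelivery>"""


def audit(root):
    """Count passing times without an id and list suspicious RoutePointRef values."""
    passing_times_without_id = sum(
        1 for t in root.iter(f"{{{NETEX}}}TimetabledPassingTime") if not t.get("id")
    )
    # Heuristic based on the id naming convention, not on resolving the reference.
    stop_points_used_as_route_points = [
        r.get("ref")
        for r in root.iter(f"{{{NETEX}}}RoutePointRef")
        if ":ScheduledStopPoint:" in (r.get("ref") or "")
    ]
    return passing_times_without_id, stop_points_used_as_route_points


NO_ID_COUNT, BAD_REFS = audit(ET.fromstring(SAMPLE))
```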

Next steps defined on 15.11:

  • @leonardehrenfried will focus on integrating the NeTEx data from the new web service provided (see main user story description)
  • @leonardehrenfried will focus on integrating the SIRI ET data (XML). Decision to test the SIRI-Lite approach first. Attempt to match the SIRI data with the NeTEx data using the journey pattern details, since the IDs are not the same ( :-( )
  • @rcavaliere will work on improving certain aspects of the data provided, starting from the issues in the NeTEx data (timeZone + link reference in journeyPatterns)

@leonardehrenfried as agreed today, let's then try to finalize the SIRI integration work (see comment above), mainly this:
@leonardehrenfried will focus on integrating the SIRI ET data (XML). Decision to test the SIRI-Lite approach first. Attempt to match the SIRI data with the NeTEx data using the journey pattern details, since the IDs are not the same ( :-( )

  • @rcavaliere will additionally check why we have several records with empty StopRef.