ITxPT/DATA4PTTools

xsd validation is too slow

Closed this issue · 7 comments

Thank you, @skinkie. Can you be more specific? For example: file size, time to complete the test, what environment you're on, et cetera.

Files from several agencies have been checked, most recently a batch from Denmark.

thbar commented

I can give a specific example with data file. I have tried to validate a largish file (~222MB unzipped) which can be found at https://transport.data.gouv.fr/datasets/horaires-des-lignes-ter-sncf/?locale=en, section "NeTEx resources". Heads-up: the file is entitled "Export au format CSV" but this is a data import bug that we must fix. Also, the file contains (at time of writing) a single encoding error (ISO-8859-1 instead of UTF-8, see etalab/transport-qualite-des-donnees#4) which you will have to fix manually for now.
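For anyone who wants to locate that encoding error before fixing it by hand, here is a minimal sketch (not part of the validator). It assumes the file has line breaks and is otherwise valid UTF-8. The function name `find_invalid_utf8` and the file name in the usage note are hypothetical. It reads the file as raw bytes, tries to decode each line as UTF-8, and reports the first offending byte sequence per line.

```python
def find_invalid_utf8(path):
    """Yield (line_number, byte_offset, offending_bytes) for each line
    that is not valid UTF-8. Only the first error per line is reported."""
    offset = 0  # byte offset of the start of the current line
    with open(path, "rb") as f:
        # Splitting on b"\n" is safe: 0x0A never occurs inside a
        # multi-byte UTF-8 sequence.
        for line_number, line in enumerate(f, start=1):
            try:
                line.decode("utf-8")
            except UnicodeDecodeError as err:
                yield line_number, offset + err.start, line[err.start:err.end]
            offset += len(line)
```

Usage would look something like `for hit in find_invalid_utf8("export.xml"): print(hit)`. Decoding the reported bytes with `"iso-8859-1"` shows which character was intended. If the whole document sits on a single line, that line is loaded into memory in one go.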

I started running the validator outside of Docker, directly on a recent Mac M1. The process has been running for 40 minutes and still hasn't finished.

Happy to provide more input if needed!

thbar commented

(Final stat on the case I mentioned: the run took 47 minutes, but on a fairly beefy Mac M1; on our production setup, it would likely be much slower.)

We are aware of the performance issues, but so far we have prioritized getting the tool and web interface to a working state, including some extra validation scripts and better documentation. We will look further into how we can improve performance in coming releases.

It is very useful to get examples of files that work and of files that fail or are slow, so thank you for that.

Fixed in version 0.5.5