ISO-TC211/XML

Implementation of 19103 Date in gco namespace


In the gco namespace (baseTypes 1.2.0), the ISO 19103 Date is implemented as a union:

<xs:simpleType name="Date_Type">
  <xs:union memberTypes="xs:date xs:gYearMonth xs:gYear"/>
</xs:simpleType>

This invites mistakes in interpretation and encoding. A value like 19990101 is (a) intended as a date in ISO basic format, but (b) accepted by the above implementation as a year-only date, i.e., the year 19,990,101. As a result, an invalid date like 19999999 (basic format) is not caught by schema validation: it is a valid xs:gYear value, even though it was intended as a yyyymmdd date.

I suggest making Date_Type a complex type, a choice of XML elements named according to the XML schema built-in date types. Here is a related example from S-100. (It is for truncated date, which is slightly different in using more XML Schema built-in types, but it should convey the idea.)

[Image: DateFormat]

The result would be like this:

Application schema: <xs:element name="dateStart" type="S100_TruncatedDate" ...

Dataset: <dateStart><gMonthDay>--06-01</gMonthDay></dateStart>
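
Applied to Date_Type, the proposal might look roughly like the sketch below. This is only an illustration: the child element names (date, gYearMonth, gYear) are assumed by analogy with the gMonthDay example above, the wrapper element someDate is hypothetical, and the target namespace is omitted for brevity.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">

  <!-- Sketch only: a choice of elements, each named after the
       XML Schema built-in type that constrains its content. -->
  <xs:complexType name="Date_Type">
    <xs:choice>
      <xs:element name="date" type="xs:date"/>
      <xs:element name="gYearMonth" type="xs:gYearMonth"/>
      <xs:element name="gYear" type="xs:gYear"/>
    </xs:choice>
  </xs:complexType>

  <!-- Hypothetical element using the type, for illustration. -->
  <xs:element name="someDate" type="Date_Type"/>

</xs:schema>
```

With a type like this, <someDate><date>19990101</date></someDate> fails validation because 19990101 is not a valid xs:date. The value would only be accepted if the encoder explicitly wrote <gYear>19990101</gYear>, which makes the year-only reading a deliberate choice rather than a silent fallback.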

(I am posting this as an XML issue because I think it is just an implementation change and shouldn't need a change to the ISO 19103 standard, but feel free to move it to StandardsTracker if appropriate.)

Thanks, much appreciated.

I don't like yyyymmdd date formats either. The XML built-in types, which use separators, work fine. yyyy-mm-dd works fine. The problem is that people convert data from non-ISO formats and overlook date format conversion, or come from legacy systems and wrongly try to create date values without separators, and the current "union" type provides no hint that they may be making a mistake.

Rather than making the XML type complex, would it be tidier to validate this with a Schematron pattern (requiring the separators)?
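
A rough sketch of what such a Schematron pattern could look like, assuming the dates appear in gco:Date elements and an XPath 2.0 query binding is available. The namespace URI is an assumption and should be replaced with the one actually in use. The regular expression is deliberately simplified: it ignores time zone suffixes and negative or expanded years.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">

  <!-- Assumed namespace URI; adjust to the gco version in use. -->
  <sch:ns prefix="gco" uri="http://standards.iso.org/iso/19115/-3/gco/1.0"/>

  <sch:pattern id="date-requires-separators">
    <sch:rule context="gco:Date">
      <!-- Accept yyyy, yyyy-mm or yyyy-mm-dd only (simplified). -->
      <sch:assert test="matches(normalize-space(.), '^\d{4}(-\d{2}(-\d{2})?)?$')">
        Date value '<sch:value-of select="."/>' must be written as
        yyyy, yyyy-mm or yyyy-mm-dd, with hyphen separators.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

</sch:schema>
```

Under this rule, 19990101 and 19999999 are both reported, while 1999, 1999-01 and 1999-01-01 pass. Range checks on month and day are still left to the XML Schema types.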

This is also an issue that should be flagged / solved at the logical level - in this case, in ISO 19103, but see also the ongoing Ad hoc group on representing time - e.g. by defining specifically there what subset of ISO 8601 is allowed for use in ISO/TC 211 data.

Whilst I do agree with pushing our users towards using separators, we should note that ISO 8601:2004 only allows more than four digits in the year "with mutual agreement" (Clause 3.5), and such extended year values need to start with either + or -.

So anyone interpreting 19990101 as a year only is not following ISO 8601 (or the widely used RFC 3339, which only allows four-digit years).
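
If the four-digit limit were to be enforced in the schema itself, one possible approach (not proposed in this thread, and shown only as a sketch) is to keep the union but restrict its year member with a pattern. The type name FourDigitYear is hypothetical, and the target namespace is omitted for brevity.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical helper type: xs:gYear limited to exactly four
       digits. This also excludes negative years and time zone
       suffixes, which xs:gYear would otherwise allow. -->
  <xs:simpleType name="FourDigitYear">
    <xs:restriction base="xs:gYear">
      <xs:pattern value="\d{4}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Same union as the current Date_Type, with the year member
       swapped for the restricted type. -->
  <xs:simpleType name="Date_Type">
    <xs:union memberTypes="xs:date xs:gYearMonth FourDigitYear"/>
  </xs:simpleType>

</xs:schema>
```

With this variant, 19990101 matches none of the member types and is rejected at schema validation, while 1999, 1999-01 and 1999-01-01 remain valid.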

That said, it's possible that ISO 8601-2:2019 changed this - I haven't got a copy.

Coincidentally, BSI have just given me access to ISO 8601-2:2019. That adds another option for years with more than four digits, allowing a prefix of "Y", so anyone stating 19,990,101 CE (AD) under ISO 8601-2:2019 is allowed to write Y19990101 without any prior arrangement. 19990101 remains unambiguous.

I think schema validation (i.e., using types in the XML schema, whether built-in or user-defined) is generally better than Schematron rules (when there is a choice between the two), because schema validation happens earlier in the process. Also, as a practical matter, developers are less prone to skimp on schema validation than on applying Schematron rule files.

Years with more than 4 digits would be an error all right in an S-100 context, but they're starting with 8-digit data fields (yyyymmdd) and the idea is to trap and signal errors during data conversion or data entry.