openlibraryenvironment/gokb

OAI-PMH: deliver dates without time data where time is not necessary

Closed this issue · 2 comments

Date fields that are not set by the system carry time data that is irrelevant; for example, the coverage start time of a journal is meaningless. Since no XML schema is linked to the OAI-PMH metadata, it is entirely up to us how we output dates. At the moment we deliver them as xs:dateTime with a time zone designator, i.e. YYYY-MM-DDTHH:MM:SSZ. As noted, no rule prevents us from switching to xs:date, which omits the time. We should therefore output date fields where the time is meaningless as YYYY-MM-DD only, in order to avoid conflicts caused by incorrectly stored time zones.
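To illustrate the difference between the two output forms, here is a minimal Java sketch using only `java.time`. The class and method names (`OaiDateFormat`, `asXsDateTime`, `asXsDate`) are hypothetical and not taken from the GOKb code base; the sketch only shows what each lexical form looks like, not how the OAI-PMH output is actually generated.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of GOKb: shows the two lexical forms side by side.
final class OaiDateFormat {

    // xs:dateTime in UTC, e.g. 2020-01-01T00:00:00Z (the form delivered at the moment)
    static String asXsDateTime(Instant instant) {
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    // xs:date without time or zone, e.g. 2020-01-01 (the proposed form)
    static String asXsDate(LocalDate date) {
        return DateTimeFormatter.ISO_LOCAL_DATE.format(date);
    }

    public static void main(String[] args) {
        System.out.println(asXsDateTime(Instant.parse("2020-01-01T00:00:00Z")));
        System.out.println(asXsDate(LocalDate.of(2020, 1, 1)));
    }
}
```

Because a `LocalDate` carries no time and no zone, there is nothing for a consumer to convert or misread in the second form.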

I would go further and strongly suggest that we also use a Date type internally rather than a DateTime type. Even in the PostgreSQL database we should use the 4-byte date type. That would prevent further errors such as time zone conversions that turn a coverageStart of "2020-01-01T00:00:00" into "2019-12-31T23:00:00" and thus yield the wrong day.
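The day shift described here can be reproduced with a few lines of `java.time`. This is a sketch under an assumption: the zone `Europe/Berlin` is chosen only because it is one hour ahead of UTC in January and so matches the 23:00 in the example; the thread does not say which zone is actually involved. The class name `DayShiftDemo` is hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical demo, not GOKb code: local midnight lands on the previous day in UTC.
final class DayShiftDemo {

    // Interprets the date as midnight in the given zone, then reads the day in UTC.
    static LocalDate dayAfterUtcConversion(LocalDate date, ZoneId zone) {
        ZonedDateTime localMidnight = date.atStartOfDay(zone);
        return localMidnight.withZoneSameInstant(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        LocalDate coverageStart = LocalDate.of(2020, 1, 1);
        // Assumed zone for illustration only.
        ZoneId zone = ZoneId.of("Europe/Berlin");
        System.out.println(dayAfterUtcConversion(coverageStart, zone));
    }
}
```

A value kept as a plain `LocalDate` from input to output never passes through such a conversion, which is the point of the suggestion above.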

Ideally we would store all acceptable KBART date formats internally, i.e. "2020", "2020-01" and "2020-01-01", but I think there are no Java classes that support these formats?

For "2020" there is the data type "Year", but that is a completely different data type. The class GregorianCalendar is also worth mentioning. But neither of them really answers the question of how to store such values in the database. And beware that LAS:eR and every other system that pulls the data needs to be aware of this change!
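For the parsing side only, `java.time` offers `Year`, `YearMonth` and `LocalDate`, and `DateTimeFormatter.parseBest` can pick whichever of them fits the input. The following is a rough sketch, with a hypothetical class name `KbartDateParser`; it deliberately leaves the database storage question open, since the three result types still differ.

```java
import java.time.LocalDate;
import java.time.Year;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

// Hypothetical helper, not GOKb code: keeps the precision the KBART value was given in.
final class KbartDateParser {

    // Year is mandatory; month and day are optional, nested sections.
    private static final DateTimeFormatter KBART =
            DateTimeFormatter.ofPattern("uuuu[-MM[-dd]]");

    // Returns a LocalDate, YearMonth or Year, trying the most precise type first.
    static TemporalAccessor parse(String text) {
        return KBART.parseBest(text, LocalDate::from, YearMonth::from, Year::from);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2020", "2020-01", "2020-01-01"}) {
            TemporalAccessor parsed = parse(s);
            System.out.println(s + " -> " + parsed.getClass().getSimpleName());
        }
    }
}
```

One conceivable way to persist such values would be a date column plus a separate precision marker, but that is only an idea and would still have to be agreed with LAS:eR and the other consuming systems.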