gdcc/xoai

Bug: protocol spec violation with date ranges


Currently, OAICompileRequest.validateDates() checks if from is after until. This is a clear protocol violation!

2.7.1 Selective Harvesting and Datestamps

Harvesters may use datestamps to harvest only those records that were created, deleted, or modified within a specified date range. To specify datestamp-based selective harvesting, datestamps are included as values of the optional arguments, from and until, in the ListRecords and ListIdentifiers requests. Harvesting is restricted to the range specified by the from and until arguments, extending back to the earliest datestamp if from is omitted, and forward to the most recent datestamp if until is omitted. Range limits are inclusive: from specifies a bound that must be interpreted as "greater than or equal to", until specifies a bound that must be interpreted as "less than or equal to". Therefore, the from argument must be less than or equal to the until argument. Otherwise, a repository must issue a badArgument error.

Repositories must support selective harvesting with the from and until arguments expressed at day granularity. Optional support for seconds granularity is indicated in the response to the Identify request. The value of datestamps in both requests and responses must comply to the specifications for UTCdatetime in this document. A repository must update the datestamp of a record if a change occurs, the result of which would be a change to the metadata part of the XML-encoding of the record. Such changes include, but are not limited to, changes to the metadata of the record, changes to the metadata format of the record, introduction of a new metadata format, termination of support for a metadata format, etc.

This is especially a problem with day granularity, where from and until may legitimately name the same day, since both range limits are inclusive.
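To make the day-granularity case concrete, here is a minimal Java sketch of an inclusive range using only java.time. The class DayRange and its methods are hypothetical and not part of xoai. It shows that from=until is a valid one-day range and that until has to be expanded to the last second of its day. It assumes datestamps carry no fractions of a second, as in OAI-PMH datestamps.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical helper: an inclusive OAI-PMH range at day granularity (UTC). */
final class DayRange {
    private DayRange() {}

    /** from=YYYY-MM-DD starts at the first second of that day. */
    static Instant lowerBound(String from) {
        return LocalDate.parse(from).atStartOfDay(ZoneOffset.UTC).toInstant();
    }

    /** until=YYYY-MM-DD is inclusive, so it ends at the last second of that day. */
    static Instant upperBound(String until) {
        return LocalDate.parse(until).plusDays(1)
                .atStartOfDay(ZoneOffset.UTC).toInstant().minusSeconds(1);
    }

    /** The spec only requires from <= until; equal days form a valid one-day range. */
    static boolean isValidRange(String from, String until) {
        return !LocalDate.parse(from).isAfter(LocalDate.parse(until));
    }

    /** Inclusive on both ends: from <= datestamp <= until. */
    static boolean contains(String from, String until, Instant datestamp) {
        return !datestamp.isBefore(lowerBound(from)) && !datestamp.isAfter(upperBound(until));
    }
}
```

If until were instead taken as the start of its day, from=until=2020-01-01 would match only records stamped exactly 2020-01-01T00:00:00Z.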

The requirement that both dates use the same granularity is checked, but only by comparing the string lengths:

3.3.1 UTCdatetime in Protocol Requests

Datestamps used as values of the optional arguments from and until in the ListIdentifiers and ListRecords requests are encoded using ISO8601 and are expressed in UTC. These arguments are used to specify datestamp-based selective harvesting. These arguments support the "Complete date" and the "Complete date plus hours, minutes and seconds" granularities defined in ISO8601. The legitimate formats are YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ. Both arguments must have the same granularity. All repositories must support YYYY-MM-DD. A repository that supports YYYY-MM-DDThh:mm:ssZ should indicate so in the Identify response. A request by a harvester with finer granularity than that supported by a repository must produce an error.
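These rules could be handled roughly as in the following Java sketch: detect the granularity from the full format, require both arguments to match, and reject a granularity finer than the repository supports. The Granularity enum here is hypothetical, not xoai's own type, and IllegalArgumentException is assumed to stand in for a badArgument error.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.LocalDate;
import java.util.regex.Pattern;

/** Hypothetical: the two granularities of section 3.3.1, declared from coarse to fine. */
enum Granularity {
    DAY("\\d{4}-\\d{2}-\\d{2}"),
    SECOND("\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z");

    private final Pattern format;

    Granularity(String regex) {
        this.format = Pattern.compile(regex);
    }

    /** Detects the granularity from the format and rejects impossible dates such as 2021-02-30. */
    static Granularity of(String value) {
        for (Granularity g : values()) {
            if (g.format.matcher(value).matches()) {
                try {
                    if (g == DAY) {
                        LocalDate.parse(value);
                    } else {
                        Instant.parse(value);
                    }
                    return g;
                } catch (DateTimeException e) {
                    break; // right shape, but not a real calendar date/time
                }
            }
        }
        throw new IllegalArgumentException("badArgument: not a valid UTCdatetime: " + value);
    }

    /** "Both arguments must have the same granularity." */
    static Granularity requireSame(String from, String until) {
        Granularity f = of(from);
        Granularity u = of(until);
        if (f != u) {
            throw new IllegalArgumentException("badArgument: from and until use different granularities");
        }
        return f;
    }

    /** A request with finer granularity than the repository supports must produce an error. */
    static void requireSupported(Granularity requested, Granularity supported) {
        if (requested.compareTo(supported) > 0) {
            throw new IllegalArgumentException(
                    "badArgument: " + requested + " is finer than supported " + supported);
        }
    }
}
```

Declaring DAY before SECOND lets the enum's natural order express "finer than", so requireSupported needs no lookup table.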

The timestamps are also only validated if both from and until are present, which is incorrect (see the first quote above). It's debatable whether until should default to now(), as no future datestamps are possible. At the very least, both from and until should be checked not to lie in the future.
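The missing checks could look roughly like this Java sketch. The class name, the Clock parameter (injected so that "now" is testable) and the use of IllegalArgumentException for badArgument are assumptions. Each argument is checked on its own, and an omitted until defaults to now as suggested above.

```java
import java.time.Clock;
import java.time.Instant;

/** Hypothetical checks on already-parsed arguments; either argument may be null (omitted). */
final class DateArgumentChecks {
    private DateArgumentChecks() {}

    /**
     * Returns the effective until value. A day-granularity until is expected here as parsed,
     * i.e. still at the start of its day; expanding it to the end of the day first would
     * make until=today look like a future date.
     */
    static Instant validate(Instant from, Instant until, Clock clock) {
        Instant now = clock.instant();
        // Each argument is checked on its own, not only when both are present.
        if (from != null && from.isAfter(now)) {
            throw new IllegalArgumentException("badArgument: from lies in the future");
        }
        if (until != null && until.isAfter(now)) {
            throw new IllegalArgumentException("badArgument: until lies in the future");
        }
        // Default an omitted until to now (the debatable choice discussed above).
        Instant effectiveUntil = until != null ? until : now;
        if (from != null && from.isAfter(effectiveUntil)) {
            throw new IllegalArgumentException("badArgument: from must be less than or equal to until");
        }
        return effectiveUntil;
    }
}
```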

Also, the request MUST use the granularity given in the Configuration, and other granularities must be rejected. In addition, the Configuration contains an "earliest date", which from may not precede.
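For the earliest-date check, a hypothetical Java sketch, assuming the Configuration exposes the earliest datestamp as an Instant. The quoted section 2.7.1 only says that an omitted from extends back to the earliest datestamp; rejecting an earlier from is what this issue proposes. At day granularity the sketch compares whole days, so a from naming the very day of the earliest datestamp is not rejected.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical: reject a from value that lies before the repository's earliest datestamp. */
final class EarliestDateCheck {
    enum Granularity { DAY, SECOND }

    private EarliestDateCheck() {}

    static void check(Instant from, Instant earliest, Granularity requested) {
        if (from == null) {
            return; // omitted from: harvesting starts at the earliest datestamp anyway
        }
        // A day-granularity from means the start of that day (UTC), so compare whole days;
        // otherwise from=<day of the earliest datestamp> would be rejected.
        Instant bound = requested == Granularity.DAY ? earliest.truncatedTo(ChronoUnit.DAYS) : earliest;
        if (from.isBefore(bound)) {
            throw new IllegalArgumentException("badArgument: from is before the earliest datestamp " + earliest);
        }
    }
}
```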

  • Check from against the earliest date from the Configuration
  • Check that from and until are not in the future
  • Check that from is before or equal to until
  • Check that from and until use the granularity given in the Configuration
  • Optional: check that from and until use the same granularity (already done)
  • Make sure until is atEndOfDay() when using "day" granularity
  • Default until to now/this day
  • Ensure that from and until sourced from a resumption token cannot circumvent these checks (currently they can, because the token is loaded after validation)
  • Investigate removing the DateProvider.parse(String) method to enforce the configured granularity everywhere
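Two items in this list, the resumption token and DateProvider.parse(String), concern where validation happens rather than what it checks. One possible design, sketched in Java with hypothetical names (HarvestRange, TokenDates) that are not part of xoai: a range type whose only construction path is a validating factory, a parser that takes the granularity explicitly instead of guessing it, and a resolve step that loads token values before validating them. IllegalArgumentException again stands in for badArgument.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeParseException;

/** Hypothetical: a range that can only be obtained through one validating factory. */
final class HarvestRange {
    enum Granularity { DAY, SECOND }

    /** Stand-in for the from/until values carried by a decoded resumption token. */
    static final class TokenDates {
        final String from;
        final String until;

        TokenDates(String from, String until) {
            this.from = from;
            this.until = until;
        }
    }

    final Instant from;  // null when omitted
    final Instant until; // null when omitted

    private HarvestRange(Instant from, Instant until) {
        this.from = from;
        this.until = until;
    }

    /** Parses with an explicitly given granularity instead of guessing it from the string. */
    static Instant parse(String value, Granularity granularity, boolean isUntil) {
        try {
            if (granularity == Granularity.SECOND
                    && value.matches("\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z")) {
                return Instant.parse(value);
            }
            if (granularity == Granularity.DAY && value.matches("\\d{4}-\\d{2}-\\d{2}")) {
                LocalDate day = LocalDate.parse(value);
                // The until bound is inclusive, so at day granularity it covers the whole day.
                return isUntil
                        ? day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().minusSeconds(1)
                        : day.atStartOfDay(ZoneOffset.UTC).toInstant();
            }
        } catch (DateTimeParseException e) {
            // fall through: right shape, but not a valid date/time
        }
        throw new IllegalArgumentException("badArgument: expected " + granularity + " granularity: " + value);
    }

    /** Single validating entry point; further checks from the list would also go here. */
    static HarvestRange of(String from, String until, Granularity granularity) {
        Instant f = from == null ? null : parse(from, granularity, false);
        Instant u = until == null ? null : parse(until, granularity, true);
        if (f != null && u != null && f.isAfter(u)) {
            throw new IllegalArgumentException("badArgument: from must be less than or equal to until");
        }
        return new HarvestRange(f, u);
    }

    static HarvestRange resolve(String from, String until, TokenDates token, Granularity granularity) {
        // Load the token values first ...
        String effectiveFrom = token != null ? token.from : from;
        String effectiveUntil = token != null ? token.until : until;
        // ... and only then validate, so a crafted token cannot skip the checks.
        return of(effectiveFrom, effectiveUntil, granularity);
    }
}
```

Because the constructor is private, code that decodes a resumption token cannot produce an unchecked range; the remaining checks (earliest date, future dates) would be added to the same factory.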