lucmoreau/ProvToolbox

Timezone information lost during deserialization

mf-16 opened this issue · 9 comments

mf-16 commented

I encountered an issue with ProvToolbox version 2.0.0 while working with time. When deserializing a document using the provided code:

var inf = new InteropFramework();
var document = inf.readDocumentFromFile("file.provn");

file.provn:

document
activity(prov:a,2023-09-08T20:12:45.109-04:00,2023-09-15T20:35:06.793-04:00)
endDocument

After deserializing the document and serializing it back to provn:

document
activity(prov:a,2023-09-09T00:12:45.109+02:00,2023-09-16T00:35:06.793+02:00)
endDocument

We lose the original timezone information: the offset in the output is now the system's timezone.

This issue seems to occur in the ProvFactory class, specifically in the newISOTime method where the timezone information is lost during the execution of:

public XMLGregorianCalendar newISOTime(String time) {
        return this.newTime(DatatypeConverter.parseDateTime(time).getTime());
}

more specifically here:

DatatypeConverter.parseDateTime(time).getTime()

This issue impacts applications relying on accurate timezone data and could lead to incorrect data representation.
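To illustrate the mechanism described above, here is a minimal sketch that uses only the JDK's `javax.xml.datatype` classes (not ProvToolbox itself, and not `DatatypeConverter`, which is no longer bundled with recent JDKs). The class and method names are hypothetical. It shows that a `java.util.Date` carries only an instant, so any offset printed afterwards comes from the calendar the date is placed into, whereas parsing the lexical form directly keeps the offset. The target zone is passed explicitly to stand in for the system default.

```java
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

// Hypothetical demo class; not part of ProvToolbox.
class OffsetLossDemo {

    private static DatatypeFactory factory() {
        try {
            return DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the lexical form directly, so the original offset is retained.
    public static String parseKeepingOffset(String time) {
        XMLGregorianCalendar parsed = factory().newXMLGregorianCalendar(time);
        return parsed.toXMLFormat();
    }

    // Goes through java.util.Date, which holds only an instant. The offset is
    // then re-derived from the zone of the calendar the instant is put into.
    public static String parseViaDate(String time, TimeZone zone) {
        Date instant = factory().newXMLGregorianCalendar(time)
                .toGregorianCalendar()
                .getTime();
        GregorianCalendar calendar = new GregorianCalendar(zone);
        calendar.setTime(instant);
        return factory().newXMLGregorianCalendar(calendar).toXMLFormat();
    }

    public static void main(String[] args) {
        String time = "2023-09-08T20:12:45.109-04:00";
        System.out.println(parseKeepingOffset(time));
        // "GMT+02:00" stands in for a system default zone here.
        System.out.println(parseViaDate(time, TimeZone.getTimeZone("GMT+02:00")));
    }
}
```

Both results denote the same instant; only the second has lost the `-04:00` offset the document was written with.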

2023-09-08T20:12:45.109-04:00 and 2023-09-09T00:12:45.109+02:00 denote the same time, but they are expressed according to different time zones.

PROV does not specify a document's "default timezone" according to which dates must be serialized (unlike namespace prefixes, which can be defined in a document).

I am not aware of an obligation set by PROV to reexport dates in the same timezones as those they were imported in.

Please reopen the issue, if the above interpretation is not correct.

stain commented

If it can't preserve the timezone, then it should perhaps normalize to UTC (Z) rather than to the local timezone of the environment that provconvert is running in; otherwise the provenance of the prov conversion becomes important as well.
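As a rough illustration of the UTC normalization suggested here, the JDK's `XMLGregorianCalendar.normalize()` converts a timestamp to UTC regardless of the machine's timezone. This is only a sketch with hypothetical class and method names; it does not show how ProvToolbox implements it.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

// Hypothetical demo class; not part of ProvToolbox.
class UtcNormalizationDemo {

    // Parses an xsd:dateTime string and re-expresses the same instant in UTC.
    // The result does not depend on the JVM's default timezone.
    public static String toUtc(String time) {
        try {
            XMLGregorianCalendar parsed =
                    DatatypeFactory.newInstance().newXMLGregorianCalendar(time);
            return parsed.normalize().toXMLFormat();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUtc("2023-09-08T20:12:45.109-04:00"));
    }
}
```

Because the output is independent of where the conversion runs, two machines in different timezones would serialize the same document identically.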

The above commit is a quick fix for ProvToolbox: it offers a new factory method that creates dates and keeps their original timezone offsets, instead of converting them to the default system timezone offset.

@mf-16 does it address your concern?

The online translator, however, has not changed. If you paste the following example in https://openprovenance.org/service/translator.html (selecting provn notation), the result will display the same provenance but with both dates expressed with the default timezone offset (London time, at this time of the year, GMT+1). Same in provconvert from the command line.

document
prefix ex <https://example.org/>
activity(ex:a,2023-09-08T20:12:45.109-04:00,2023-10-15T20:35:06.793-02:00)
endDocument

Following @stain's suggestion, there is now a constructor to create the date in normalized form (in the UTC timezone). The PROV-N parser was updated to support it. The other serializations (json/jsonld/provx) seem to normalize dates to UTC.

Now, the following (on my development branch) reads provn and exports with dates in UTC.

curl https://gist.githubusercontent.com/lucmoreau/588fdaeca5eb271cc6d0cd86816bea00/raw/ff621ab4bbb07d20585331be2c434e4bd575a8c8/date_with_tz_offset.provn | modules-executable/toolbox/target/appassembler/bin/provconvert -infile - -informat provn -outfile - -outformat provn

Getting ProvToolbox to preserve the original timezone offset when run from the command line requires a bit more effort.

mf-16 commented

The above commit is a quick fix for ProvToolbox, offering a new factory method to create dates, and keep their original timezone offsets, instead of converting to the default system timezone offset.

@mf-16 does it address your concern?

Yes, it does. I appreciate you providing a fix for the issue and your quick response, @lucmoreau. Thank you!

@stain and @mf-16: your comments gave me food for thought. provconvert now allows users to specify how the timezone offset is to be processed: PRESERVE, UTC, SYSTEM, or a specific timezone. A web client can also specify this to the provapi by means of headers.
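For readers wondering what the four options mean in practice, here is a small sketch using `java.time`. The enum and method names are hypothetical and only mirror the option names mentioned above; this is not provconvert's actual implementation, and the real command-line flag and header names are not shown here.

```java
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical demo class; not part of ProvToolbox.
class OffsetPolicyDemo {

    // Mirrors the options named in the comment; ZONE stands for "a specific timezone".
    public enum Policy { PRESERVE, UTC, SYSTEM, ZONE }

    // Re-expresses the same instant according to the chosen policy.
    // The zone argument is only used for Policy.ZONE.
    public static OffsetDateTime apply(String time, Policy policy, ZoneId zone) {
        OffsetDateTime parsed = OffsetDateTime.parse(time);
        switch (policy) {
            case PRESERVE:
                return parsed; // keep the offset found in the input
            case UTC:
                return parsed.withOffsetSameInstant(ZoneOffset.UTC);
            case SYSTEM:
                return parsed.atZoneSameInstant(ZoneId.systemDefault()).toOffsetDateTime();
            case ZONE:
                return parsed.atZoneSameInstant(zone).toOffsetDateTime();
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        String time = "2023-09-08T20:12:45.109-04:00";
        System.out.println(apply(time, Policy.PRESERVE, null));
        System.out.println(apply(time, Policy.UTC, null));
        System.out.println(apply(time, Policy.ZONE, ZoneId.of("Europe/Paris")));
    }
}
```

Every policy returns the same instant; only the offset used to write it differs, which is why SYSTEM is the one option whose output depends on where the conversion runs.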