inukshuk/edtf.js

Parse error on trailing zeros

mielvds opened this issue · 9 comments

Hi there,

Thanks for this great library!
I was trying to parse 2020-05-18T22:39:24.422000Z, but it throws an error because it doesn't expect the trailing zeros. However, I think this should be valid? At least for new Date() it is.

Cheers,

Miel

I don't have the ISO 8601 standard at hand, but I've only ever seen strings with three-digit milliseconds. The parser is supposed to accept only valid ISO date strings, so I think that throwing an error in the example above is the expected result. The JS Date constructor allows some non-ISO formats, which this parser has to reject.

Experimenting with this a little: (new Date('2024-01-10T14:17:32.1234Z')).toISOString() gives '2024-01-10T14:17:32.123Z', so the JS implementation also cuts the ISO format down to three digits. .getMilliseconds() is also 123, so it's not only that the ISO output drops the extra digit; the parser silently ignores it.
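For anyone who wants to repeat the experiment, here is a minimal sketch using only the built-in Date. The expected values in the comments are the ones reported above. As far as I can tell, strings with more than three fractional digits fall outside the date-time format that ECMAScript itself defines, so other engines may behave differently.

```javascript
// A date-time string with four fractional digits.
const input = '2024-01-10T14:17:32.1234Z';
const date = new Date(input);

// Date stores whole milliseconds only, so the fourth digit cannot survive.
const iso = date.toISOString();     // reported above: '2024-01-10T14:17:32.123Z'
const ms = date.getMilliseconds();  // reported above: 123

console.log(iso, ms);
```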

Yep, you're absolutely right! I guess you could consider adding a 'non-strict' mode, but I would also understand if you wouldn't ;)

Right, we sort of hijacked the extension levels for this already. There is a level 3 (the standard only goes to level 2) where we added some extra features. We could make the parser accept more sub-second precision there. However, even if the parser accepted the extra precision, the rest of the API is built on top of the standard Date object for storage, which doesn't store the extra digits. That is, if you used anything but zeros, that information would be parsed and then lost. Therefore, to support this properly I think we'd have to store the milliseconds separately, which I think we should only do if there's a strong reason for it.
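To make "store the milliseconds separately" concrete, here is a rough sketch of the idea. It is not how edtf.js works: parseWithFraction is a hypothetical helper, and the pattern assumes a full extended-format date-time ending in Z or a ±hh:mm offset.

```javascript
// Hypothetical helper, not part of edtf.js: keep the full fraction next to the Date.
function parseWithFraction(input) {
  const pattern =
    /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$/;
  const match = pattern.exec(input);
  if (match === null) {
    throw new RangeError(`Unsupported date-time: ${input}`);
  }

  const [, base, fraction = '', zone] = match;

  // Date keeps whole milliseconds only, so hand it exactly three digits.
  const millis = fraction.slice(0, 3).padEnd(3, '0');

  return {
    date: new Date(`${base}.${millis}${zone}`),
    fraction, // all digits as written, e.g. '422000'
  };
}

const parsed = parseWithFraction('2020-05-18T22:39:24.422000Z');
console.log(parsed.date.toISOString(), parsed.fraction);
```

The cost is that every consumer of the value has to know about the second field, which is why it only seems worth doing if there is a strong reason.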

Where do you get these date strings from?

A Media Asset Management system that does not take standards compliance seriously; unfortunately, there are many of those. However, we also get a lot of xsd:dateTime values, and it's unclear whether the XSD spec disallows this.

This is from the ISO 8601:2004 spec btw (I don't have access, but I got it from https://stackoverflow.com/questions/25842840/representing-fraction-of-second-with-iso-86012004):

4.2.2.4 Representations with decimal fraction

The interchange parties, dependent upon the application, shall agree the number of digits in the decimal fraction. The format shall be [hhmmss,ss], [hhmm,mm] or [hh,hh] as appropriate (hour minute second, hour minute, and hour, respectively), with as many digits as necessary following the decimal sign. A decimal fraction shall have at least one digit.

So the three-digit limit might be a convention rather than a rule.
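Read that way, the clause asks for a decimal sign followed by one or more digits, with no upper bound. A simplified sketch of that rule for an extended-format hh:mm:ss time might look like the following. The function name is made up for illustration, the pattern does no range checking on the hour, minute or second values, and it accepts both the comma shown in the quote and a full stop.

```javascript
// Hypothetical check: hh:mm:ss with an optional decimal fraction of a second.
// If a decimal sign is present, at least one digit must follow; there is no maximum.
const TIME_WITH_FRACTION = /^\d{2}:\d{2}:\d{2}(?:[.,]\d+)?$/;

function isTimeWithOptionalFraction(time) {
  return TIME_WITH_FRACTION.test(time);
}

console.log(isTimeWithOptionalFraction('22:39:24.422000')); // six fractional digits
console.log(isTimeWithOptionalFraction('22:39:24,4'));      // comma, one digit
console.log(isTimeWithOptionalFraction('22:39:24.'));       // decimal sign without digits
```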

OK, in that case we can probably amend the grammar without doing any harm.

Though, as I said the underlying JS Date will likely just drop any extra digits there.

Sure, but then at least they can be parsed. Thanks!

OK, 4.6.0 should accept any number of fractional digits, but only up to three will be used.