uutils/parse_datetime

Chrono's API is inherently inconsistent with GNU's

Opened this issue · 7 comments

I am creating this issue to start a discussion about how to overcome some roadblocks I discovered while trying to address #3505 over at coreutils.

Issue 1

To get right to it, the implementation currently used by this library is not consistent with GNU's.

Currently that way that parse_datetime parses relative strings is that in runs the string through a regex and creates Duration objects and adds them together. The inconstancy arises when adding months: it simply adds 30 days; GNU does not do this. Under the existing (parse_datetime) API 2022-05-15 +1 month will land you on 2022-06-14 (After you do some refactoring to get a date back not a Duration) while running GNU's date -d'2022-05-15 +1 month lands you on 2022-06-15. Now, we could have our API increment months using Months::new which returns a Duration object but that would change the behavior of the existing API and cause a bunch of tests to fail. Why can't we change the tests? Let me explain.

Issue 2

Chrono's API is inconsistent with GNU

While working to solve #3505 I ran into a subtle bug. Addition and subtraction of Months in Chrono's API is non-commutative. The TL;DR is that chrono does clamping when adding months to their Date APIs. To who understand why this is an issue consider:

#[test]
    fn test_test(){
        let date0 = NaiveDate::from_ymd(2023, 1, 31).add(Months::new(1)).add(Months::new(1));
        let date1 = NaiveDate::from_ymd(2023, 1, 31).add(Months::new(2));
        assert_eq!(date1, date0);
    }

Output:

assertion `left == right` failed
  left: 2023-03-31
 right: 2023-03-28

While nothing is being commuted here, it shows that addition of months cannot be commutative under an implementation with this behavior. GNU does not do this. Consider:

[dylan@fedora parse_datetime]$ date -d'2023-01-31 + 1 month + 1 month'
Fri Mar 31 12:00:00 AM EDT 2023
[dylan@fedora parse_datetime]$ date -d'2023-01-31 + 2 months'
Fri Mar 31 12:00:00 AM EDT 2023

While GNU is consistent, Chrono is not. In fact it is 3 whole days off (If "off" is even the right word. What is "correct" is highly debatable). The addition of months is complicated because what it means to add months is somewhat arbitrary. Even how GNU adds months seems inconsistent sometimes. Adding 30 days vs. 1 month, 60 days vs. 2 months results in different dates that ostensibly don't have any obvious connective logic. This behavior further serves to frustrate the issue under discussion!

Whats the Point?

I appears that with these issues, bringing coreutils's touch and date APIs consistent with GNU's under any arbitrary string will be extremely difficult due to how inherently different chrono and GNU handle the addition and subtraction of months (maybe even other units of time).

Options

As touched on earlier, we could tweak parse_relative_time.rs to use chrono's Months::new instead of Durations::days and accept that it will cause a slight change in behavior and just rewrite the tests (What I was tempted to ask for permission to do).

The other option I considered was having a separate implementation for when you want a date back vs. a Duration but that seems silly. Users would rightfully assume that the separate functions are two sides of the same coin and would be consistent with each other.

The core issue of this discussion is that either option the result would be behavior that is inconsistent with GNU is some cases. It doesn't appear possible to get away from it. Given any arbitrary string what is the correct order of operations to get the correct date (as defined by GNU) and is it possible to do it with chrono under their existing implementation???

For what it is worth my intitial pull request to address #3505 did appear to have the requisite functionality but it is unclear if that is the case in for any arbitrary string. And using that would, again, result in a API behavior change.

These issues don't necessarily have obvious solutions which is why I wanted to bring it to the uutils community.

Yes! Thank you! Issue 1 is exactly what I've been advocating for that we should fix. But it's been a bit messy because parse_datetime started out as only a duration parser, instead of datetime parser, which is what it should be to do that. Note that it never even should have been a duration parser, because, like you noticed, duration is dependent on the date.

Now Issue 2 is something I didn't realize yet, so nice find! I think the first question to answer is what GNU does with adding 1 month. In other words, how does it define addition to be commutative?

A pretty good solution might be to adapt something like that CalendarDuration that is mentioned in the chrono issue you opened.

Now Issue 2 is something I didn't realize yet, so nice find! I think the first question to answer is what GNU does with adding 1 month. In other words, how does it define addition to be commutative?

I agree. I'm going to try and do more sleuthing to see if I can figure out how to predict the date returned by GNU. But one concern I have is that even if we did figure out how GNU is handling addition of durations such as months, it still may be hard to do with the chrono library.

A pretty good solution might be to adapt something like that CalendarDuration that is mentioned in the chrono issue you opened.

I read #1282 and I agree it sounds like this might be what we need.

I'm going to keep this issue open and follow up with my findings. It is still an open question about what to do with parse_relative_time.rs in terms of how to fix it but for now, we need more info before we go "fixing" anything.

In the meantime, if anyone has more to add, let's hear it! I really enjoy working on #3505 and want to get this over the finish line but addressing these issues are a bit above one person.

Sorry for the delay! I wanted to follow up on this issue because I have been looking into how to predict how GNU implements date behind the scenes (without looking at source code of course). Here is what I've found so far.

When adding a month (or months) to a date, GNU will attempt to land you on the same date in the requested month (2022-05-15 + 1 month becomes 2022-06-15). Obviously, this is not always possible. If you start from the 31st day of a month and the next month only has 30 days, GNU will land you on the first of the next month. For example, 2023-08-31 + 1 month becomes 2023-10-01.

But the real key that exposes how they do things under the hood is when dealing with February. If you imagine starting from 2023-01-31 + 1 month you actually land on 2023-03-03, or if it is a leap year you land on 2023-03-02. I have experimented with adding 1 month, several months and I have found what I believe to be the rule:

When incrementing months, GNU will first land you naively in the requested month, (January will map to February for an increment of 1 month), then GNU will attempt to land you on the requested day (February 31st), if that is not possible, it land you on the closest date, in the same month then increment forward the difference.

Example:
Below I have pasted the first 3 months of the 2023 calendar.

       January               February                 March       
Su Mo Tu We Th Fr Sa   Su Mo Tu We Th Fr Sa   Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7             1  2  3  4             1  2  3  4
 8  9 10 11 12 13 14    5  6  7  8  9 10 11    5  6  7  8  9 10 11
15 16 17 18 19 20 21   12 13 14 15 16 17 18   12 13 14 15 16 17 18
22 23 24 25 26 27 28   19 20 21 22 23 24 25   19 20 21 22 23 24 25
29 30 31               26 27 28               26 27 28 29 30 31   

Consider:

date -d'2023-01-31 + 1 month'.
Per the theory, GNU will first calculate the requested month is February. Since February 31st does not exist it will start with Tues February 28th. Now the difference mentioned above will be 3 days. If February did have 31 days, the 28th would be 3 days off. THEN it increments forward 3 days. Adding 3 days to Tues 02-28 lands on Fri 03-03. Doing this on a leap year will land you on 03-02, so the theory appears to hold water.

From what I can tell this "algorithm" predicts GNU output for any arbitrary months:

  1. Initially, ignore the date and increment to the requested calendar month.
  2. Take date into consideration and attempt to land on the requested date in the month calculated in step 1.
  3. If this is possible you are done. If this is not possible, start with the closest day that is still in the month calculated in step 1.
  4. Next, imagine if this month did have the requested date. (Using the example above, this fictional day would be the Friday following Tue February 28th). Note what day it would be.
  5. Return the day noted above but with the real-world date. (Friday 31 February becomes Friday 3 March.)

I haven't done any work yet to implement this algorithm because I wanted to follow up first, but ostensibly this feels very doable. We can use existing chrono APIs to implement the algorithm above and bring us one step close to full GNU interchangeability.

Whoa, great work! I think this makes sense! This is probably exactly what we should do.

Thanks for the feedback. I will get a PR together and we can go from there.

I will get a PR together and we can go from there.
Hi @dcechano, was this done ? Otherwise, I can give it a try :)

I will get a PR together and we can go from there.
Hi @dcechano, was this done ? Otherwise, I can give it a try :)

It's sort of done. There is a PR for it but I needed some merge conflicts to be resolved to be merged. I never got to resolving them because I honestly could not figure out how to resolve them. They were too different and incompatible from what I could see. If you want to take another stab at it go ahead.