ical-org/ical.net

`VEVENT` duration not handled correctly in case of a DST change between `DTSTART` and `DTEND`

Closed this issue · 22 comments

The following test fails, because there is a DST change between DTSTART and DTEND, which is not handled correctly:

    [Test]
    public void TestGetOccurrence_DtEndExcluded()
    {
        var ical = """
            BEGIN:VCALENDAR
            VERSION:2.0
            PRODID:Video Wall
            BEGIN:VEVENT
            UID:VW6
            DTSTAMP:20240630T000000Z
            DTSTART;TZID=Europe/London:20241001T000000
            DTEND;TZID=Europe/London:20241128T000000
            SUMMARY:New home speech.mp4
            COMMENT:New location announcement; may need update before Thanksgiving
            END:VEVENT
            END:VCALENDAR
            """;

        var calendar = Calendar.Load(ical);
        var occurrences = calendar.GetOccurrences(new CalDateTime("20241128T000000", "Europe/London"), new CalDateTime("20250101T000000", "Europe/London")).ToList();

        Assert.That(occurrences, Is.Empty);
    }

Related issues may appear in cases with recurring events where some of the occurrences do have DST changes between DTSTART and DTEND and others don't.

The following specification from RFC 5545 3.8.5.3 is also related to the topic, because it makes a distinction between the duration being specified via DTEND or via DURATION. Also related to #598

If the duration of the recurring component is specified with the
"DTEND" or "DUE" property, then the same exact duration will apply
to all the members of the generated recurrence set. Else, if the
duration of the recurring component is specified with the
"DURATION" property, then the same nominal duration will apply to
all the members of the generated recurrence set and the exact
duration of each recurrence instance will depend on its specific
start time. For example, recurrence instances of a nominal
duration of one day will have an exact duration of more or less
than 24 hours on a day where a time zone shift occurs. The
duration of a specific recurrence may be modified in an exception
component or simply by using an "RDATE" property of PERIOD value
type.

This is a follow-up of #656. While #656 discusses the case that DTSTART/DTEND are specified in floating time, this issue handles the slightly different case that DTSTART/DTEND specify a time zone.

@axunonb I'm surprised to see that #658 will solve this issue. Does this mean that the problem is related to local timezone, even if an explicit timezone reference is specified? Or is there a specific fix for that case?

No, it's not about local time, but floating time:.
The start is a floating a date-only:

var occurrences = calendar.GetOccurrences(new CalDateTime("20241128")).ToList();

A floating time can be converted to a zoned time,
The Value remains the same, only the timezone is set differently.
ToTimeZone is called from GetOccurrences.

public IDateTime ToTimeZone(string otherTzId)
{
    // That makes the difference:
    if (IsFloating) return new CalDateTime(_dateOnly, _timeOnly, otherTzId);

    var zonedOriginal = DateUtil.ToZonedDateTimeLeniently(Value, TzId);
    var converted = zonedOriginal.WithZone(DateUtil.GetZone(otherTzId));

    return converted.Zone == DateTimeZone.Utc
        ? new CalDateTime(converted.ToDateTimeUtc(), UtcTzId)
        : new CalDateTime(converted.ToDateTimeUnspecified(), otherTzId);
}

Does this make sense?

EDIT: Okay for here the value is date-only and remains floating. To convert, it would need a time.

I apologize for causing confusion, by updating the test case in the description multiple times. My assumption regarding #656 was, that it would be fixed with #658, but that it would be likely to persist, if timezones would be specified explicitly. So I intended to create a test, with no floating time involved, so we can analyze this independently. However, in the first version of the description the test case did contain floating time.

Given the updated test case without floating time involved, will it still be fixed by #658?

The test method is now updated as follows, and it fails with one event on CalendarEvent (10/01/2024 00:00:00 +01:00 Europe/London), while it should not according RFC 5545 Section 3.6.1,

The case is explicitly mentioned here:
https://github.com/axunonb/ical.net/blob/9870716d38690eec3123510554429df3cc0734fd/Ical.Net/Evaluation/RecurrenceUtil.cs?plain=1#L48C1-L51C1

Looks like something's wrong the endTime in the where section. Couldn't find out why yet.

[Edit]

[Test]
public void GetOccurrenceShouldExcludeDtEndZoned()
{
    var ical = """
       BEGIN:VCALENDAR
       VERSION:2.0
       PRODID:Video Wall
       BEGIN:VEVENT
       UID:VW6
       DTSTAMP:20240630T000000Z
       DTSTART;TZID=Europe/London;VALUE=DATETIME:20241001T000000
       DTEND;TZID=Europe/London;VALUE=DATETIME:20241128T000000
       SUMMARY:New home speech.mp4
       COMMENT:New location announcement; may need update before Thanksgiving
       END:VEVENT
       END:VCALENDAR
       """;

    var calendar = Calendar.Load(ical);
    var occurrences = calendar.GetOccurrences(new CalDateTime("20241128T000000", "Europe/London"), new CalDateTime("20250101T000000", "Europe/London")).ToList();

    Assert.That(occurrences, Is.Empty); // fails
}

I assume it's not the 'where', it's that dtend is considered to be at 20241128T010000, which is caused by faulty calculations.

Omg, just noted, that the test was still wrong, due to the wrong value type. Fixed again, but probably doesn't make any difference.

Omg, I should have noticed this. But yes, it still fails because the faulty dtend (not the whole where clause)

In RecurringEvaluator we have

public override HashSet<Period> Evaluate(IDateTime referenceDate, DateTime periodStart, DateTime periodEnd, bool includeReferenceDateInResults)
{
    // ...
    if (includeReferenceDateInResults && (Recurrable.RecurrenceRules == null || !Recurrable.RecurrenceRules.Any()))
    {
        rruleOccurrences.UnionWith(new[] { new Period(referenceDate), });
    }
    // ...
}

where includeReferenceDateInResults is true, and the start date gets included, while it should be false for this test.
From DTSTART and DTEND we have have a change in DST.

Looks like there is no test for for start date <= event period end in `RecurringEvaluator.Evalute()

Why do we get 1 occurrence instead of none:

public override HashSet<Period> Evaluate(IDateTime referenceTime, DateTime periodStart, DateTime periodEnd, bool includeReferenceDateInResults)
{
// Evaluate recurrences normally
var periods = base.Evaluate(referenceTime, periodStart, periodEnd, includeReferenceDateInResults);
foreach (var period in periods)
{
period.Duration = CalendarEvent.GetFirstDuration();
period.EndTime = period.Duration == default
? period.StartTime
: period.StartTime.Add(CalendarEvent.GetFirstDuration());
}
// Ensure each period has a duration
foreach (var period in periods.Where(p => p.EndTime == null))
{
period.Duration = CalendarEvent.GetFirstDuration();
period.EndTime = period.Duration == default
? period.StartTime
: period.StartTime.Add(CalendarEvent.GetFirstDuration());
}
return periods;
}
}

  1. HashSet<Period> periods initially do not have the period.EndTime set
  2. The first foreach loop determines the period.EndTime using the duration. 11/28/2024 00:00:00 +00:00 Europe/London minus 10/01/2024 00:00:00 +01:00 Europe/London = 58.01:00:00 (58 days and 1 hour)
  3. Adding this duration to the period.StartTilme makes period.EndTime 1 hour later than CalendarEvent.DtEnd and also 1 hour later than the from parameter of calendar.GetOccurrences(...).
  4. As the from parameter date/time is before period.EndTime, it is returned as an occurrence.

@minichma Looks like the cause is identified - not sure how a fix could look like.

@minichma If we subtract the difference in zoned time (aka Value) instead of .AsUtc this test (and all others we have) succeed.
It does make a certain sense to me, but Is it a viable solution, when DTSTART and DTEND are in different timezones? I guess, in such a case a DURATION must exist...

public virtual TimeSpan GetFirstDuration()
{
    if (Properties.ContainsKey("DURATION"))
        return Duration;

    if (DtStart is not null)
    {
        if (DtEnd is not null)
        {
            // before (subtracts AsUtc): return DtEnd.Subtract(DtStart);
            return DtEnd.Value - DtStart.Value; // new
        }
        else if (!DtStart.HasTime)
        {
            // For cases where a "VEVENT" calendar component
            // specifies a "DTSTART" property with a DATE value type but no
            // "DTEND" nor "DURATION" property, the event's duration is taken to
            // be one day.
            return TimeSpan.FromDays(1);
        }
        else
        {
            // For cases where a "VEVENT" calendar component
            // specifies a "DTSTART" property with a DATE-TIME value type but no
            // "DTEND" property, the event ends on the same calendar date and
            // time of day specified by the "DTSTART" property.
            return TimeSpan.Zero;
        }
    }

    // This is an illegal state. We return zero for compatibility reasons.
    return TimeSpan.Zero;
}

Proposal for CalendarEvent.GetFirstDuration()

public virtual TimeSpan GetFirstDuration()
{
    if (Properties.ContainsKey("DURATION"))
        return Duration;

    if (DtStart is { HasTime: false } && DtEnd is null)
    {
        // For cases where a "VEVENT" calendar component
        // specifies a "DTSTART" property with a DATE value type but no
        // "DTEND" nor "DURATION" property, the event's duration is taken to
        // be one day.
        return TimeSpan.FromDays(1);
    }

    // we can't calculate a duration
    if (DtStart == null || DtEnd == null) return default;

    // For NodaTime, we need a time zone to calculate the duration
    if (DtStart.IsFloating || DtEnd.IsFloating)
    {
        return DtEnd.Value - DtStart.Value;
    }

    // Handle time zones and DST transitions.
    var startZoned = DateUtil.ToZonedDateTimeLeniently(DtStart.Value, DtStart.TzId);
    var endZoned = DateUtil.ToZonedDateTimeLeniently(DtEnd.Value, DtEnd.TzId);

    // Nominal Duration:
    // This is the duration calculated based on the local date and time values, ignoring time
    // zones and DST transitions. It represents the difference in calendar time.
    var duration = NodaTime.Period.Between(startZoned.LocalDateTime, endZoned.LocalDateTime, PeriodUnits.Seconds).ToDuration();

    // Actual Duration:
    // This is the duration calculated based on the Instant values,
    // which represent a specific point in time in UTC. It takes into account the time
    // zones and DST transitions, providing the actual elapsed time between the two points
    // in time in UTC.
    // It takes into account the time zones and DST transitions.
    // var duration = endZoned.ToInstant() - startZoned.ToInstant();

    return duration.ToTimeSpan();
}

Yeah, that's a horrible topic. And its why TimeSpan is not well suited for this purpose at all, because it cannot distinguish between the different parts of the time and not between nominal and exact durations (see the excerpt from the RFC in the description above), but that's a issue throughout the lib. The method GetFirstDuration() is somewhat unclear too. We should have two methods, GetFirstDurationNominal and GetFirstDurationExact instead and decide which one to use where. And given the fact that the days and time part need to be treated differently, we'd even need different return types than Duration.

I will try to comment in more detail the next days.

I think there are multiple issues involved here. The most basic one is that the current implementation of CalDateTime results in (t + d - t != d) if a DST change is involved.

    public static IEnumerable<TestCaseData> TestSomethingTestCases { get; } = [
        new TestCaseData(new CalDateTime(2024, 10, 27, 0, 0, 0, tzId: null), TimeSpan.FromHours(4)),
        new TestCaseData(new CalDateTime(2024, 10, 27, 0, 0, 0, tzId: CalDateTime.UtcTzId), TimeSpan.FromHours(4)),
        new TestCaseData(new CalDateTime(2024, 10, 27, 0, 0, 0, tzId: "Europe/Paris"), TimeSpan.FromHours(4)),
    ];

    [Test, TestCaseSource(nameof(TestSomethingTestCases))]
    public void TestSomething(CalDateTime t, TimeSpan d)
    {
        Assert.Multiple(() =>
        {
            Assert.That(t.Add(d).Subtract(d), Is.EqualTo(t));
            Assert.That(t.Add(d).Subtract(t), Is.EqualTo(d));
        });
    }

My first thought would be that we should probably remove all the arithmetic operator overloads, because they are ambiguous, and replace the Add* and Subtract* with these methods:

  • AddNominal(TimeSpan)
  • AddExact(TimeSpan)
  • SubtractNominal(TimeSpan)
  • SubtractExact(TimeSpan)
  • SubtractNominal(IDateTime)
  • SubtractExact(IDateTime)

where the 'Nominal' variants can only be used if both values have the same tzId set.

The `GetFirstDuration´ should probably be replaced by something like

    public struct EventDuration(TimeSpan nominalPart, TimeSpan exactPart)
    {
        public TimeSpan NominalPart => nominalPart;
        public TimeSpan ExactPart => exactPart;
    }

    public EventDuration GetEffectiveDuration() {
    ...
    }

The GetEffectiveDuration method could then also account for the different handling required by the RFC if the duration is specified via DURATION vs via DTEND. But all in all would certainly require some additional considerations.

(t + d - t != d) if a DST change is involved

Great finding

remove all the arithmetic operator overloads from CalDateTime

Agree

Let's track arithmetic operators with #670

'Nominal' operation variants can only be used if both values have the same tzId set.

Isn't this VEVENT definition also valid and Nominal should be used in case of GetFirstDuration?

DTSTART;TZID=America/New_York;VALUE=DATETIME:20241001T000000
DTEND;TZID=Europe/London;VALUE=DATETIME:20241128T000000

'Nominal' operation variants can only be used if both values have the same tzId set.

Isn't this VEVENT definition also valid and Nominal should be used in case of GetFirstDuration?

DTSTART;TZID=America/New_York;VALUE=DATETIME:20241001T000000
DTEND;TZID=Europe/London;VALUE=DATETIME:20241128T000000

Yes, this is valid, but to my understanding no, this is not supposed to use nominal time, but exact time according to the RFC

If the duration of the recurring component is specified with the
"DTEND" or "DUE" property, then the same exact duration will apply
to all the members of the generated recurrence set.

This is somewhat counterintuitive if the TZ is the same for DTSTART and DTEND, but if the TZ is different like in your example, then there is no meaningful way of calculating a nominal duration, so this probably is the reason for exact duration being specified in this case. GetFirstDuration is not well defined regarding whether it returns an exact or a nominal duration, which should be improved. When introducing it, I basically just preserved the previous behavior but this should certainly be improved, either by having two variants, one for nominal, one for exact time, or by returning something that can express both (see the drafted EventDuration above).

no, this is not supposed to use nominal time, but exact time according to the RFC

This was what I not assumed, before looking around what others are doing.
The duration is always 5 hours

PHP

/* Run online with https://onlinephp.io/ */
<?php
$start = new DateTime('2024-10-01 00:00:00', new DateTimeZone('America/New_York'));

$occurrences = [];
$interval = new DateInterval('P1D'); // Assuming daily occurrences

for ($i = 0; $i < 60; $i++) { // Check for 60 days to cover the DST change
    $occurrence_start = clone $start;
    $occurrence_start->add(new DateInterval('P' . $i . 'D'));
    $occurrence_end = clone $occurrence_start;
    $occurrence_end->setTimezone(new DateTimeZone('Europe/London'));
    $occurrences[] = [
        'start' => $occurrence_start->format('Y-m-d H:i:s T'),
        'end' => $occurrence_end->format('Y-m-d H:i:s T')
    ];
}

echo "Occurrences around DST change:\n";
foreach ($occurrences as $occurrence) {
    echo "Start: " . $occurrence['start'] . " - End: " . $occurrence['end'] . "\n";
}
?>

Output:

Occurrences around DST change:
Start: 2024-10-04 00:00:00 EDT - End: 2024-10-04 05:00:00 BST
Start: 2024-10-05 00:00:00 EDT - End: 2024-10-05 05:00:00 BST
...
Start: 2024-10-27 00:00:00 EDT - End: 2024-10-27 04:00:00 GMT  # DST ends in Europe
Start: 2024-10-28 00:00:00 EDT - End: 2024-10-28 04:00:00 GMT
...
Start: 2024-11-03 00:00:00 EST - End: 2024-11-03 05:00:00 GMT  # DST ends in the US
Start: 2024-11-04 00:00:00 EST - End: 2024-11-04 05:00:00 GMT

Python:

from datetime import datetime, timedelta
import pytz

# Define start time with time zone
start = datetime(2024, 10, 1, 0, 0, 0, tzinfo=pytz.timezone('America/New_York'))

occurrences = []
interval = timedelta(days=1)  # Assuming daily occurrences

for i in range(60):  # Check for 60 days to cover the DST change
    occurrence_start = start + i * interval
    occurrence_end = occurrence_start.astimezone(pytz.timezone('Europe/London'))
    occurrences.append({
        'start': occurrence_start.astimezone(pytz.timezone('America/New_York')).strftime('%Y-%m-%d %H:%M:%S %Z%z'),
        'end': occurrence_end.strftime('%Y-%m-%d %H:%M:%S %Z%z')
    })

print("Occurrences around DST change:")
for occurrence in occurrences:
    print(f"Start: {occurrence['start']} - End: {occurrence['end']}")

Output:

Occurrences around DST change:
Start: 2024-10-09 00:00:00 EDT-0400 - End: 2024-10-09 05:00:00 BST+0100
Start: 2024-10-10 00:00:00 EDT-0400 - End: 2024-10-10 05:00:00 BST+0100
...
Start: 2024-10-27 00:00:00 EDT-0400 - End: 2024-10-27 04:00:00 GMT+0000  # DST ends in Europe
Start: 2024-10-28 00:00:00 EDT-0400 - End: 2024-10-28 04:00:00 GMT+0000
...
Start: 2024-11-03 00:00:00 EST-0500 - End: 2024-11-03 05:00:00 GMT+0000  # DST ends in the US
Start: 2024-11-04 00:00:00 EST-0500 - End: 2024-11-04 05:00:00 GMT+0000
...

@minichma Forgive me for dealing with “nominal” and “exact” once again, and being verbose.

Nominal vs accurate in terms of duration

RFC 5545 3.3.6 says: "The format can represent nominal durations (weeks and days) and accurate durations (hours, minutes, and seconds)." So it is just description of the duration, that is accurate or nominal. dur-day and dur-week are inaccurate descriptions and often found in colloquial language.

So when we have

BEGIN:VEVENT
UID:uid1@example.com
DTSTAMP:20240331T120000Z
DTSTART;TZID=Europe/Berlin:20240328T010000
DURATION:P7W
RRULE:FREQ=DAILY;COUNT=5
SUMMARY:Weekly Reminder
END:VEVENT

although the duration is expressed nominal (in RFC terms), it still ends up in accurate occurrence durations that consider the actual start and end times, including any DST changes:

  • Start: 2024-03-28T01:00:00 (Europe/Berlin, UTC+01:00)
  • End: 2024-05-16T01:00:00 (Europe/Berlin, UTC+02:00) - Accurate duration includes DST change.

The method GetFirstDuration() should use the difference End - Start as the result (we should change the method name), because the applicable timezone (including DST changes) is set in the next step when determining occurrences.
We have such a helper method because of RFC 5545 3.6.1 "The "DTSTART" property for a "VEVENT" specifies the inclusive start of the event. For recurring events, it also specifies the very first instance in the recurrence set. (...)"

Currently, the problem comes from

return DtEnd.Subtract(DtStart);

that calculates the real-world, accurate duration, while is should just use the pure date/time difference. That's the cause of wrong occurrences.

This being said, my conclusion also was, that CalDateTime arithmetic operations should always calculate the real-world, accurate duration (as they do now, except for the bug tracked with #670).

It could well be that we have talked past each other so far because of the word "duration" having different meanings in section 3.6.1 and GetFirstDuration() and CalDateTimearithmetic operations.
Does all this make sense to you?

Does all this make sense to you?

Yes absolutely. Probably I was somewhat unclear and we are also mixing two related but different topics.

  1. The core of the problem described in this ticket is what is tracked in #670. It should be fixed by making sure that all CalDateTime.Subtract and .Add variants do their calculations the same way, i.e. either in exact time (i.e. after converting to UTC) or in nominal time (i.e. calculating in the given timezone without any tz conversion). Whether we use exact or nominal doesn't matter in case of a single-date event like the one in the issue's description (except for ambiguous or non-existing timestamps).

  2. The other problem that we started discussing is the topic around how to deal with nominal vs exact in the case of recurring events. There it makes a difference whether calculations are done in UTC or in local time, if some recurrences span DST changes while others don't. That's not the problem described in this ticket though, so it should probably be tracked independently. The first step of dealing with it was taken in #574.