Unexpected result using `P1Y` with two timestamps not being in a one-year interval

Question

bonaparten opened this issue 3 years ago · 7 comments

Bug Description

The data aggregation contribution endpoints give an "unexpected" result if the results are aggregated by year (P1Y) and the two given timestamps are not a whole number of years apart.

Given these 3 different values of the time parameter:

1. 2014-01-01/2017-01-01 gives an expected result
2. 2014-01-01/2017-06-01/P1Y gives the same result as case 1
3. 2014-01-01/2017-06-01 gives an expected result

Expected Behaviour

IMHO case 2 should give the same total as case 3, aggregated by year, with a last interval from 2017-01-01 to 2017-06-01.

I am not sure whether this is a bug or not. If this behavior is on purpose, we should discuss whether it is confusing.

General Information

Version of the ohsome API: 1.4.0
API instance: https://api.ohsome.org/v1
Affected endpoint(s): /contributions/count (not sure about the full-history endpoint, since it has a similar implementation of the timestamps).
Used HTTP method: GET and POST
Utilized tool/library for the request: Swagger

Further Information

Error Messages, Logs, Screenshots

Result case 1:

    {
      "fromTimestamp": "2014-01-01T00:00:00Z",
      "toTimestamp": "2017-01-01T00:00:00Z",
      "value": 33
    }

Result case 2:

    {
      "fromTimestamp": "2014-01-01T00:00:00Z",
      "toTimestamp": "2015-01-01T00:00:00Z",
      "value": 16
    },
    {
      "fromTimestamp": "2015-01-01T00:00:00Z",
      "toTimestamp": "2016-01-01T00:00:00Z",
      "value": 7
    },
    {
      "fromTimestamp": "2016-01-01T00:00:00Z",
      "toTimestamp": "2017-01-01T00:00:00Z",
      "value": 10
    }

Result case 3:

    {
      "fromTimestamp": "2014-01-01T00:00:00Z",
      "toTimestamp": "2017-06-01T00:00:00Z",
      "value": 36
    }
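
As a quick sanity check on the numbers above, the following sketch (plain Python, using only the values copied from the three results; the variable names are made up for illustration) shows that the yearly buckets of case 2 add up to case 1, and how many contributions fall into the part of the range that case 2 does not report:

```python
# Values copied from the three results shown above.
case_1_total = 33            # 2014-01-01/2017-01-01
case_2_yearly = [16, 7, 10]  # 2014-01-01/2017-06-01/P1Y
case_3_total = 36            # 2014-01-01/2017-06-01

# The yearly buckets of case 2 add up to case 1, i.e. case 2 stops at 2017-01-01.
assert sum(case_2_yearly) == case_1_total

# Contributions between 2017-01-01 and 2017-06-01 that case 2 leaves out.
remainder = case_3_total - sum(case_2_yearly)
print(remainder)  # 3
```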

Answer 1 · 2021-06-14T15:19:01.000Z

I think it's not a bug. If it attached the last interval (only 6 months), it would not comply with the requested yearly intervals, and you might accidentally compare a year with a half year because of "mixed" data.
I think this behaviour also comes from the snapshot requests (implemented first), where you only have one timestamp per result. There, the last timestamp would also break the time-distance between the timestamps, resulting in an irregular time series.
Is there a request to get "partial" aggregations?

Answer 2 · 2021-06-14T15:34:07.000Z

> I think it's not a bug. If it attached the last interval (only 6 months), it would not comply with the requested yearly intervals, and you might accidentally compare a year with a half year because of "mixed" data.
> I think this behaviour also comes from the snapshot requests (implemented first), where you only have one timestamp per result. There, the last timestamp would also break the time-distance between the timestamps, resulting in an irregular time series.
> Is there a request to get "partial" aggregations?

I would agree that you can't change the default behaviour. As far as I understand, that is how ISO 8601 defines it, and it is the behaviour to be expected. But maybe a feature like that would be useful through an extra boolean parameter like remainder? You should explicitly tell the ohsome API, "yes, I want data that does not fit into the interval I chose", to avoid confusion :)

Answer 3 · 2021-06-14T15:39:25.000Z

Agree with what @mcauer said. This is also the expected behaviour in many programming languages when generating periodic data using a start+end+step, e.g. Python:

>>> list(range(1, 7, 2))
[1, 3, 5]

> But maybe a feature like that would be useful through an extra boolean parameter

IMHO that's overkill. People can always fall back to specifying a manually expanded list of timestamps if they really want uneven intervals, e.g. 2014-01-01,2015-01-01,2016-01-01,2017-01-01,2017-06-01.
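
A minimal sketch of how such a list could be generated on the client side. The helper name `expand_timestamps` is made up for illustration and is not part of the ohsome API. It only handles yearly steps and assumes the start date is not February 29.

```python
from datetime import date


def expand_timestamps(start: date, end: date, step_years: int = 1) -> str:
    """Build a comma-separated timestamp list in yearly steps.

    If `end` does not fall on a step, it is appended as an extra,
    shorter last interval. Hypothetical client-side helper.
    """
    if end < start or step_years < 1:
        raise ValueError("need start <= end and step_years >= 1")
    stamps = []
    k = 0
    while True:
        # Assumes `start` is not Feb 29, so replacing the year is always valid.
        current = start.replace(year=start.year + k * step_years)
        if current > end:
            break
        stamps.append(current)
        k += 1
    if stamps[-1] != end:
        stamps.append(end)  # the uneven remainder
    return ",".join(d.isoformat() for d in stamps)


print(expand_timestamps(date(2014, 1, 1), date(2017, 6, 1)))
# 2014-01-01,2015-01-01,2016-01-01,2017-01-01,2017-06-01
```

The resulting string can then be passed as the time parameter in place of the start/end/period form.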

Answer 4 · 2021-06-14T15:58:26.000Z

OK. Should we add an explanation of this particular case to the documentation? It would at least help confused users.

Answer 5 · 2021-06-14T16:18:00.000Z

> OK. Should we add an explanation of this particular case to the documentation? It would at least help confused users.

Is something in time.rst not clear enough? Do you have an idea how to formulate a more precise description there?

Answer 6 · 2021-06-14T17:46:04.000Z

I still think P1Y is confusing for these cases. Improvements in the documentation could avoid some confusion. But I don't want to create unnecessary work. So if you guys think it is good as it is, I am OK with it ;-)
I was thinking of something like this:

"Note: if you use P1Y with two timestamps that are not a whole number of years apart, the last result in the response will have as its toTimestamp the last date within the given time range that is a whole number of years after the start timestamp (e.g. ...)."

Answer 7 · 2021-06-15T09:09:38.000Z

The proposed note looks fine, but IMHO it should not single out P1Y, because the same thing happens for every period whenever the end time does not align with the start time and period. P1Y could be used as an example, like this perhaps:

Note: if you use the time interval syntax and the end time does not align exactly with the given start time and period, the (to) timestamp of the last result will be the latest timestamp that equals the start time plus a whole multiple of the period and is not after the given end time. For example, if you use 2010-01-01/2012-02-01/P1Y, the final timestamp actually used will be 2012-01-01.
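
The rule in this note can be sketched in a few lines of Python. This is only a model of the behaviour described in this thread, not the ohsome API implementation. The function name `last_aligned_timestamp` is made up, it only understands periods of the form PnY and PnM, and it assumes the start day is at most 28 so that every month contains it.

```python
import re
from datetime import date


def last_aligned_timestamp(time_param: str) -> date:
    """Return start + k * period for the largest k that does not pass the end.

    Hypothetical helper; expects "start/end/PnY" or "start/end/PnM".
    """
    start_s, end_s, period = time_param.split("/")
    start, end = date.fromisoformat(start_s), date.fromisoformat(end_s)
    match = re.fullmatch(r"P(\d+)([YM])", period)
    if match is None:
        raise ValueError(f"unsupported period: {period}")
    step_months = int(match.group(1)) * (12 if match.group(2) == "Y" else 1)
    if step_months == 0 or end < start:
        raise ValueError("need a non-zero period and start <= end")

    last = start
    k = 1
    while True:
        # Count months since year 0, then convert back to (year, month).
        total = start.year * 12 + (start.month - 1) + k * step_months
        # Assumes start.day <= 28, so the day exists in every month.
        candidate = start.replace(year=total // 12, month=total % 12 + 1)
        if candidate > end:
            return last
        last = candidate
        k += 1


print(last_aligned_timestamp("2010-01-01/2012-02-01/P1Y"))  # 2012-01-01
print(last_aligned_timestamp("2014-01-01/2017-06-01/P1Y"))  # 2017-01-01
```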