Changing name of "averaging_Period" and adding field that truly indicates time resolution
Suggest changing currently defined 'averaging_Period' to 'reporting_Interval' and adding in true 'averaging_Period'
The motivation for this suggested change:
As I understand it, 'averaging_Period' currently describes the interval at which a datapoint is updated on a source's site, e.g. UB data is updated on the website on an hourly basis. But this is not the same as the temporal resolution of the measurement itself. In a lot of cases the two happen to coincide, but they are not necessarily the same.
Just as context, some examples of the adapters I worked on:
- the Dutch source reports PM10 and PM2.5 as 24 hour rolling averages; the others are hourly averages.
- the Australian data is also pretty detailed:
If we don't explicitly know the `intervalPeriod`, should we just use the `reportingPeriod` or leave it blank? If it's blank in most cases, is that something we'd be ok with?
Is `reportingPeriod` something that should be stored or derived from the data?
Maybe the thing to do is to have a) a `reportingPeriod` parameter and then another one indicating whether we have inferred the value from our initial analysis of the system or we know it explicitly, either from the source page itself or communication with a given agency, and then b) an `averagingPeriod` and associated parameter that can do the same.
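To make that concrete, here is a rough sketch of what the a)/b) pairing could look like. This is only an illustration, not a proposed schema: the names `Provenance`, `Period`, and `MeasurementMeta` are made up, and the example values (hourly reporting for the Dutch source, and which value counts as inferred vs explicit) are assumptions for the sake of the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    """How we came to know a period (hypothetical names)."""
    EXPLICIT = "explicit"  # stated on the source page or by the agency
    INFERRED = "inferred"  # worked out from our own analysis of the system


@dataclass(frozen=True)
class Period:
    value: float           # length of the period
    unit: str              # e.g. "hours"
    provenance: Provenance


@dataclass(frozen=True)
class MeasurementMeta:
    # How often the source publishes a new value; None if unknown.
    reporting_period: Optional[Period]
    # Time window each published value is averaged over; None if unknown.
    averaging_period: Optional[Period]


# Assumed example: a 24 hour rolling average that is refreshed every hour.
dutch_pm10 = MeasurementMeta(
    reporting_period=Period(1, "hours", Provenance.INFERRED),
    averaging_period=Period(24, "hours", Provenance.EXPLICIT),
)
```

Leaving a period as `None` would be the "leave it blank" option, which keeps an unknown value distinct from an inferred one.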
I think in a lot of cases, we'll likely need to infer both the `reportingPeriod` and `averagingPeriod` (that's essentially what we've been doing, by and large, to date), and we'll probably be pretty accurate in doing so. But this way a user can at least see for themselves which values are our inferences and which are explicit information. (If we go the route that makes it clear whether we inferred data or not, we'll probably need to do it with `coordinates` as well?)
I would think we don't want to derive the `reportingPeriod` from the data in some sort of continuous manner, at least for scientific purposes: when data drops out, the derived value will look weird/misleading and make the data more difficult to use, from that standpoint and for at least a good chunk of the use-cases I can think of. But, as @jflasher pointed out (and @olafveerman did at dinner), I see how this could be used from a 'system health' standpoint to help test the system for periods when data fall out. Perhaps that's a separate something called `averageReportingPeriod`? I realize this is a bad and confusing name for it. :)
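For the 'system health' idea, a minimal sketch of how a derived value could be computed from the timestamps alone, kept separate from the stored `reportingPeriod`. The function names are hypothetical, and it uses the median gap rather than a plain average, since a single long dropout would skew a mean.

```python
from datetime import datetime, timedelta
from statistics import median


def observed_reporting_period(timestamps):
    """Median gap between consecutive timestamps, or None if fewer than two."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps) if gaps else None


def dropouts(timestamps, nominal, tolerance=1.5):
    """Pairs of consecutive timestamps whose gap exceeds nominal * tolerance."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > nominal * tolerance
    ]


# Made-up hourly series with the 03:00 and 04:00 values missing.
start = datetime(2016, 1, 1)
series = [start + timedelta(hours=h) for h in (0, 1, 2, 5, 6)]

observed = observed_reporting_period(series)
gaps_found = dropouts(series, nominal=timedelta(hours=1))
```

Here `observed` stays at one hour despite the gap, and `gaps_found` lists the single stretch between 02:00 and 05:00 where data fell out.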