eclipse-january/january

Support for determining intersection of two ascending datasets


I continue to explore the application of AxesMetadata to Datasets.

When values are supplied to represent an Axis, I presume the AxesMetadata is the independent axis, and the dataset itself represents the values to be plotted on the dependent axis.

Based on this assumption, I presume that a Dataset supplied to an AxesMetadata instance should be continuously ascending (or descending). This is to allow the axes data to be used as a lookup to retrieve a value (or a slice) from the subject dataset. I don't think the AxesMetadata implementation currently enforces this.

So, it appears to be down to the consumer to enforce this. That's ok.
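
As an illustration of what such a consumer-side check might look like, here is a minimal sketch in plain Java. It deliberately uses no January API: the class name `MonotonicCheck` and the method `isStrictlyMonotonic` are hypothetical, and it assumes the axis values have already been copied into a `double[]`.

```java
public class MonotonicCheck {
    /**
     * Returns true if the values are strictly ascending or strictly descending.
     * Arrays with fewer than two elements are treated as monotonic.
     * Hypothetical helper for illustration only; not part of January.
     */
    public static boolean isStrictlyMonotonic(double[] v) {
        if (v.length < 2) {
            return true;
        }
        // The first step decides the direction every later step must follow.
        boolean ascending = v[1] > v[0];
        for (int i = 1; i < v.length; i++) {
            double d = v[i] - v[i - 1];
            // Reject NaN steps, repeated values, and any change of direction.
            if (Double.isNaN(d) || d == 0 || (d > 0) != ascending) {
                return false;
            }
        }
        return true;
    }
}
```

A consumer could run a check like this before using the axis values as a lookup table, and reject or sort the data if it fails.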

Next step: on the presumption that two AxesMetadata datasets are continuously ascending/descending, does January hold functionality to determine the intersecting range? This appears related to January's concept of Slices, but I can't find anything in the API that provides this.

Given these datasets:

a:[0, 10, 20, 30, 40, 50, 60]
b:[42, 45, 56, 65, 70]

I'd expect the intersecting range (overlap) of these to be:

intersection:[42, 60]
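
To make the expected result concrete, the following is a minimal sketch in plain Java rather than anything taken from January. The class `AxisOverlap` and its `overlap` method are hypothetical names, and the sketch assumes both arrays are monotonic (ascending or descending), so that the extremes sit at the two ends.

```java
import java.util.Arrays;

public class AxisOverlap {
    /**
     * Returns {low, high} for the overlap of the value ranges of a and b,
     * or null if either array is empty or the ranges do not overlap.
     * Assumes each array is monotonic, so its extremes are its end points.
     * Hypothetical helper for illustration only; not part of January.
     */
    public static double[] overlap(double[] a, double[] b) {
        if (a.length == 0 || b.length == 0) {
            return null;
        }
        double aLow = Math.min(a[0], a[a.length - 1]);
        double aHigh = Math.max(a[0], a[a.length - 1]);
        double bLow = Math.min(b[0], b[b.length - 1]);
        double bHigh = Math.max(b[0], b[b.length - 1]);

        // The overlap starts at the larger lower bound and ends at the smaller upper bound.
        double low = Math.max(aLow, bLow);
        double high = Math.min(aHigh, bHigh);
        return low <= high ? new double[] {low, high} : null;
    }

    public static void main(String[] args) {
        double[] a = {0, 10, 20, 30, 40, 50, 60};
        double[] b = {42, 45, 56, 65, 70};
        System.out.println(Arrays.toString(overlap(a, b))); // prints [42.0, 60.0]
    }
}
```

Note that the result is a range of values, not a set of shared elements: 60 comes from `a` and 42 from `b`, and neither value appears in both datasets.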

Assumptions that axes datasets are monotonic are too strong. In a geographic setting, sampling can be done in any arbitrary path on Earth. For example, consider a spiral path!

For what you want, there are test methods for monotonicity in Comparisons plus crossings and findIndex* in DatasetUtils to do lookups.

Finding overlaps on the real line is not a dataset function.

Aah, maybe I'm misinterpreting the role of AxesMetadata. I was imagining that AxesMetadata represented the independent axis for a set of dependent data values.

I'm comfortable with geographic data. Could you give me a quick example of how AxesMetadata would be used to represent the spiral path?

We could represent the spiral path as a Dataset of 2*n doubles that do not have to be monotonic. How would you apply AxesMetadata to this?

One instance I can think of would be to give the elapsed time at which each [x,y] tuple was recorded, which would allow graphs of x vs time or y vs time. This elapsed time is monotonic.

There can be more than one independent axis, as in my example. AxesMetadata.getAxes(int) returns an array of datasets. So a scalar field like ground-level humidity (in a weather model) would depend on latitude and longitude. It could be measured by moving about an area quickly enough for the conditions to be considered static, and so at one point in time. Thus H(x,y,t) has been measured at H(x_i,y_i,t), where {(x_i,y_i)} is a sequence of coordinates that describes a spiral.
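
To see why such axes are not monotonic, here is a small plain-Java sketch that generates sample positions along a spiral and counts how often the x coordinate reverses direction. It uses no January API. The class `SpiralPath`, the Archimedean form r = step * theta, and all parameter names are assumptions chosen purely for illustration.

```java
public class SpiralPath {
    /**
     * Returns {x[], y[]} for n sample points on an Archimedean spiral
     * r = step * theta, with theta advancing by dTheta per sample.
     * Hypothetical helper for illustration only; not part of January.
     */
    public static double[][] spiral(int n, double step, double dTheta) {
        double[] x = new double[n];
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double theta = i * dTheta;
            double r = step * theta;
            x[i] = r * Math.cos(theta);
            y[i] = r * Math.sin(theta);
        }
        return new double[][] {x, y};
    }

    /** Counts how many times the values switch between rising and falling. */
    public static int directionChanges(double[] v) {
        int changes = 0;
        int lastSign = 0; // sign of the most recent non-zero step
        for (int i = 1; i < v.length; i++) {
            int sign = (int) Math.signum(v[i] - v[i - 1]);
            if (sign != 0) {
                if (lastSign != 0 && sign != lastSign) {
                    changes++;
                }
                lastSign = sign;
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        double[][] xy = spiral(100, 1.0, 0.2);
        // A monotonic axis would report 0 here; a spiral reports several reversals.
        System.out.println("x direction changes: " + directionChanges(xy[0]));
        System.out.println("y direction changes: " + directionChanges(xy[1]));
    }
}
```

Both coordinate arrays reverse direction repeatedly, so neither could serve as a sorted lookup axis, even though the sample index (or elapsed time) along the path remains monotonic.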

Yes, got it, thanks @PeterC-DLS

That's a useful concept to know - I haven't encountered data like that.

Hi there team,
@PeterC-DLS explained that this processing is not a Dataset function. That's fine.

So, sorting out the overlapping period is the responsibility of the Dataset consumer, rather than Dataset itself.

I'll be implementing it in my Dataset consumer application. I'm quite happy to initially write it as a January example class (with Unit tests). This could act as a demonstration of the wider usage of Dataset, plus the unit tests in this fresh processing could prove valuable for regression testing. An alternate perspective could be that it pollutes the examples folder with something that's outside the core usage of January.

Any opinions?

IMHO I would welcome advanced examples that are self-testing. Going forward, knowing and understanding how January consumers are using the code is always helpful.

Note that at the moment the examples bundle is not run as part of the build. It should be, and I will raise a separate ticket for that.

@jonahkichwacoders - yes, including examples in the build would have prevented #22 from remaining undetected.

Sounds fine to me: any advanced examples will help bring out more of the functionality in January!

Examples added above.