mir_eval.util.intervals_to_samples expected behavior

Question

mir_eval.util.intervals_to_samples expected behavior

Closed this issue 8 years ago · 8 comments

What's the intended behavior for mir_eval.util.intervals_to_samples?

In the following example:

import numpy as np
import mir_eval

intervals = np.array(
    [[0.2, 0.49],
    [0.8, 0.9]]
)
labels = ['a', 'b']
sample_times, sample_labels = mir_eval.util.intervals_to_samples(intervals, labels, offset=0, sample_size=0.1, fill_value=None)

I expected something close to:

sample_times = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
sample_labels = np.array([None, None, 'a', 'a', 'a', None, None, None, 'b'])

(i.e. time periods where there is no labeled interval get filled in with None)
but instead get:

sample_times = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
sample_labels = np.array([None, None, 'a', 'a', 'a', 'a', 'a', 'a', 'b'])

Am I missing something?

Answer 1 · 2017-04-12T18:54:09.000Z

I think there is an implicit assumption that the intervals are contiguous. The fill_value argument I believe refers to "out of range" meaning before intervals[0, 0] or after intervals[-1, 1]. Your expected behavior would still work fine for contiguous intervals and seems reasonable to me though. @bmcfee knows best here.

Answer 2 · 2017-04-12T19:23:26.000Z

@craffel is correct; the current implementation does not handle gaps.

Answer 3 · 2017-04-12T19:29:55.000Z

Was also interested in your input as to whether we should change the implementation so that it does handle gaps.

Answer 4 · 2017-04-12T19:32:09.000Z

@craffel @bmcfee makes sense. That assumption should probably be in the docs. (Happy to make a pr.) Especially because the description for the variable intervals says "An array of time intervals, as returned by mir_eval.io.load_intervals()" which could be non-contiguous.

Answer 5 · 2017-04-12T21:01:59.000Z

yeah, fair point. It would be easy to support this behavior; feel free to tag this for the next release, i can hack it on friday.

Answer 6 · 2017-04-14T19:26:13.000Z

I have a prototype working locally for this.

The change is entirely localized to interpolate_intervals, since that's where the heavy lifting is done.

One thing I noticed is that it only works if time_points is in sorted order. The only place where interpolate_intervals is called in mir_eval is from within intervals_to_samples, which only generates sorted time grids. However, the documentation does not stipulate ordering as a requirement.

Question for @craffel -- do we add ordering as a requirement to this function, or extend the function to support unordered time points? I'm having a hard time imagining a use case where that would be necessary, but one never knows.

Answer 7 · 2017-04-15T18:49:36.000Z

We should warn/throw an error when the time points are not ordered, and put a note in the docstring mentioning this. Thanks.

Answer 8 · 2017-04-17T13:07:21.000Z

We should warn/throw an error when the time points are not ordered, and put a note in the docstring mentioning this. Thanks.

Done. I also rewrote it once more using np.searchsorted, so it should be a smidgen faster.