mir-evaluation/mir_eval

mir_eval.util.intervals_to_samples expected behavior

Closed this issue · 8 comments

What's the intended behavior for mir_eval.util.intervals_to_samples?

In the following example:

import numpy as np
import mir_eval

intervals = np.array(
    [[0.2, 0.49],
    [0.8, 0.9]]
)
labels = ['a', 'b']
sample_times, sample_labels = mir_eval.util.intervals_to_samples(intervals, labels, offset=0, sample_size=0.1, fill_value=None)

I expected something close to:

sample_times = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
sample_labels = np.array([None, None, 'a', 'a', 'a', None, None, None, 'b'])

(i.e. time periods where there is no labeled interval get filled in with None)
but instead get:

sample_times = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
sample_labels = np.array([None, None, 'a', 'a', 'a', 'a', 'a', 'a', 'b'])

Am I missing something?

I think there is an implicit assumption that the intervals are contiguous. The fill_value argument I believe refers to "out of range" meaning before intervals[0, 0] or after intervals[-1, 1]. Your expected behavior would still work fine for contiguous intervals and seems reasonable to me though. @bmcfee knows best here.

@craffel is correct; the current implementation does not handle gaps.

Was also interested in your input as to whether we should change the implementation so that it does handle gaps.

@craffel @bmcfee makes sense. That assumption should probably be in the docs. (Happy to make a pr.) Especially because the description for the variable intervals says "An array of time intervals, as returned by mir_eval.io.load_intervals()" which could be non-contiguous.

yeah, fair point. It would be easy to support this behavior; feel free to tag this for the next release, i can hack it on friday.

I have a prototype working locally for this.

The change is entirely localized to interpolate_intervals, since that's where the heavy lifting is done.

One thing I noticed is that it only works if time_points is in sorted order. The only place where interpolate_intervals is called in mir_eval is from within intervals_to_samples, which only generates sorted time grids. However, the documentation does not stipulate ordering as a requirement.

Question for @craffel -- do we add ordering as a requirement to this function, or extend the function to support unordered time points? I'm having a hard time imagining a use case where that would be necessary, but one never knows.

We should warn/throw an error when the time points are not ordered, and put a note in the docstring mentioning this. Thanks.

We should warn/throw an error when the time points are not ordered, and put a note in the docstring mentioning this. Thanks.

Done. I also rewrote it once more using np.searchsorted, so it should be a smidgen faster.