mir_eval.util.intervals_to_samples expected behavior
Closed this issue · 8 comments
What's the intended behavior for mir_eval.util.intervals_to_samples
?
In the following example:
import numpy as np
import mir_eval
intervals = np.array(
[[0.2, 0.49],
[0.8, 0.9]]
)
labels = ['a', 'b']
sample_times, sample_labels = mir_eval.util.intervals_to_samples(intervals, labels, offset=0, sample_size=0.1, fill_value=None)
I expected something close to:
sample_times = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
sample_labels = np.array([None, None, 'a', 'a', 'a', None, None, None, 'b'])
(i.e. time periods where there is no labeled interval get filled in with None
)
but instead get:
sample_times = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
sample_labels = np.array([None, None, 'a', 'a', 'a', 'a', 'a', 'a', 'b'])
Am I missing something?
I think there is an implicit assumption that the intervals are contiguous. The fill_value
argument I believe refers to "out of range" meaning before intervals[0, 0]
or after intervals[-1, 1]
. Your expected behavior would still work fine for contiguous intervals and seems reasonable to me though. @bmcfee knows best here.
Was also interested in your input as to whether we should change the implementation so that it does handle gaps.
yeah, fair point. It would be easy to support this behavior; feel free to tag this for the next release, i can hack it on friday.
I have a prototype working locally for this.
The change is entirely localized to interpolate_intervals
, since that's where the heavy lifting is done.
One thing I noticed is that it only works if time_points
is in sorted order. The only place where interpolate_intervals
is called in mir_eval is from within intervals_to_samples
, which only generates sorted time grids. However, the documentation does not stipulate ordering as a requirement.
Question for @craffel -- do we add ordering as a requirement to this function, or extend the function to support unordered time points? I'm having a hard time imagining a use case where that would be necessary, but one never knows.
We should warn/throw an error when the time points are not ordered, and put a note in the docstring mentioning this. Thanks.
We should warn/throw an error when the time points are not ordered, and put a note in the docstring mentioning this. Thanks.
Done. I also rewrote it once more using np.searchsorted
, so it should be a smidgen faster.