vocalpy/vak

BUG: post-processing transform doesn't work with boundary labels

NickleDave opened this issue · 0 comments

Currently the output of transforms.frame_labels.functional.postprocess is the same as the input when we pass in boundary_labels to use to segment.

The root cause is that transforms.frame_labels.boundary_inds_from_boundary_labels does not check whether the input boundary_labels is a 1-d vector. And then it return nonzero(boundary_labels)[0]. Right now we pass in boundary_labels as a 2-d vector with the first dimension/axis of size 1, so this gives us back a vector of zeros, because zero is the correct index along the first axis for every occurrence of a 1--the 2nd axis is the vector of 1s and 0s.

The short term fix will be to call row_or_1d with the boundary_labels argument to convert it to a vector (or raise an error if it's some other unexpected shape). We already do this with the frame_labels argument at the top of postprocess and we should do the same with boundary_labels. We should also do this inside the boundary_inds_from_boundary_labels function itself. And we should add tests that the functions are now working as expected--something I expect that I failed to do when I first added this functionality.

Long term, I kind of don't like this "validate and convert if necessary" approach. I copied it from scikit-learn. It just feels weird to have the side effect of converting to a 1-d vector. It also feels weird that we validate/convert in these two functions but not in the function "between" them, segment_inds_list_from_boundary_labels, that actually produces the list of indexing vectors used by the postprocess function. I guess my logic is that we want to do the validation inside the top-level function, and we also need to make sure that the lowest level helper function works on its own, so we double-extra-validate there. If we do it inside the function between them, now we are really being redundant. I don't have any better ideas for how to do this right now, just making note.