dpwe/audfprint

Incorrect Time range.

Closed this issue · 1 comments

Hello @dpwe, thanks for your great work. I have a question about the time ranges reported when I print multiple matches. I set -x 5 expecting to get the 5 top-rated results, but I got this odd result:
image
I also printed the result list:
image
You can see that I got the same file repeatedly, with negative start times.

dpwe commented

What this means is that the matching region itself has a lot of self-similarity, so we get multiple matching reports describing overlapping time ranges. There should be some way to simplify this, but there's no effort to avoid it in the current code.

Imagine a reference item that includes a (near-identical) repeat: if "A" is some stretch of audio, the reference item is AA. A query that also contains AA will match the reference at offset 0. But it will also match with its first A aligned to the second A in the reference, giving a match at offset T (where T is the duration of the audio A). And it will match at -T, where the second A in the query aligns to the first A in the reference, and so on.

For your example, it looks like T=56, and the query and reference have more than 8 repeats, i.e. AAAAAAAAA..., so that even at 8T (448 frame skew), we're getting matches.
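
The repeat argument can be checked with a toy model. The sketch below is not audfprint's matching code: it assumes one unique integer "hash" per frame (the function `offset_histogram` and that hash model are made up for illustration) and simply counts, for every pair of equal hashes, the skew between reference time and query time.

```python
from collections import Counter

def offset_histogram(query, reference):
    """Count hash matches per time skew (reference frame - query frame).

    Toy model only: each list entry stands for one hash at one frame.
    """
    # Index the reference: hash value -> list of frames where it occurs.
    ref_frames = {}
    for frame, h in enumerate(reference):
        ref_frames.setdefault(h, []).append(frame)

    counts = Counter()
    for q_frame, h in enumerate(query):
        for r_frame in ref_frames.get(h, []):
            counts[r_frame - q_frame] += 1
    return counts

T = 56                 # duration of one "A", in frames (value from this issue)
repeats = 9            # AAAAAAAAA
A = list(range(T))     # hypothetical: one distinct hash per frame of A
query = A * repeats
reference = A * repeats

hist = offset_histogram(query, reference)
for skew in sorted(hist):
    print(skew, hist[skew])
```

In this model every skew that is a multiple of T from -8T to 8T collects matches: the skew 0 alignment gets the most (504), and the count shrinks as the overlap shrinks, down to 56 matches at a skew of ±448 frames. That is the pattern of one strong match plus several weaker reports of the same file at shifted (including negative) times.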

If you don't want these overlapping matches to produce multiple reports, your only option at the moment is to write your own code on top that calculates the overlap between the multiple match alignments found for a pair of items, and only reports the "biggest". But I haven't come up with a completely clean solution, since in general overlapping regions can form complex patterns (and I've seen such issues in practice).
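
One possible shape for such post-processing is sketched below, under loud assumptions: the `Match` record and its fields are hypothetical and are not audfprint's result format, so you would need to fill them in from whatever your own code extracts. The approach is a simple greedy pass that keeps the match with the most matched hashes and drops any other match to the same reference whose query time range overlaps a kept one. It will not resolve the more complex overlap patterns mentioned above.

```python
from typing import List, NamedTuple

class Match(NamedTuple):
    # Hypothetical record; field names are assumptions for this sketch.
    ref_id: str    # which reference item matched
    skew: int      # reference frame minus query frame
    count: int     # number of matching hashes at this skew
    q_start: int   # first matching query frame (inclusive)
    q_end: int     # last matching query frame (exclusive)

def overlaps(a: Match, b: Match) -> bool:
    """True if both matches hit the same reference over intersecting query spans."""
    return a.ref_id == b.ref_id and a.q_start < b.q_end and b.q_start < a.q_end

def keep_biggest(matches: List[Match]) -> List[Match]:
    """Greedy suppression: strongest match first, drop overlapping weaker ones."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda m: m.count, reverse=True):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    return kept

matches = [
    Match("song", 0, 112, 0, 112),
    Match("song", 56, 56, 0, 56),
    Match("song", -56, 56, 56, 112),
    Match("other", 10, 20, 0, 50),
]
for m in keep_biggest(matches):
    print(m.ref_id, m.skew, m.count)
```

With the example data, the two shifted "song" alignments are suppressed because they overlap the skew 0 match, while the match to "other" survives because it belongs to a different reference item.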